Dear all,
I'm trying to find an efficient way to build a kNN graph for a large
dataset. Precisely, I have a large set of high-dimensional vectors (say d
>> 10000) and I want to build a graph where those high-dimensional points
are the vertices and each one is linked to its k nearest neighbors based on
some kind of similarity defined on the vertex space.
My problem is implementing an efficient algorithm to compute the weight
matrix of the graph. I need to compute N*N similarities, and the only way
I know is to use a "cartesian" operation followed by a "map" operation on the
RDD. But this is very slow when N is large. Is there a more clever way to
do this for an arbitrary similarity function?
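For concreteness, what I'm doing is equivalent to the following brute-force
sketch in plain Python (cosine similarity is just a placeholder here; in my
case the similarity function is arbitrary):

```python
import math
import heapq

def cosine(u, v):
    # placeholder similarity; any symmetric function could be used instead
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_graph(points, k, sim=cosine):
    # brute-force O(N^2) pairwise similarities (what cartesian + map does),
    # keeping only the top-k most similar neighbors per vertex
    n = len(points)
    graph = {}
    for i in range(n):
        sims = ((sim(points[i], points[j]), j) for j in range(n) if j != i)
        graph[i] = heapq.nlargest(k, sims)  # list of (similarity, index)
    return graph
```

The all-pairs step is the bottleneck I'd like to avoid when N is large.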
Cheers,
Jao
