TR2017131
Preconditioned Spectral Clustering for Stochastic Block Partition Streaming Graph Challenge

 "Preconditioned Spectral Clustering for Stochastic Block Partition Streaming Graph Challenge", IEEE HPEC Graph Challenge, DOI: 10.1109/HPEC.2017.8091045, September 2017, pp. 16.BibTeX TR2017131 PDF
 @inproceedings{Zhuzhunashvili2017sep,
 author = {Zhuzhunashvili, David and Knyazev, Andrew},
 title = {Preconditioned Spectral Clustering for Stochastic Block Partition Streaming Graph Challenge},
 booktitle = {IEEE HPEC Graph Challenge},
 year = 2017,
 pages = {16},
 month = sep,
 doi = {10.1109/HPEC.2017.8091045},
 url = {https://www.merl.com/publications/TR2017131}
 }
,
 "Preconditioned Spectral Clustering for Stochastic Block Partition Streaming Graph Challenge", IEEE HPEC Graph Challenge, DOI: 10.1109/HPEC.2017.8091045, September 2017, pp. 16.

Research Area:
Abstract:
Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is demonstrated to efficiently solve eigenvalue problems for graph Laplacians that appear in spectral clustering. For static graph partitioning, 1020 iterations of LOBPCG without preconditioning result in 10x error reduction, enough to achieve 100% correctness for all Challenge datasets with known truth partitions, e.g., for graphs with 5K/.1M (50K/1M Vertices/Edges in 2 (7) seconds, compared to over 5,000 (30,000) seconds needed by the baseline Python code. Our Python code 100% correctly determines 98 (160) clusters from the Challenge static graphs with 0.5M (2M) vertices in 270 (1,700) seconds using 10GB (50GB) of memory. Our singleprecision MATLAB code calculates the same clusters at half time and memory. For streaming graph partitioning, LOBPCG is initiated with approximate eigenvectors of the graph Laplacian already computed for the previous graph, in many cases reducing 23 times the number of required LOBPCG iterations, compared to the static case. Our spectral clustering is generic, i.e. assuming nothing specific of the block model or streaming, used to generate the graphs for the Challenge, in contrast to the base code. Nevertheless, in 10stage streaming comparison with the base code for the 5K graph, the quality of our clusters is similar or better starting at stage 4 (7) for emerging edging (snowballing) streaming, while the computing time is 1001000 smaller.
Related News & Events

NEWS Andrew Knyazev (MERL) presents at the SchlumbergerTufts U. Computational and Applied Math Seminar Date: April 10, 2018
Research Area: Machine LearningBrief Andrew Knyazev, Distinguished Research Scientist of MERL, has accepted an invitation to speak about his work on Big Data and spectral graph partitioning at the SchlumbergerTufts U. Computational and Applied Math Seminar. A primary focus of this seminar series is on mathematical and computational aspects of remote sensing. A partial list of the topics of interest includes: numerical solution of large scale PDEs (a.k.a. forward problems); theory and numerical methods of inverse and illposed problems; imaging; related problems in numerical linear algebra, approximation theory, optimization and model reduction. The seminar meets on average once a month, the location alternates between Schlumberger's office in Cambridge, MA and the Tufts Medford Campus.
Abstract: Data clustering via spectral graph partitioning requires constructing the graph Laplacian and solving the corresponding eigenvalue problem. We consider and motivate using negative edge weights in the graph Laplacian. Preconditioned iterative solvers for the Laplacian eigenvalue problem are discussed and preliminary numerical results are presented.
 Andrew Knyazev, Distinguished Research Scientist of MERL, has accepted an invitation to speak about his work on Big Data and spectral graph partitioning at the SchlumbergerTufts U. Computational and Applied Math Seminar. A primary focus of this seminar series is on mathematical and computational aspects of remote sensing. A partial list of the topics of interest includes: numerical solution of large scale PDEs (a.k.a. forward problems); theory and numerical methods of inverse and illposed problems; imaging; related problems in numerical linear algebra, approximation theory, optimization and model reduction. The seminar meets on average once a month, the location alternates between Schlumberger's office in Cambridge, MA and the Tufts Medford Campus.