TR2017-131

Preconditioned Spectral Clustering for Stochastic Block Partition Streaming Graph Challenge


    •  Zhuzhunashvili, D., Knyazev, A., "Preconditioned Spectral Clustering for Stochastic Block Partition Streaming Graph Challenge", IEEE HPEC Graph Challenge, DOI: 10.1109/​HPEC.2017.8091045, September 2017, pp. 1-6.
      BibTeX TR2017-131 PDF
      • @inproceedings{Zhuzhunashvili2017sep,
      • author = {Zhuzhunashvili, David and Knyazev, Andrew},
      • title = {Preconditioned Spectral Clustering for Stochastic Block Partition Streaming Graph Challenge},
      • booktitle = {IEEE HPEC Graph Challenge},
      • year = 2017,
      • pages = {1--6},
      • month = sep,
      • doi = {10.1109/HPEC.2017.8091045},
      • url = {https://www.merl.com/publications/TR2017-131}
      • }
  • Research Area:

    Signal Processing

Abstract:

Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is demonstrated to efficiently solve eigenvalue problems for graph Laplacians that appear in spectral clustering. For static graph partitioning, 10-20 iterations of LOBPCG without preconditioning result in -10x error reduction, enough to achieve 100% correctness for all Challenge datasets with known truth partitions, e.g., for graphs with 5K/.1M (50K/1M Vertices/Edges in 2 (7) seconds, compared to over 5,000 (30,000) seconds needed by the baseline Python code. Our Python code 100% correctly determines 98 (160) clusters from the Challenge static graphs with 0.5M (2M) vertices in 270 (1,700) seconds using 10GB (50GB) of memory. Our single-precision MATLAB code calculates the same clusters at half time and memory. For streaming graph partitioning, LOBPCG is initiated with approximate eigenvectors of the graph Laplacian already computed for the previous graph, in many cases reducing 2-3 times the number of required LOBPCG iterations, compared to the static case. Our spectral clustering is generic, i.e. assuming nothing specific of the block model or streaming, used to generate the graphs for the Challenge, in contrast to the base code. Nevertheless, in 10-stage streaming comparison with the base code for the 5K graph, the quality of our clusters is similar or better starting at stage 4 (7) for emerging edging (snowballing) streaming, while the computing time is 100-1000 smaller.

 

  • Related News & Events

    •  NEWS   Andrew Knyazev (MERL) presents at the Schlumberger-Tufts U. Computational and Applied Math Seminar
      Date: April 10, 2018
      Research Area: Machine Learning
      Brief
      • Andrew Knyazev, Distinguished Research Scientist of MERL, has accepted an invitation to speak about his work on Big Data and spectral graph partitioning at the Schlumberger-Tufts U. Computational and Applied Math Seminar. A primary focus of this seminar series is on mathematical and computational aspects of remote sensing. A partial list of the topics of interest includes: numerical solution of large scale PDEs (a.k.a. forward problems); theory and numerical methods of inverse and ill-posed problems; imaging; related problems in numerical linear algebra, approximation theory, optimization and model reduction. The seminar meets on average once a month, the location alternates between Schlumberger's office in Cambridge, MA and the Tufts Medford Campus.

        Abstract: Data clustering via spectral graph partitioning requires constructing the graph Laplacian and solving the corresponding eigenvalue problem. We consider and motivate using negative edge weights in the graph Laplacian. Preconditioned iterative solvers for the Laplacian eigenvalue problem are discussed and preliminary numerical results are presented.
    •