Divisi: Learning from Semantic Networks and Sparse SVD

Rob Speer; Kenneth Arnold; Catherine Havasi

doi:10.25080/Majora-92bf1922-002

Divisi: Learning from Semantic Networks and Sparse SVD

Rob Speer
MIT Media Lab

Kenneth Arnold
MIT

Catherine Havasi
MIT

Abstract

Singular value decomposition (SVD) is a powerful technique for finding similarities and patterns in large data sets. SVD has applications in text analysis, bioinformatics, and recommender systems, and in particular was used in many of the top entries to the Netflix Challenge. It can also help generalize and learn from knowledge represented in a sparse semantic network.

Although this operation is fundamental to many fields, it requires a significant investment of effort to compute an SVD from sparse data using Python tools. Divisi is an answer to this: it combines NumPy, PySparse, and an extension module wrapping SVDLIBC, to make Lanczos' algorithm for sparse SVD easily usable within cross-platform Python code.

Divisi includes utilities for working with data in a variety of sparse formats, including semantic networks represented as edge lists or NetworkX graphs. It augments its matrices with labels, allowing you to keep track of the meaning of your data as it passes through the SVD, and it can export the labeled data in a format suitable for separate visualization GUIs.

Keywords

SVD, sparse, linear algebra, semantic networks, graph theory

DOI

10.25080/Majora-92bf1922-002

Bibtex entry

Full text PDF

Proceedings

Divisi: Learning from Semantic Networks and Sparse SVD