Evgenia (Eugenia) - Maria Kontopoulou

ekontopo@alumni.purdue.edu

Software


TeraPCA


A library that computes the top Principal Components (PCs) of tera-scale matrices using Randomized Singular Value Decomposition (RandSVD). Our implementation is built on multithreaded libraries such as LAPACKE, BLAS and MKL, and it handles datasets that exceed the available system memory by performing out-of-core computations. [Download]
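For readers unfamiliar with RandSVD, the general idea can be sketched in a few lines of NumPy. This is a minimal in-memory illustration of the standard randomized range-finder approach, not TeraPCA's actual code: the function name `rand_pca` and the parameters `oversample` and `n_iter` are assumptions made for this example, and the out-of-core handling that TeraPCA provides is not shown.

```python
import numpy as np

def rand_pca(A, k, oversample=10, n_iter=2, seed=0):
    """Approximate the top-k singular triplets of the column-centered A.

    Hypothetical helper for illustration; assumes A fits in memory.
    """
    rng = np.random.default_rng(seed)
    A = A - A.mean(axis=0)                      # center each column
    n = A.shape[1]
    # Random test matrix with a few extra columns for accuracy.
    Omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)              # basis for the sampled range
    for _ in range(n_iter):                     # power iterations sharpen the basis
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
    B = Q.T @ A                                 # small (k + oversample) x n matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k]
```

The rows of the returned `Vt` are the approximate principal axes, and `U * s` gives the projections of the samples onto them.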

Large Genetic Data Generator


Software for fast generation of simulated large-scale genetic data, using sophisticated random distributions to simulate the genetic patterns. Implemented in C++, leveraging OpenMP for fast parallel computations. [Download]

Approximation of the von Neumann Entropy of a Density Matrix


A collection of codes to approximate the von Neumann entropy of density matrices. It implements the methods that appeared in the paper "Randomized Linear Algebra Approaches to Estimate the von Neumann Entropy of Density Matrices". [Download]

Approximation of the Log Determinant of an SPD Matrix


The software from the paper "A Randomized Algorithm for Approximating the Log Determinant of a Symmetric Positive Definite Matrix". [Download]
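One common way to approximate a log determinant with randomization combines a truncated Taylor series of the logarithm with a Gaussian trace estimator. The sketch below illustrates that general idea under stated assumptions and should not be read as the released software: the function name `approx_logdet` and its parameters are hypothetical, and the largest eigenvalue is bounded here with an exact spectral norm for simplicity, where a practical implementation would more likely use a cheap estimate such as the power method.

```python
import numpy as np

def approx_logdet(A, n_terms=60, n_probes=200, seed=0):
    """Estimate log det(A) for a symmetric positive definite A.

    Uses log det(A) = n*log(alpha) - sum_{k>=1} tr(C^k)/k with C = I - A/alpha,
    where alpha exceeds the largest eigenvalue of A. Hypothetical helper.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Assumption: exact spectral norm, slightly inflated, used as alpha.
    alpha = 1.01 * np.linalg.norm(A, 2)
    G = rng.standard_normal((n, n_probes))      # Gaussian probe vectors
    V = G.copy()
    total = 0.0
    for k in range(1, n_terms + 1):
        V = V - (A @ V) / alpha                 # V now holds C^k G
        # Average of g^T C^k g over the probes estimates tr(C^k).
        total += np.sum(G * V) / (n_probes * k)
    return n * np.log(alpha) - total
```

The series converges quickly when A is well conditioned; for ill-conditioned matrices many more terms are needed, since the eigenvalues of C approach 1.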

Text-to-Matrix Generator (TMG)


A MATLAB toolbox for various tasks in text mining (TM), specifically:

  1. Indexing
  2. Retrieval
  3. Dimensionality Reduction
  4. Non-Negative Matrix Factorizations
  5. Clustering
  6. Classification

Most of TMG is written in MATLAB, though a large segment of the indexing phase is written in Perl.

TMG is especially suited to TM applications where the data is high-dimensional but extremely sparse, because it uses the sparse matrix infrastructure of MATLAB. It was initially built by Dr. Dimitrios Zeimpekis as a preprocessing tool for creating term-document matrices (TDMs) from unstructured text, and it was reportedly used with success by several researchers and instructors. The new version of TMG (December 2011) offers a much wider range of tools.
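To illustrate why sparse storage suits term-document matrices, here is a minimal sketch in Python (TMG itself is written in MATLAB and Perl, and this is not its code). The function name `term_document_matrix` and the whitespace tokenizer are assumptions for the example; only the nonzero counts are stored, keyed by (term index, document index).

```python
from collections import Counter

def term_document_matrix(docs):
    """Build a sparse term-document matrix as a dict of nonzero entries.

    Hypothetical helper: lowercases and splits on whitespace only.
    """
    vocab = {}     # term -> row index
    entries = {}   # (term index, document index) -> count
    for j, doc in enumerate(docs):
        for term, count in Counter(doc.lower().split()).items():
            i = vocab.setdefault(term, len(vocab))
            entries[(i, j)] = count
    return vocab, entries

docs = ["sparse matrix methods", "text mining with sparse data"]
vocab, entries = term_document_matrix(docs)
```

With a realistic vocabulary of many thousands of terms, each document touches only a small fraction of the rows, so storing the nonzeros alone saves most of the memory.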

My work on TMG focused on creating a brand-new tool that incorporates dimensionality reduction methods into TMG under the name Structured Dimensionality Reduction Techniques. These techniques, which range from fully deterministic to partially randomized, tend to produce factors that are scaled copies of actual rows and/or columns of the initial matrix. Another interesting feature planned for the new version of TMG is a set of partially randomized techniques that speed up BLAS-type operations such as matrix-vector multiplication.
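A factor made of "scaled copies of actual columns" can be illustrated with norm-squared column sampling, one well-known randomized scheme of this kind. The sketch below is in Python with NumPy and is only an assumption about what such a technique can look like; it is not taken from TMG, and the name `sample_columns` is hypothetical.

```python
import numpy as np

def sample_columns(A, c, seed=0):
    """Pick c columns of A with probability proportional to their squared norm.

    Each chosen column is rescaled by 1/sqrt(c * p_j), so that C @ C.T equals
    A @ A.T in expectation. Hypothetical helper for illustration.
    """
    rng = np.random.default_rng(seed)
    col_norms_sq = np.sum(A * A, axis=0)
    p = col_norms_sq / col_norms_sq.sum()       # sampling probabilities
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    scale = 1.0 / np.sqrt(c * p[idx])
    return A[:, idx] * scale, idx, scale
```

Because every column of the result is a rescaled column of the original matrix, the factor keeps the sparsity and interpretability of the data, which a dense SVD factor does not.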

For more information, visit the official page of TMG.