Exploration and analysis of complex data using Topological Data Analysis
Gunnar Carlsson, Ph.D., Professor of Mathematics, Stanford University, Stanford, CA
Pek Yee Lum, Ph.D., VP of Product and Solutions, Ayasdi Inc., Palo Alto, CA
The life sciences are producing data in larger quantities and with greater complexity than ever before. Analyzing these data to obtain knowledge is one of the fundamental challenges of our time. Model-based and query-based algorithms are useful in simple systems where complexity is limited, but real-world data are often nonlinear and highly complex. We make a case here for a new and radical approach to tackling today’s complex data problems. Our approach is called Topological Data Analysis, or TDA. It uses higher-dimensional versions of distance functions to understand data through their shape, suitably defined. This leads us to the branch of mathematics called topology. Until recently, topology had been used only to study abstract shapes and surfaces. Over the last 15 years, however, there has been a concerted effort to apply the pattern-extracting ability of topological methods to the study of large, high-dimensional data sets; taken together, these methods make up TDA. To date, we have applied TDA to data collected from high-throughput compound screening, flow cytometry, mass spectrometry, various genomics and proteomics platforms, personal health devices, electronic medical records, text, and many other diverse sources. We are able to identify sub-populations among cancer patients for better biomarker selection, uncover insights into disease biology, and pinpoint outliers and anomalies. We believe that TDA is the new approach needed to tackle data that are both large and complex. We have also built software that gives researchers real-time access to these topological algorithms through an interactive user interface.