This was part of
Eliciting Structure in Genomics Data
MREC: a fast and versatile framework for aligning and matching point clouds with applications to single cell molecular data
Soledad Villar, Johns Hopkins University
Monday, August 30, 2021
Abstract: Comparison and alignment of large, high dimensional data sets is a challenging problem. A particular difficulty is that standard algorithms do not scale. An efficient solution that was feasible for such data sets would impact many different knowledge domains. We introduce and study MREC, a recursive decomposition procedure for computing matchings between very large data sets. The basic idea is to partition the data, match the partitions, and then recursively match the points within each pair of identified partitions. The matchings themselves can be done using black-box matching procedures that are too expensive to run on the entire data set. Given an absolute measure of the quality of a matching, the framework supports optimization over parameters including partitioning procedures and matching algorithms. By design, MREC is applicable to extremely large data sets. We analyze the procedure and describe the conditions under which it is expected to work well. We demonstrate the flexibility and power of MREC by applying it to a number of alignment problems arising in the analysis of single-cell molecular data.