Machine learning for actionable, interpretable marker selection in -omics studies
Bianca Dumitrascu, University of Cambridge
Extracting interpretable, lower-dimensional representations from data is not a new problem. Over the years, numerous algorithms and models have been developed across a range of disciplines, including statistics, computer science, and operations research. In computational biology, spurred by the development of single-cell technologies, one challenging task is to select a small set of informative markers that identify and differentiate specific cellular information (e.g., cell type, cell state, or cell location) as precisely as possible. In this talk, I will discuss scGene-Fit, a method for selecting gene transcript markers that jointly optimize cell label recovery using a simple label-aware compressive classification approach. Beyond presenting its features and limitations, I will also discuss ongoing work aimed at improving the method. Finally, I will review recent literature on the topic, along with related interpretable machine learning approaches that need further understanding and exploration but hold promise in the genomic context.