High-dimensional high-order/tensor data refers to data organized in the form of large-scale arrays spanning three or more dimensions, and it is becoming increasingly prevalent across various fields, including biology, medicine, psychology, education, and machine learning. Compared to low-dimensional or low-order data, the distinct characteristics of high-dimensional high-order data pose unprecedented challenges to the statistics community. For the most part, classical methods and theory tailored to matrix data no longer apply to high-order data. While previous studies have attempted to address this issue by transforming high-order data into matrices or vectors through vectorization or matricization, this paradigm often destroys intrinsic tensor structures and, as a result, yields suboptimal outcomes in subsequent analyses. Another major challenge stems from the computational side, as the high-dimensional high-order structure introduces severe computational difficulties previously unseen in the matrix counterpart. Many fundamental concepts and methods developed for matrix data cannot be extended to high-order data in a tractable manner; for instance, naive extensions of concepts such as the operator norm, singular values, and eigenvalues all become NP-hard to compute. With these challenges in mind, there is an urgent need to develop new statistical methods and theory specifically tailored to high-dimensional high-order data.
This workshop provides an interdisciplinary platform for collaboration, facilitating the exchange of advanced research developments and topics in statistical and computational methods for analyzing tensor data. By bringing together statisticians, mathematicians, computer scientists, psychometricians, and machine learning researchers, the program aims to foster development of new interdisciplinary areas at the intersection of statistics, mathematics, psychometrics, and engineering. The workshop aims to contribute to both educational and research endeavors in these emerging fields.
Funding
Priority funding consideration will be given to those who register by March 4, 2025. Funding is limited.
Lightning Talks and Poster Session
This workshop will include lightning talks and a poster session for early career researchers (including graduate students). If accepted, you will be asked to do both. In order to propose a lightning session talk and a poster, you must first register for the workshop, and then submit a proposal using the form that will become available on this page after you register. The registration form should not be used to propose a lightning session talk or poster.
The deadline for proposing has been extended to March 20, 2025. If your proposal is accepted, you should plan to attend the event in person.
Speakers
Soumendra Lahiri
Washington University in St. Louis
Lexin Li
University of California, Berkeley
Himel Mallick
Cornell University
Song Mei
University of California, Berkeley
Andrea Montanari
Stanford University
Marianna Pensky
University of Central Florida
Carey Priebe
Johns Hopkins University
Annie Qu
University of California, Irvine
Galen Reeves
Duke University
Aaron Schein
University of Chicago
Pixu Shi
Duke University
Will Wei Sun
Purdue University
Ming Yuan
Columbia University
Cun-Hui Zhang
Rutgers University
Emma Zhang
Emory University
Ji Zhu
University of Michigan
Schedule
Monday, May 5, 2025
8:30-9:00 CDT
Check-In and Breakfast
9:00-9:30 CDT
Euclidean Mirrors
Speaker: Carey Priebe (Johns Hopkins University)
9:30-9:40 CDT
Q&A
9:40-9:45 CDT
Tech Break
9:45-10:15 CDT
Spectral Ranking Inferences Based on General Multiway Comparisons
Speaker: Jianqing Fan (Princeton University)
This paper studies the performance of the spectral method in the estimation and uncertainty quantification of the unobserved preference scores of compared entities in a general and more realistic setup. Specifically, the comparison graph consists of hyper-edges of possible heterogeneous sizes, and the number of comparisons can be as low as one for a given hyper-edge. Such a setting is pervasive in real applications, circumventing the need to specify the graph randomness and the restrictive homogeneous sampling assumption imposed in the commonly used Bradley-Terry-Luce (BTL) or Plackett-Luce (PL) models. Furthermore, in scenarios where the BTL or PL models are appropriate, we unravel the relationship between the spectral estimator and the Maximum Likelihood Estimator (MLE). We discover that a two-step spectral method, where we apply the optimal weighting estimated from the equal weighting vanilla spectral method, can achieve the same asymptotic efficiency as the MLE. Given the asymptotic distributions of the estimated preference scores, we also introduce a comprehensive framework to carry out both one-sample and two-sample ranking inferences, applicable to both fixed and random graph settings. It is noteworthy that this is the first time effective two-sample rank testing methods have been proposed. Finally, we substantiate our findings via comprehensive numerical simulations and subsequently apply our developed methodologies to perform statistical inferences for statistical journals and movie rankings.
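To make the vanilla spectral step concrete, here is a minimal, illustrative sketch in the spirit of Rank Centrality applied to multiway comparisons: each hyper-edge sends transition mass from the losers to the winner, and the log of the stationary distribution serves as the preference-score estimate. The data format, normalization, and function names are assumptions for illustration only; the paper's two-step optimally weighted estimator is not reproduced here.

    import numpy as np

    def vanilla_spectral_scores(comparisons, n_items):
        # comparisons: list of (hyper_edge, winner), where hyper_edge is a
        # tuple of item indices and winner is the item chosen from that edge
        A = np.zeros((n_items, n_items))
        for edge, winner in comparisons:
            for i in edge:
                if i != winner:
                    A[i, winner] += 1.0          # mass flows loser -> winner
        d = max(A.sum(axis=1).max(), 1.0)        # common normalizer for rows
        P = A / d
        np.fill_diagonal(P, 1.0 - P.sum(axis=1)) # lazy chain: leftover stays
        vals, vecs = np.linalg.eig(P.T)          # stationary distribution
        pi = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
        pi /= pi.sum()
        return np.log(pi)                        # scores, up to a shift

    # Example: comparisons of heterogeneous sizes among 4 items.
    data = [((0, 1, 2), 0), ((1, 2, 3), 1), ((0, 3), 3), ((1, 2), 2)]
    print(vanilla_spectral_scores(data, n_items=4))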
10:15-10:25 CDT
Q&A
10:25-10:55 CDT
Coffee Break
10:55-11:25 CDT
TBA
Speaker: Tammy Kolda (MathSci.ai)
11:25-11:35 CDT
Q&A
11:35-11:40 CDT
Tech Break
11:40-12:10 CDT
Dynamic Tensor Factor Model with Main and Interaction Effects
Speaker: Rong Chen (Rutgers University)
High-dimensional tensor time series are encountered increasingly often in applications. Factor models in a form similar to the tensor Tucker decomposition have been shown to be useful for tensor time series. In this paper we propose a more detailed decomposition so that the factors can be interpreted as global effects, main effects of individual modes (columns, rows, etc.), and interaction effects among the modes. This decomposition enhances interpretability, effective dimension reduction, and estimation efficiency. Theoretical investigation establishes the properties of the estimation procedure. Empirical examples illustrate the applicability of the methodology, highlighting its relevance to contemporary data science challenges in high-dimensional settings.
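As background for the Tucker-type setup, the following is a minimal sketch of a standard loading-space estimator for a tensor factor model X_t = G_t x_1 A_1 x_2 A_2 + E_t: mode-wise PCA on unfoldings accumulated over time. The finer main-effect/interaction-effect decomposition proposed in the talk is not reproduced here; names and ranks are illustrative.

    import numpy as np

    def unfold(T, mode):
        # mode-k unfolding: move mode k to the front, flatten the rest
        return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

    def tucker_factor_loadings(X, ranks):
        # X: (T, d1, ..., dK) tensor time series; ranks: (r1, ..., rK)
        loadings = []
        for k, r in enumerate(ranks):
            d = X.shape[k + 1]
            S = np.zeros((d, d))
            for t in range(X.shape[0]):
                M = unfold(X[t], k)
                S += M @ M.T                 # accumulate mode-k covariance
            _, vecs = np.linalg.eigh(S)
            loadings.append(vecs[:, -r:])    # top-r eigenvectors as loadings
        return loadings

    # Factors follow by projection: G_t = X_t x_1 A1' x_2 A2'.
    X = np.random.default_rng(0).normal(size=(50, 10, 8))
    A1, A2 = tucker_factor_loadings(X, ranks=(3, 2))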
12:10-12:20 CDT
Q&A
12:20-13:20 CDT
Lunch Break
13:20-13:50 CDT
TBA
Speaker: Jiashun Jin (Carnegie Mellon University)
13:50-14:00 CDT
Q&A
14:00-14:05 CDT
Tech Break
14:05-14:35 CDT
TBA
Speaker: Andrea Montanari (Stanford University)
14:35-14:45 CDT
Q&A
14:45-15:15 CDT
Coffee Break
15:15-16:15 CDT
Group Activity
Tuesday, May 6, 2025
8:30-9:00 CDT
Check-In and Breakfast
9:00-9:30 CDT
TBA
Speaker: Cong Ma (University of Chicago)
9:30-9:40 CDT
Q&A
9:40-9:45 CDT
Tech Break
9:45-10:15 CDT
Simultaneous Decorrelation of Matrix Time Series
Speaker: Cun-Hui Zhang (Rutgers University)
We propose a contemporaneous bilinear transformation for a $p\times q$ matrix time series to alleviate the difficulties in modeling and forecasting matrix time series when $p$ and/or $q$ are large. The resulting transformed matrix assumes a block structure consisting of several small matrices, and those small matrix series are uncorrelated across all times. Hence an overall parsimonious model is achieved by modeling each of those small matrix series separately without loss of information on the linear dynamics. Such a parsimonious model often has better forecasting performance, even when the underlying true dynamics deviates from the assumed uncorrelated block structure after transformation. The uniform convergence rates of the estimated transformation are derived, which vindicate an important virtue of the proposed bilinear transformation: it is technically equivalent to the decorrelation of a vector time series of dimension $\max(p,q)$ instead of $p\times q$. The proposed method is illustrated numerically via both simulated and real data examples. This is joint work with Yuefeng Han, Rong Chen, and Qiwei Yao.
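To fix notation, here is a minimal sketch of applying an already-estimated bilinear transformation Z_t = A'X_tB and checking lag-1 decorrelation of the result; estimating (A, B) and identifying the block structure follow the paper's procedure and are assumed given here.

    import numpy as np

    def bilinear_transform(X, A, B):
        # X: (T, p, q) matrix time series; returns Z with Z[t] = A.T @ X[t] @ B
        return np.einsum('pi,tpq,qj->tij', A, X, B)

    def lag1_crosscov(Z):
        # lag-1 cross-covariance of vec(Z_t); near-zero off-block entries
        # indicate the small matrix series can be modeled separately
        T = Z.shape[0]
        V = Z.reshape(T, -1)
        V = V - V.mean(axis=0)
        return V[:-1].T @ V[1:] / (T - 1)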
10:15-10:25 CDT
Q&A
10:25-10:55 CDT
Coffee Break
10:55-11:25 CDT
TBA
Speaker: Annie Qu (University of California, Irvine)
11:25-11:35 CDT
Q&A
11:35-11:40 CDT
Tech Break
11:40-12:10 CDT
TBA
Speaker: Elena Erosheva (University of Washington)
12:10-12:20 CDT
Q&A
12:20-13:20 CDT
Lunch Break
13:20-13:50 CDT
A Statistically Provable Approach to Integrating LLMs into Topic Modeling
Speaker: Tracy Ke (Harvard University)
"The rise of large language models (LLMs) raises an important question: how can statisticians leverage their expertise in the AI era? Statisticians excel in developing resource-efficient, theoretically grounded models. In this talk, we use topic modeling as an example to illustrate how such expertise can enhance the processing of LLM-generated data.
Traditional topic modeling is applied to word counts without considering contextual meaning. LLMs, however, produce contextualized word embeddings that capture deeper semantic relationships. We leverage these embeddings to refine topic modeling by representing each document as a sequence of word embeddings, modeled as a Poisson point process. Its intensity measure is expressed as a convex combination of K base measures, each representing a topic. To estimate these topics, we propose a flexible algorithm that integrates traditional topic modeling methods and nonparametric density estimation techniques. A key advantage of this approach is its compatibility with any existing bag-of-words topic modeling method as a plug-in module, requiring no modifications.
Assuming each topic is a $\beta$-Hölder smooth intensity measure in the embedded space, we establish the convergence rate of our method. We also derive a minimax lower bound and show that our method attains this rate when $\beta$ is in a certain range. Finally, we validate our approach on multiple datasets, demonstrating its advantages over traditional topic modeling techniques in capturing word contexts.
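To illustrate the plug-in idea, the following is a crude, illustrative caricature (not the paper's estimator): discretize the embedding space with k-means as a rough density-estimation step, count bin occupancies per document as a pseudo bag-of-words matrix, and plug in NMF as the traditional topic modeling module. All names and parameters here are assumptions for illustration.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import NMF

    def embedded_topic_model(doc_embeddings, K, n_bins=256, seed=0):
        # doc_embeddings: list of (n_words_d, dim) arrays of contextualized
        # word embeddings, one array per document
        all_emb = np.vstack(doc_embeddings)
        bins = KMeans(n_clusters=n_bins, random_state=seed, n_init=3).fit(all_emb)
        counts = np.zeros((len(doc_embeddings), n_bins))
        for d, E in enumerate(doc_embeddings):
            np.add.at(counts[d], bins.predict(E), 1.0)  # bin occupancy counts
        nmf = NMF(n_components=K, random_state=seed)
        W = nmf.fit_transform(counts)    # document-topic weights
        H = nmf.components_              # topic intensities over embedding bins
        return W / W.sum(axis=1, keepdims=True), H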
13:50-14:00 CDT
Q&A
14:00-14:05 CDT
Tech Break
14:05-14:35 CDT
Scaling and Scalability: Provable Nonconvex Low-Rank Tensor Estimation from Incomplete Measurements
Speaker: Yuejie Chi (Carnegie Mellon University)
We will discuss a scalable approach to low-rank tensor estimation, ScaledGD, that achieves desirable statistical and computational complexities simultaneously for low-rank tensor completion, sensing, and robust PCA with the Tucker decomposition. Our algorithm highlights the power of appropriate preconditioning in accelerating nonconvex statistical estimation for highly ill-conditioned problems, where the iteration-varying preconditioners promote desirable invariance properties of the trajectory with respect to the underlying symmetry in low-rank tensor factorization.
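For intuition, here is a minimal sketch of the ScaledGD idea in its simplest matrix-factorization form (the talk develops the Tucker tensor versions for completion, sensing, and robust PCA): plain gradient steps on the factors, right-multiplied by inverse Gram matrices so that convergence does not degrade with the condition number of the underlying factors. Step size, initialization, and names are illustrative assumptions.

    import numpy as np

    def scaled_gd(Y, r, eta=0.5, iters=100, seed=0):
        # minimize 0.5 * ||L @ R.T - Y||_F^2 over rank-r factors L, R;
        # preconditioners (R'R)^{-1} and (L'L)^{-1} rescale ill-conditioned
        # directions so all factor components make comparable progress
        rng = np.random.default_rng(seed)
        m, n = Y.shape
        L = rng.normal(size=(m, r)) / np.sqrt(m)
        R = rng.normal(size=(n, r)) / np.sqrt(n)
        for _ in range(iters):
            G = L @ R.T - Y                  # residual = gradient wrt L @ R.T
            L_new = L - eta * G @ R @ np.linalg.inv(R.T @ R)
            R = R - eta * G.T @ L @ np.linalg.inv(L.T @ L)
            L = L_new
        return L, R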
14:35-14:40 CDT
Q&A
14:45-14:50 CDT
Tech Break
14:50-15:20 CDT
TBA
Speaker: Yuqi Gu (Columbia University)
15:20-15:30 CDT
Q&A
15:30-16:30 CDT
Poster Session + Social Hour
Wednesday, May 7, 2025
8:30-9:00 CDT
Check-In and Breakfast
9:00-9:30 CDT
Generalized Tensor Completion for Noisy Data with Non-Random Missingness
Speaker: Emma Zhang (Emory University)
9:30-9:40 CDT
Q&A
9:40-9:45 CDT
Tech Break
9:45-10:15 CDT
Joint Semi-Symmetric Tensor PCA for Integrating Multi-modal Populations of Networks
Speaker: Genevera Allen (Columbia University)
Multi-modal populations of networks arise in many scenarios, including large-scale multi-modal neuroimaging studies that capture both functional and structural neuroimaging data for thousands of subjects. A major research question in such studies is how functional and structural brain connectivity are related and how they vary across the population. We develop a novel PCA-type framework for integrating multi-modal undirected networks measured on many subjects. Specifically, we arrange these networks as semi-symmetric tensors, where each tensor slice is a symmetric matrix representing a network from an individual subject. We then propose a novel Joint, Integrative Semi-Symmetric Tensor PCA (JisstPCA) model, associated with an efficient iterative algorithm, for jointly finding low-rank representations of two or more networks across the same population of subjects. We establish one-step statistical convergence of our separate low-rank network factors as well as the shared population factors to the true factors, with finite sample statistical error bounds. Through simulation studies and a real data example for integrating multi-subject functional and structural brain connectivity, we illustrate the advantages of our method for finding joint low-rank structures in multi-modal populations of networks. This is joint work with Jiaming Liu, Lili Zheng, and Zhengwu Zhang.
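As a toy version of the semi-symmetric structure, the sketch below runs a rank-one power iteration on a single-modality (p, p, N) tensor whose slices are subject-level symmetric networks, alternating between a shared network factor u and subject loadings w. The actual JisstPCA model couples two or more such tensors with higher ranks; everything here is illustrative.

    import numpy as np

    def semi_symmetric_power(Tens, iters=100, seed=0):
        # Tens: (p, p, N), each slice Tens[:, :, n] a symmetric network;
        # maximizes sum_n w[n] * u' Tens[:, :, n] u over unit vectors u, w
        rng = np.random.default_rng(seed)
        p, _, N = Tens.shape
        w = rng.normal(size=N)
        w /= np.linalg.norm(w)
        for _ in range(iters):
            M = np.tensordot(Tens, w, axes=([2], [0]))  # weighted network
            vals, vecs = np.linalg.eigh(M)
            u = vecs[:, np.argmax(np.abs(vals))]        # network factor
            w = np.einsum('i,ijn,j->n', u, Tens, u)     # subject loadings
            w /= np.linalg.norm(w)
        return u, w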
10:15-10:25 CDT
Q&A
10:25-10:55 CDT
Coffee Break
10:55-11:25 CDT
Tensor approaches for single cell 3D genome data analysis
Speaker: Sunduz Keles (University of Wisconsin, Madison)
Emerging single cell technologies that capture three-dimensional genomic interactions (scHi-C) alongside DNA methylation present new opportunities for integrative analysis. We introduce Muscle, a semi-nonnegative joint decomposition method that leverages the inherent tensor structure of scHi-C to unify these modalities, revealing key cell type–specific signals and inter-modality associations. To further address high-dimensional tensor regression challenges arising in the 3D genome context, we propose Sparse Higher Order Partial Least Squares (SHOPS) for variable selection, dimension reduction, and response denoising. Together, these methods underscore the promise of tensor-based approaches for elucidating the interplay between the epigenome and three-dimensional genome organization at the single cell level.
11:25-11:35 CDT
Q&A
11:35-11:40 CDT
Tech Break
11:40-12:10 CDT
TBA
Speaker: Soumendra Lahiri (Washington University in St. Louis)
12:10-12:20 CDT
Q&A
12:20-13:20 CDT
Lunch Break
13:20-13:50 CDT
Tensor Data Analysis and Some Applications in Neuroscience
Speaker: Lexin Li (University of California, Berkeley)
Multidimensional arrays, or tensors, are becoming increasingly prevalent in a wide range of scientific applications. In this talk, I will present two case studies from neuroscience, where tensor decomposition proves particularly useful. The first study is a cross-area analysis of neuronal spike trains, which we formulate as the problem of regressing a multivariate point process on another multivariate point process. We model the predictor effects through the conditional intensities using a set of basis transferring functions in a convolutional fashion. We then organize the corresponding transferring coefficients in the form of a three-way tensor, and impose low-rank, sparsity, and subgroup structures on this coefficient tensor. The second study is a multimodal neuroimaging analysis for Alzheimer’s disease, which we formulate as the problem of modeling the correlations of two sets of variables conditional on a third set of variables. We propose a generalized liquid association analysis method to study such three-way associations. We establish a population dimension reduction model, and transform the problem into a sparse decomposition of a three-way tensor.
13:50-14:00 CDT
Q&A
14:00-14:05 CDT
Tech Break
14:05-14:35 CDT
Online Tensor Inference
Speaker: Will Wei Sun (Purdue University)
Recent technological advances have led to contemporary applications that demand real-time processing and analysis of sequentially arriving tensor data. Traditional offline learning, involving the storage and utilization of all data in each computational iteration, becomes impractical for high-dimensional tensor data due to its voluminous size. Furthermore, existing low-rank tensor methods lack the capability for statistical inference in an online fashion, which is essential for real-time predictions and informed decision-making. In this talk, I will introduce a novel online inference framework for low-rank tensor learning. Our approach employs Stochastic Gradient Descent (SGD) to enable efficient real-time data processing without extensive memory requirements, thereby significantly reducing computational demands. We establish a non-asymptotic convergence result for the online low-rank SGD estimator, which nearly matches the minimax-optimal rate of estimation error in offline models that store all historical data. Building upon this foundation, we propose a simple yet powerful online debiasing approach for sequential statistical inference in low-rank tensor learning. The entire online procedure, covering both estimation and inference, eliminates the need for data splitting or storing historical data, making it suitable for on-the-fly hypothesis testing. Given the sequential nature of our data collection, traditional analyses relying on offline methods and sample splitting are inadequate. In our analysis, we control the sum of constructed super-martingales to ensure estimates along the entire solution path remain within the benign region. Additionally, a novel spectral representation tool is employed to address statistical dependencies among iterative estimates, establishing the desired asymptotic normality.
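To illustrate the memory profile of such online procedures, here is a minimal sketch of factorized SGD for rank-one tensor regression that touches each sample once and stores only the factors; the debiasing step and the martingale-based inference from the talk are beyond this sketch, and all names and step sizes are assumptions.

    import numpy as np

    def online_rank1_sgd(stream, shape, eta=0.01, seed=0):
        # model: y = <u outer v outer w, X> + noise; one pass over the
        # stream, O(p) memory, no historical data stored
        rng = np.random.default_rng(seed)
        u, v, w = [rng.normal(size=p, scale=0.1) for p in shape]
        for X, y in stream:
            r = np.einsum('i,j,k,ijk->', u, v, w, X) - y   # residual
            u -= eta * r * np.einsum('j,k,ijk->i', v, w, X)
            v -= eta * r * np.einsum('i,k,ijk->j', u, w, X)
            w -= eta * r * np.einsum('i,j,ijk->k', u, v, X)
        return u, v, w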
14:35-14:45 CDT
Q&A
14:45-15:15 CDT
Coffee Break
15:15-16:15 CDT
Group Activity 2
Thursday, May 8, 2025
8:30-9:00 CDT
Check-In and Breakfast
9:00-9:30 CDT
TBA
Speaker: Rajarshi Guhaniyogi (Texas A&M University, College Station)
9:30-9:40 CDT
Q&A
9:40-9:45 CDT
Tech Break
9:45-10:15 CDT
Statistical inference in finite rank tensor regression models
Speaker: Galen Reeves (Duke University)
I will discuss recent work on a general class of high-dimensional factor regression models where each observation depends on interactions between a subset of the unknown parameters as well as covariate information. For any fixed number of interactions, we prove exact formulas for the high-dimensional limit of mutual information and the minimum mean-squared error. Our results provide a unified framework for analyzing a broad class of models, allowing for heteroskedastic noise and asymmetric interactions. This is joint work with Ricardo Rossetti.
10:15-10:25 CDT
Q&A
10:25-10:55 CDT
Coffee Break
10:55-11:25 CDT
Hyperbolic Network Latent Space Model with Learnable Curvature
Speaker: Ji Zhu (University of Michigan)
Network data is ubiquitous in various scientific disciplines, including sociology, economics, and neuroscience. Latent space models are often employed in network data analysis, but the geometric effect of latent space curvature remains a significant, unresolved issue. In this work, we propose a hyperbolic network latent space model with a learnable curvature parameter. We theoretically justify that learning the optimal curvature is essential to minimizing the embedding error across all hyperbolic embedding methods beyond network latent space models. A maximum-likelihood estimation strategy, employing manifold gradient optimization, is developed, and we establish the consistency and convergence rates for the maximum-likelihood estimators, both of which are technically challenging due to the non-linearity and non-convexity of the hyperbolic distance metric. We further demonstrate the geometric effect of latent space curvature and the superior performance of the proposed model through extensive simulation studies and an application using a Facebook friendship network.
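For concreteness, here is a minimal sketch of the geometric ingredients in the Lorentz model with a curvature parameter: lifting Euclidean coordinates onto the hyperboloid of curvature -kappa, the corresponding distance, and a logistic link from distance to edge probability. The specific link and parameterization are illustrative assumptions, not necessarily the paper's.

    import numpy as np

    def lift(z, kappa):
        # map z in R^d onto the Lorentz hyperboloid of curvature -kappa
        x0 = np.sqrt(1.0 / kappa + np.sum(z**2, axis=-1, keepdims=True))
        return np.concatenate([x0, z], axis=-1)

    def hyperbolic_dist(x, y, kappa):
        # distance on the Lorentz model with curvature -kappa (kappa > 0)
        inner = -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)
        return np.arccosh(np.clip(-kappa * inner, 1.0, None)) / np.sqrt(kappa)

    def edge_prob(z_i, z_j, kappa, alpha):
        # latent-space link: closer nodes connect with higher probability
        d = hyperbolic_dist(lift(z_i, kappa), lift(z_j, kappa), kappa)
        return 1.0 / (1.0 + np.exp(d - alpha))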
11:25-11:30 CDT
Q&A
11:35-11:40 CDT
Tech Break
11:40-12:10 CDT
Can quantum algorithms bridge the statistical-computational gap in random combinatorial optimization?
Speaker: Song Mei (University of California, Berkeley)
"Random combinatorial optimization problems often exhibit statistical-computational gaps in classical regimes. For example, classical algorithms fail to achieve near-optimal objective values in general q-spin spin-glass models. They also require a substantially higher signal-to-noise ratio to recover the planted signal in spiked-tensor models. One intriguing question is whether quantum algorithms could bridge such statistical-computational gaps.
In this talk, we study the Quantum Approximate Optimization Algorithm (QAOA), a general-purpose quantum algorithm for combinatorial optimization. We analyze the performance of constant-depth QAOA on the aforementioned problems that exhibit the classical statistical-computational gaps. Specifically, in the q-spin spin glass models, we characterize the energy levels achieved by QAOA, given by a set of saddle point equations. In the spiked-tensor model, we calculate the asymptotic overlap between the QAOA state and the underlying signal, which exhibits an intriguing sine-Gaussian law. Despite these insights, our findings unfortunately reveal that arbitrary constant-depth QAOA does not surpass classical algorithms in these problems. This suggests that demonstrating the potential quantum advantage of QAOA requires an analysis beyond sub-polynomial algorithmic depth."
12:10-12:20 CDT
Q&A
12:20-13:20 CDT
Lunch Break
13:20-13:50 CDT
Conformalized Tensor Regression for Fusion-Agnostic Multiview Learning
Speaker: Himel Mallick (Cornell University)
We present a fusion-agnostic framework for multiview learning that combines tensor regression with conformal prediction to deliver accurate predictions and valid, distribution-free uncertainty estimates. By modeling multiview data as structured tensors, our method captures complex dependencies across views and supports early, late, or intermediate fusion strategies. Extensive experiments on synthetic and real-world datasets show that our approach improves both predictive performance and reliability compared to non-conformal and unstructured alternatives.
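Since the conformal wrapper is model-agnostic, a minimal split-conformal sketch shows the mechanism: any fitted multiview tensor regressor can play the role of the hypothetical `predict` function below. The fusion machinery of the paper is omitted, and the names are assumptions for illustration.

    import numpy as np

    def split_conformal_interval(predict, X_cal, y_cal, X_new, alpha=0.1):
        # wraps any fitted point predictor with distribution-free intervals
        # achieving marginal coverage of at least 1 - alpha
        scores = np.abs(y_cal - predict(X_cal))     # calibration residuals
        n = len(scores)
        level = min(np.ceil((1 - alpha) * (n + 1)) / n, 1.0)
        q = np.quantile(scores, level, method='higher')
        mu = predict(X_new)
        return mu - q, mu + q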
13:50-14:00 CDT
Q&A
14:00-14:05 CDT
Tech Break
14:05-14:35 CDT
TBA
Speaker: Joshua Agterberg (University of Illinois at Urbana-Champaign)
14:35-14:45 CDT
Q&A
14:45-15:15 CDT
Coffee Break
15:15-15:45 CDT
TBA
Speaker: Yuchen Zhou (University of Illinois at Urbana-Champaign)
15:45-15:50 CDT
Q&A
Friday, May 9, 2025
8:30-9:00 CDT
Check-In and Breakfast
9:00-9:30 CDT
TBA
Speaker: Aaron Schein (University of Chicago)
9:30-9:40 CDT
Q&A
9:40-9:45 CDT
Tech Break
9:45-10:15 CDT
TBA
Speaker: Pixu Shi (Duke University)
10:15-10:25 CDT
Q&A
10:25-10:55 CDT
Coffee Break
10:55-11:25 CDT
TBA
Speaker: Yuchen Wu (University of Pennsylvania)
11:25-11:35 CDT
Q&A
11:35-12:05 CDT
Tensor approach to clustering in the Diverse Multilayer Random Graph model
Speaker: Marianna Pensky (University of Central Florida)
IMSI is committed to making all of our programs and events inclusive and accessible. Contact [email protected] to request disability-related accommodations.