Statistics Seminar

Title: Convergence and Concentration of Empirical Measures under Wasserstein Distance
Presenter: Jing Lei
Abstract: We provide upper bounds on the expected Wasserstein distance between a probability measure and its empirical version, generalizing recent results for …
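
To make the quantity in this abstract concrete, the sketch below computes the 1-Wasserstein distance W1 between the empirical measure of n i.i.d. Uniform[0, 1] samples and Uniform[0, 1] itself. It uses the one-dimensional identity that W1 equals the integral of the absolute difference of the two quantile functions. This is an illustration only and is not the presenter's method or bounds (the abstract above is truncated). The function names and the Monte Carlo setup are hypothetical choices for this example. For the uniform distribution in one dimension the average distance is expected to shrink at roughly the n^(-1/2) rate, so it should roughly halve each time n quadruples.

```python
import math
import random


def _abs_integral(c, a, b):
    """Exact value of the integral of |c - t| dt over [a, b], for a <= b."""
    def g(t):
        # Antiderivative of |t - c|: sign(t - c) * (t - c)^2 / 2.
        d = t - c
        return math.copysign(d * d, d) / 2.0
    return g(b) - g(a)


def w1_empirical_vs_uniform(samples):
    """W1 distance between the empirical measure of samples in [0, 1] and Uniform[0, 1].

    In 1-D, W1(mu, nu) = integral over u in [0, 1] of |F_mu^{-1}(u) - F_nu^{-1}(u)| du.
    The uniform quantile function is u itself; the empirical quantile function
    equals the i-th order statistic on ((i - 1)/n, i/n].
    """
    xs = sorted(samples)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        total += _abs_integral(x, (i - 1) / n, i / n)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 200
    for n in (100, 400, 1600):
        # Monte Carlo estimate of the expected W1 over independent samples of size n.
        avg = sum(
            w1_empirical_vs_uniform([rng.random() for _ in range(n)])
            for _ in range(trials)
        ) / trials
        print(f"n={n:5d}  mean W1 ~ {avg:.4f}")
```

The exact per-interval integral avoids discretization error, so the only randomness in the printed averages comes from the sampling itself.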

Statistics Seminar

Title: Quantitative Ethnography: Epistemological and Methodological Opportunities and Challenges of Big Data in the Social Sciences
Presenter: David Williamson Shaffer
Abstract: In the age of Big Data, we have more information than ever about what …

Statistics Seminar – Canziani

Title: Unsupervised Deep Learning: Autoencoders and Generative Adversarial Nets
Presenter: Alfredo Canziani
Abstract: The brain has about 10^14 synapses and we only live for about 10^9 seconds. Even just considering binary synapses, a learning algorithm …

Statistics Seminar

Title: Statistical analysis and spectral methods for signal-plus-noise matrix models
Presenter: Joshua Cape
Abstract: Estimating eigenvectors and principal subspaces is of fundamental importance for numerous problems in statistics, data science, and network analysis, including covariance matrix …

Proper Extraction and Representation of Low Rank Modules in Gene Expression Data Studies

Abstract: Reliable biological interpretation of gene expression
data collected from cancer tissue samples is often challenged by
two factors: (1) the multiple signals coming from the diverse cell
components within each tissue, and (2) the heterogeneity of patient
subgroups. For (1), decomposing these mixed signals
requires careful extraction and representation of the low-rank
modules of high-dimensional data matrices. We developed a semi-supervised
approach, combined with a constrained non-negative
matrix decomposition, to identify the signal
intensities contributed by the diverse cell components in the tissue. For (2),
given the marked heterogeneity among samples, each
measured on a high-dimensional feature set, we developed a biclustering-based
subspace clustering (SSC) algorithm. Unlike traditional SSC
algorithms, it clusters the samples many times, and each time
the clustering is done on a subset of attributes weighted
differently for each cluster. Our analysis
identified novel cell-type-specific functions and cell-cell interactions
in different cancer types. Experimental validation using CRISPR
knockout has confirmed key genes, expressed by cancer cells and
other cells, that contribute to immune evasion and drug
resistance in colorectal cancer.
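
As a rough illustration of the kind of low-rank, non-negative decomposition the abstract builds on, the sketch below runs plain Lee-Seung multiplicative updates for non-negative matrix factorization (NMF), X ≈ W H with W, H ≥ 0. It is a generic textbook technique, not the presenters' method: their approach adds constraints and semi-supervision that this sketch omits. The function names (nmf_multiplicative, relative_error) and the synthetic "signature" data are hypothetical, chosen only for this example.

```python
import numpy as np


def nmf_multiplicative(X, rank, n_iter=500, eps=1e-10, seed=0):
    """Factor a non-negative matrix X (features x samples) as W @ H with W, H >= 0.

    Plain, unconstrained Lee-Seung multiplicative updates for the squared
    Frobenius error ||X - W @ H||_F^2 (the error is non-increasing under them).
    Columns of W act as low-rank "modules"; H holds each sample's loadings.
    """
    X = np.asarray(X, dtype=float)
    if np.any(X < 0):
        raise ValueError("X must be non-negative")
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random start; multiplicative updates keep entries >= 0.
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # eps guards against division by zero in the denominators.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(X, W, H):
    """Relative Frobenius reconstruction error ||X - W H|| / ||X||."""
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)


if __name__ == "__main__":
    # Synthetic data: 3 hypothetical non-negative "signatures" (40 features each)
    # mixed into 15 samples with non-negative proportions.
    rng = np.random.default_rng(1)
    W_true = rng.random((40, 3))
    H_true = rng.random((3, 15))
    X = W_true @ H_true
    W, H = nmf_multiplicative(X, rank=3, n_iter=2000)
    print("relative error:", round(relative_error(X, W, H), 4))
```

Multiplicative updates are used here because they are short and keep every entry non-negative without explicit projection; they can converge slowly, and practical tools typically offer faster solvers.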