Assistant Professor Miaoyan Wang Receives Prestigious NSF CAREER Award

Higher-order tensor datasets are arising ubiquitously in modern data science applications. Tensors provide an effective representation of data structures that classical low-order methods fail to capture. This empirical success, however, has uncovered a myriad of new challenges.
Miaoyan Wang, an assistant professor in the Department of Statistics at UW-Madison, has received a prestigious CAREER award, one of only six given in 2022 by the Statistics Program of the National Science Foundation (NSF). The award will allow Miaoyan to develop a suite of statistical learning theory, efficient algorithms, and data-driven solutions for high-dimensional tensor problems.
“I am very excited to push tensor learning theory to new realms. I also feel lucky to receive this funding support on my first attempt at the NSF CAREER. My colleagues and staff in the department have provided me tremendous support, and I’d like to take this opportunity to thank them.”
Miaoyan Wang plans to investigate the fundamental computational-statistical tradeoffs for a range of tensor problems involving structures including, but not limited to, low-rankness, non-negativity, block structure, and smoothness. The new framework will fill the gap between statistical oracles and empirical algorithms for higher-order, high-dimensional tensor problems. The research will be applied to a variety of data problems, such as classification of brain connectivity data, pattern detection in recommendation systems, and omics data integration.
BIO: Miaoyan Wang has been a UW-Madison faculty member since 2018. Her research spans machine learning theory, nonparametric statistics, higher-order tensors, and applications to genetics. This interdisciplinary breadth is reflected in her training: prior to UW-Madison, she was a postdoc in the Department of EECS at UC Berkeley and a Simons Math+X postdoc at the University of Pennsylvania, and she received her PhD in Statistics from the University of Chicago. Her honors include multiple Best Student Paper Awards (as advisor) from the American Statistical Association, selection as a Madison Teaching and Learning Excellence Fellow, and several prestigious young researcher awards in statistics, machine learning, and genetics.
Rohe and Zeng Present Discussion Paper at the Royal Statistical Society in London
Can you identify the object I’m holding? It is a common household item, but from an uncommon angle. Keep reading to find out what it is and how this question is related to principal component analysis (PCA) and my recent work with Muzhe Zeng that we presented to a discussion meeting of the Royal Statistical Society in London.
What is the point of PCA? Why do people use it?
I love eigenvectors and PCA because they help us identify “the shape” of high-dimensional data, things like social networks, text, and financial time series. PCA shows us the “widest dimensions” of our data, and in every application I have worked on, these dimensions turn out to be hugely interesting. Why is that?
In the image, I’ve not shown you the “widest dimension” of this household object. Because I’ve concealed this dimension with my hand, it is very hard to identify. Instead, you see a very colorful side of this… kitchen whisk. Once you see its widest dimension, it is easy to identify. Many household items are this way; you can make them look funny by hiding the longest dimension behind your hand.
Perhaps our data is the same way? If we are trying to visualize our data, we should look at these wide dimensions! These wide dimensions can help identify the most important patterns.
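To make the “widest dimension” idea concrete, here is a minimal numpy sketch (hypothetical data, not the whisk photo or the figures below): PCA recovers the hidden long axis of a synthetic 3-D point cloud even after a random rotation conceals it, just as the whisk becomes recognizable once you see its length.

```python
import numpy as np

rng = np.random.default_rng(0)

# A synthetic "whisk": 500 points, spread widely along one direction
# and narrowly along the other two.
n = 500
cloud = np.column_stack([
    rng.normal(scale=10.0, size=n),   # the widest dimension
    rng.normal(scale=1.0, size=n),    # a narrow side
    rng.normal(scale=1.0, size=n),    # another narrow side
])

# Hide the long axis by rotating into arbitrary coordinates,
# like concealing the whisk's length behind a hand.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
X = cloud @ Q.T

# PCA via SVD of the centered data: the leading right-singular
# vector points along the widest dimension.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
widest = Vt[0]

# `widest` aligns (up to sign) with the hidden long axis Q[:, 0].
alignment = abs(widest @ Q[:, 0])
```

Because the long axis has ten times the spread of the other two, the first principal direction lines up with it almost exactly, no matter how the coordinates were rotated.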
The next two figures explore two disparate types of data. One is a financial time series; each dot is an asset listed in the S&P 500, and the data (not displayed) are daily returns over the last decade. The other data source is a social network; each dot is an academic journal, and journals A and B are “friends” if the articles in A cite the articles in B (and vice versa).
The first figure gives the financial time series. The second figure gives the social network. Notice the similarities in the overall shape…
[Figures 2 and 3]
After you have visualized these “widest dimensions,” the next task in PCA is to provide an interpretation; what do these dimensions correspond to?
“Factor rotations” are a classical tool to help interpret these dimensions. “Factor rotation” is a fancy way of saying that we redraw the axes to align with the data. Notice how both figures above have “radial streaking”? You can choose the axes to run through these streaks. It turns out, this makes the axes/dimensions much easier to interpret. In the financial time series, each axis/dimension becomes an “industry” (e.g. tech, energy, consumer durables, etc). In the journal network, each axis/dimension becomes an academic area (e.g. medicine, engineering, mathematical sciences, etc). Then, we can look at each point (asset or journal) and see where it falls along these new dimensions.
Here is what the PCs look like after the “varimax” factor rotation.
[Figures 4 and 5]
In both cases, the points (assets and journals) now fall along the axes. The structure becomes so much easier to see! Factor rotations are hugely popular for making the PCs easier to interpret. The most popular rotation is called varimax, and it is already loaded into R; you can just type ?varimax in the R console to get the help documentation.
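For readers who want to see the mechanics, here is a minimal numpy sketch of the classical varimax algorithm (not the authors' code, and simpler than R's `varimax`, which also offers Kaiser normalization). It searches over orthogonal rotations of the loadings, which is exactly the "redraw the axes" step: each point keeps its position in space while the axes turn to run through the radial streaks.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Varimax rotation: rows of `loadings` are points (assets, journals),
    columns are principal components. Returns the rotated loadings."""
    p, k = loadings.shape
    R = np.eye(k)          # accumulated orthogonal rotation
    var_old = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient of the varimax criterion; the best orthogonal update
        # is found by an SVD (an orthogonal Procrustes step).
        G = loadings.T @ (L**3 - (gamma / p) * L @ np.diag((L**2).sum(axis=0)))
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt
        var_new = s.sum()
        if var_new - var_old < tol * var_new:
            break                      # criterion has stopped improving
        var_old = var_new
    return loadings @ R
```

Because the rotation is orthogonal, each point's distance from the origin is unchanged; only the coordinate axes move, which is why the rotated PCs describe the same data with sparser, more interpretable columns.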
Unfortunately, factor rotations are historically controversial. There is a popular statistical theory that the factor rotation is statistically “unidentifiable”. But they work so well! And they are so popular! How can this be?
This is where my research with Muzhe Zeng begins. In our recent work, we give a modern statistical theory for PCA and varimax. We show that PCA+varimax, two classical techniques, can be combined to estimate a broad class of “semi-parametric factor models.” Our work will soon appear in print in the Journal of the Royal Statistical Society, Series B (Statistical Methodology), with discussion from other researchers.
In ongoing work, I am developing a new way of thinking about PCA, one that I think is more powerful and useful than the traditional view. If you want to follow along as I work on this, I post about these ideas on Twitter at @karlrohe.