NJIT Applied Mathematics Colloquium
Friday, November 4, 2011, 11:30am
Cullimore Lecture Hall II
New Jersey Institute of Technology
Coping with High Dimensionality in Massive Datasets: An Overview
Jon Kettenring
Drew University
A massive dataset is characterized by its size and complexity. In its
most basic form, such a dataset can be represented as a collection of n
observations on p variables. Aggravation or even impasse can result if
either number is huge. The more difficult challenge is usually
associated with the case of very high dimensionality or ‘big
p’. There is a fast-growing literature on how to handle such
challenges, but most of it is in a supervised learning context
involving a specific objective function, as in regression or
classification. Much less is known about effective strategies for
unsupervised and exploratory data analytic activities. The goal of this
talk is to give a flavor of recent research on dimensionality reduction
and variable selection in high-dimensional settings. Examples of real
data problems and advances in methodology will be described. Methods
associated with principal components analysis, regression analysis, and
cluster analysis will be emphasized.
References: WIREs Computational Statistics, Vol. 1, 2009, 25–32 and Vol. 3,
2011, 95–103
