NJIT Applied Mathematics Colloquium
Friday, November 4, 2011, 11:30am
Cullimore Lecture Hall II
New Jersey Institute of Technology
Coping with High Dimensionality in Massive Datasets: An Overview
Jon Kettenring
Drew University
A massive dataset is characterized by its size and complexity. In its
most basic form, such a dataset can be represented as a collection of n
observations on p variables. Aggravation or even impasse can result if
either number is huge. The more difficult challenge is usually
associated with the case of very high dimensionality or ‘big
p’. There is a fast-growing literature on how to handle such
challenges, but most of it is in a supervised learning context
involving a specific objective function, as in regression or
classification. Much less is known about effective strategies for
unsupervised and exploratory data-analytic activities. The goal of this
talk is to give a flavor of recent research on dimensionality reduction
and variable selection in high-dimensional settings. Examples of real
data problems and advances in methodology will be described. Methods
associated with principal components analysis, regression analysis, and
cluster analysis will be emphasized.
References: WIREs Computational Statistics, Vol. 1, 2009, 25-32 and Vol. 3,
2011, 95-103