NJIT Applied Mathematics Colloquium

Friday, November 4, 2011, 11:30am

Cullimore Lecture Hall II
New Jersey Institute of Technology



Coping with High Dimensionality in Massive Datasets: An Overview

Jon Kettenring

Drew University

A massive dataset is characterized by its size and complexity. In its most basic form, such a dataset can be represented as a collection of n observations on p variables. Aggravation or even impasse can result if either number is huge. The more difficult challenge is usually associated with the case of very high dimensionality or ‘big p’. There is a fast-growing literature on how to handle such challenges, but most of it is in a supervised learning context involving a specific objective function, as in regression or classification. Much less is known about effective strategies for unsupervised and exploratory data analytic activities. The goal of this talk is to give a flavor of recent research on dimensionality reduction and variable selection in high-dimensional settings. Examples of real data problems and advances in methodology will be described. Methods associated with principal components analysis, regression analysis, and cluster analysis will be emphasized.

References: WIREs Computational Statistics, Vol. 1, 2009, 25-32, and Vol. 3, 2011, 95-103