Statistics Research Focus Areas


Sunil Dhar

The primary focus of my research is:

 

Modeling and Statistical Inference with Applications

 

I have contributed extensively to the fields of multivariate frequency count and reliability models, leading to the derivation of an innovative model (Dhar 1995). The frequency count model has proved widely applicable to linear, log-linear and logit models through a newly derived generalized inverse sampling scheme. Estimators with desirable properties in these and other models have also been obtained, giving statistics a wider scope of application. These results, obtained with my student, can be found in Soumi and Dhar (2008). I have developed a new bivariate geometric model with applications to system reliability and sports data analysis (Dhar and Balaji 2006, Dhar 1998, Dhar 2003, Dhar 1998). Research results in multivariate geometric models have involved the environmental factor (Basu and Dhar 1995). Other models of dependent groups of components and their life probability have also been developed (Dhar and Jiang 2000, Dhar 1996, Dhar and Jiang 1995).

 

I have developed robust and efficient minimum distance estimators under additive outlier models, published in the Annals of Statistics (also found in Dhar 1992, 1990). Of particular significance is an optimization tool for the general class of sums of absolute multivariate linear functionals, which works even when linear programming methods fail (Journal of the American Statistical Association paper, Dhar March 1993). Not only can this improved method enrich all fields of science that depend on the optimization it provides, but the technique has also allowed me to establish the existence of minimum distance estimators (Dhar 1990-1993).

 

Aside from developing new tools and models to improve the overall efficiency of various aspects of statistics, I have been heavily involved in research that has contributed to up-and-coming fields within and well beyond the range of purely mathematical statistics. Over the past several years I have been closely involved in research in pharmaceutical statistics. My research contributions in this field have been applied to various therapeutic areas in several clinical trials (Banerjee, Dhar, Kianifard 2009; Chen, Kianifard, Dhar 2006; Dhar, Kianifard 2006).

 

My research has also made contributions in many areas of biostatistics, such as pharmacokinetic models of propofol, cell biology, diabetic patient care, and Chronic Fatigue Syndrome (Atlas and Dhar, 2006, 2009; Wu, Yan, Depre, Dhar, et al. 2009; Hedhli, Lizano, Hong, Fritzky, Dhar et al. 2008; Zhang, Safford, Ottenweller, Hawley, Repke, Burgess, Dhar, et al. 2000).

 

Logistic regression models have been used effectively for analyzing transportation safety data (Munley, Daniel and Dhar 2004), as well as for the longitudinal assessment of neuropsychological functioning, psychiatric status, and disability and employment status of patients with Chronic Fatigue Syndrome (CFS) (Tiersky, DeLuca, Hill, Dhar et al. 2001).

 

In addition to my extensive research in these varied fields, I was the coordinator of the Statistics Consulting Lab at NJIT for seven years and continue to be greatly involved in it. I have also been a statistical consultant to the pharmaceutical industry, various medical research institutions, and the health care sector. This work has given me broad experience and versatility, as well as newfound relevance and applications for my statistical knowledge.

 

 


 

Wenge Guo

Wenge Guo's principal research is:

 

Large-scale Multiple Testing and High-dimensional Inference

 

A number of new "high-throughput" scientific devices such as microarrays, satellite imagers, proteomic chips and fMRI scanners have been developed in the last decade. Such devices permit thousands of different variables to be examined in a single experimental run, thus generating a large number of massive data sets. These large-scale data sets usually involve hundreds of thousands or even millions of variables but only a few hundred samples. Motivated by the demand for statistical analysis of such high-dimension, low-sample-size data, large-scale multiple testing and high-dimensional inference, especially the development and theoretical study of False Discovery Rate (FDR) controlling methodology, have become a very active research area. The resulting theories and methodologies have been applied in fields such as the biological and medical sciences, engineering, finance, and marketing. Consequently, Guo's research interest in these areas affords him a great opportunity not only to develop theory and methodology, but also to apply them to solve real-world problems.
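As a point of reference for FDR control, the classical Benjamini-Hochberg step-up procedure can be sketched as follows. This is a textbook baseline offered for illustration only, not one of Guo's own methods; the function name `benjamini_hochberg`, the level `alpha`, and the sample p-values are all hypothetical choices made for this sketch.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return sorted indices of hypotheses rejected by the
    Benjamini-Hochberg step-up procedure at nominal FDR level alpha.

    Illustrative sketch only: names and defaults are assumptions.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * alpha / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


# Hypothetical p-values from eight simultaneous tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
rejected = benjamini_hochberg(p_values, alpha=0.05)
print(rejected)
```

Because the procedure is step-up, a hypothesis can be rejected even if its own p-value misses its threshold, provided a larger p-value further down the ordering meets its threshold.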

 

In his work, Guo has derived new theory and methods for the FDR, the familywise error rate (FWER), and generalized error controls such as the generalized FDR (k-FDR) and generalized FWER (k-FWER) under various conditions that are appropriate for applications. He has devised an efficient algorithm for solving the practical computational problems arising in large-scale multiple testing. He has also developed mixed directional FDR controlling methodology for the analysis of time-course and dose-response microarray data. In the future, he will continue his research on the theory and methods of large-scale multiple testing and high-dimensional inference, and extend the framework of FDR controlling methodology to meet practical demands by exploiting available structural information. In addition to theoretical and methodological study, he is also interested in applying multiple testing methodologies to biomedical areas such as microarray data analysis, the design and analysis of clinical trials, and high-throughput screening assays.

 

 


Ji Meng Loh

 

Ji Meng’s research interest is:

 

Spatial Statistics.

 

This involves working with data that contain information about the locations from which the data are collected. Statistical analyses make use of the correlation inherent in spatial data to improve statistical inference.

 

Ji Meng’s particular area of expertise is in the analysis of spatial point patterns. His work has included analyzing the degree of clustering of astronomical gas clouds, examining the relation between locations of fast food restaurants and schools in New York City, using cellular data for urban planning, residual analysis of models fit to fMRI data to identify mismodeling, and anomaly detection of spatial point patterns in order to identify unusual clusters of data points.

 

Ji Meng is also working on quantifying and displaying differences between two sets of mapped data, and on understanding the effect of data quality issues on statistical inference.

 


 

Sundar Subramanian

Sundar Subramanian’s principal research is:

 

Statistical Inference from Censored Time-to-Event Data.

 

In cancer clinical trials, the time to a terminal event such as mortality due to a specific cancer cannot always be observed, because a patient is lost to follow-up and/or a competing terminal event preempts the occurrence of the event of interest. The competing causes induce what is called dependent censoring, posing new statistical challenges for the analysis of such data. Frequently, however, competing risks data are subject to both independent and dependent censoring and, in some cases, the cause of failure may be missing. Subramanian’s current focus is on semi-parametric models for analyzing competing risks data, with the main research thrust being investigations into model selection, model diagnostics, robustness studies giving insight into model misspecification, and extensions addressing missing cause of failure (censoring) information.

 

His other ongoing research interests are 1) median regression, with a focus on estimation and inference for the parameters that target the curse of dimensionality, including checks of model adequacy; 2) multiple-imputation-based estimation of survival functions; 3) resampling methods for model selection and bandwidth computation; and 4) theoretical and empirical investigations into the effects of model misspecification.

 

His investigations involve the study of the large-sample behavior of estimators using techniques from counting processes and martingales, empirical processes, kernel estimation, and information bound theory. The procedures have a strong theoretical basis and find applications in biostatistics.

 

 


 

Antai Wang

 Antai Wang’s principal research is:

Multivariate Survival Analysis and High Dimensional Data Analysis.

My main research interests are multivariate survival analysis and high dimensional data analysis. My main contribution is to model multivariate survival data using copula models. Copula models are important statistical models that can be used to model the dependence structure of two random variables. They have been widely used in medical research for paired data (such as twin data) and correlated data analysis (such as competing risk models). I have developed model fitting/estimation and model checking procedures for Archimedean copula models for paired survival data, and proved the nonidentifiability property of Archimedean copula models for dependent censored data.

While working in the Department of Biostatistics, Bioinformatics and Biomathematics and the Lombardi Comprehensive Cancer Center (LCCC) at Georgetown University, and in the Department of Biostatistics and the Herbert Irving Comprehensive Cancer Center (HICCC) at Columbia University, I was involved in the analysis of microarray data from both theoretical and practical points of view, in cooperation with research investigators at the HICCC and LCCC. The main challenge for all researchers in this area is that there are many more genes (variables) than microarray chip samples (the so-called n << p situation, where n represents the sample size and p the number of genes/variables). In particular, selecting the most informative genes from such a large pool is a challenging problem. To address it, Dr. Gehan and I (2005) proposed a gene selection strategy that selects the most informative genes while keeping the original data structure. I then collaborated with Dr. Robert Clarke’s lab to analyze estrogen receptor alpha (ER) positive breast tumors and breast cancer cell lines using principal component analysis tools to demonstrate the similarity between them (Zhu, Wang, Clarke et al. 2006a, 2006b).

I also joined a team of researchers at Georgetown University to write a review paper discussing the properties of high-dimensional data spaces that arise in genomic and proteomic studies and the challenges they pose for data analysis and interpretation (Clarke, Wang, et al. 2008). I have helped many medical investigators with the statistical design in their grant applications and with data analyses arising from their research investigations. Many of these grant proposals eventually received funding, and my contributions have been crucial in helping medical investigators obtain funding and publish research papers. Through my continuous effort and collaborations, numerous papers have been published in areas such as epidemiology, cancer research and basic science.