Statistics Colloquium

THE DEPARTMENT OF MATHEMATICAL SCIENCES AND
THE CENTER FOR APPLIED MATHEMATICS AND STATISTICS,
NEW JERSEY INSTITUTE OF TECHNOLOGY

11:30 AM to 1:00 PM
Wednesday, December 18, 2002

Cullimore Hall Room 611
New Jersey Institute of Technology

Zhanyun Zhao, Ph.D.

The Wharton School, University of Pennsylvania

"Analysis of the Dual System Estimate in the 2000 Census"

The Accuracy and Coverage Evaluation (A.C.E.) study was conducted to measure the overall and differential coverage of the U.S. population in Census 2000. The A.C.E. involves two parts: an E-sample drawn from Census records and a P-sample drawn from an address file. A dual system estimate is used to estimate the undercount. It is computed at the post-stratum level, where post-strata are groups intended to be homogeneous, defined by geography, race/Hispanic origin, housing tenure, age/sex, etc. Using both E-sample and P-sample information, a synthetic dual system estimate is then produced for any geographic level. This process rests on two main assumptions. First, the E-sample and P-sample are independent of each other, i.e., there is no correlation bias. Second, the undercount rates are constant within post-strata across geographic areas, i.e., the post-strata are actually homogeneous. We set up a joint Poisson model to test these hypotheses. Using a procedure that combines the EM algorithm with bootstrap resampling of the log-likelihood ratio statistic, we reject both hypotheses.
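
For readers unfamiliar with dual system estimation, the following is a minimal sketch of the idea, not the Census Bureau's production formula and not the speaker's model. It assumes the textbook capture-recapture form, in which the census count of a post-stratum is scaled by the inverse of the P-sample match rate, and it then carries each post-stratum's correction factor down to a small area, which is where the homogeneity assumption enters. All function names and numbers are hypothetical.

```python
def dual_system_estimate(census_count, p_sample_count, matched_count):
    """Textbook dual system estimate for one post-stratum.

    Assumes the two systems are independent (no correlation bias), so the
    P-sample match rate estimates the census coverage rate.
    """
    if matched_count <= 0:
        raise ValueError("matched_count must be positive")
    return census_count * p_sample_count / matched_count


def coverage_correction_factors(post_strata):
    """Ratio of the dual system estimate to the census count, per post-stratum."""
    return {
        name: dual_system_estimate(census, p_sample, matched) / census
        for name, (census, p_sample, matched) in post_strata.items()
    }


def synthetic_estimate(area_counts, factors):
    """Apply each post-stratum's factor to a small area's census counts.

    Assumes the factor is constant within a post-stratum across areas.
    """
    return sum(factors[name] * count for name, count in area_counts.items())


# Hypothetical post-strata: (census count, P-sample count, matched count).
POST_STRATA = {"owner": (1000, 200, 190), "renter": (800, 150, 135)}
# Hypothetical census counts for one small area, by post-stratum.
AREA_COUNTS = {"owner": 100, "renter": 50}

if __name__ == "__main__":
    factors = coverage_correction_factors(POST_STRATA)
    print(factors)
    print(synthetic_estimate(AREA_COUNTS, factors))
```

If either assumption fails, the factors computed at the post-stratum level need not apply to any particular area, which is why the two hypotheses matter for the synthetic estimate.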

The mandated purpose of the U.S. census is to determine how many congressional seats are allocated to each state. The apportionment of congressional seats is based on population share, where a state's population share is normally defined as its percentage of the national total. We examine whether the census count shares and the synthetic dual system estimate shares differ by more than random variability. Using a bootstrap method for the share comparison, we find that they do. The Census Bureau uses a jackknife method to measure the variance of the share difference. We also investigate how the variance estimates from the jackknife and bootstrap methods differ.
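
As a rough illustration of the two resampling approaches named above, the sketch below estimates the variance of one state's share difference (adjusted share minus census share) with a simple bootstrap and with a delete-one jackknife. It assumes unstratified resampling of hypothetical sample clusters, which is far simpler than the design used by the Census Bureau or in the talk. The data and function names are invented for the example.

```python
import random


def share_difference(clusters, state):
    """Adjusted-estimate share minus census-count share for one state.

    Each cluster is a tuple (state, census_count, adjusted_estimate).
    """
    census_total = sum(census for _, census, _ in clusters)
    adjusted_total = sum(adjusted for _, _, adjusted in clusters)
    census_state = sum(census for s, census, _ in clusters if s == state)
    adjusted_state = sum(adjusted for s, _, adjusted in clusters if s == state)
    return adjusted_state / adjusted_total - census_state / census_total


def bootstrap_variance(clusters, state, replicates=2000, seed=0):
    """Variance of the share difference over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(clusters)
    values = []
    for _ in range(replicates):
        resample = [clusters[rng.randrange(n)] for _ in range(n)]
        values.append(share_difference(resample, state))
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def jackknife_variance(clusters, state):
    """Delete-one jackknife variance of the share difference."""
    n = len(clusters)
    values = [
        share_difference(clusters[:i] + clusters[i + 1:], state)
        for i in range(n)
    ]
    mean = sum(values) / n
    return (n - 1) / n * sum((v - mean) ** 2 for v in values)


# Hypothetical clusters: (state, census count, adjusted estimate).
CLUSTERS = [
    ("A", 100, 103), ("A", 120, 126), ("B", 90, 91),
    ("B", 110, 118), ("A", 80, 82), ("B", 130, 133),
]

if __name__ == "__main__":
    print(share_difference(CLUSTERS, "A"))
    print(bootstrap_variance(CLUSTERS, "A"))
    print(jackknife_variance(CLUSTERS, "A"))
```

The two estimators generally give different numbers on the same data, so a comparison of this kind only makes sense when both are run on the same resampling units.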

We also design two alternative formulae for synthetic dual system estimation. One is inspired by the logic underlying the census formula; the other uses imputation rates to estimate the undercount rates for states within post-strata. Both alternatives yield the same expectation as the census formula if the imputation rates are distributed homogeneously. In reality, where the imputation rates are not homogeneous, we find the census formula to be a compromise between the two alternatives. We show that each formula has its own merit and that none dominates the others.