Household surveys are a widely used tool for obtaining information about a population of people. A sample of households is selected, followed by a sample of people within the selected households. Households exhibit structure: variables measured on people in the same household are often dependent. Household sizes vary significantly, and the strength of dependencies within a household may depend on its size. Traditional sample design and estimation methods ignore these dependencies.

This thesis develops methodologies that explicitly allow for the dependencies that may arise within households, in four areas:
• Estimating the design effects of standard estimators, for survey design;
• Constructing new estimators to exploit dependencies within households;
• Selecting a set of auxiliary variables to use in regression estimation;
• Allocating the sample sizes of households and of people within households.

In each case, new methods are developed that allow for the population structure of people within households. The new methods are compared theoretically and numerically with existing methods to show whether, and under what conditions, it is worthwhile to allow explicitly for this population structure. The thesis is concerned with the sampling error of estimators of population totals; that is, the error due to selecting only a sample rather than the whole population of people. The model-assisted framework is used.

The thesis finds that the population structure of people within households can be exploited to give several useful innovations. It is shown that the variation in household size must be considered when estimating the variance at the sample design stage. Existing methods for estimating the design effect ignore this variation, and a more accurate method is developed. It is found that minor improvements can be made to standard estimators of totals by considering within-household dependencies.
An “integrated weighting” method based on a linear contextual model, which has important practical advantages, is found to often have slightly lower variance than non-integrated methods, contrary to common belief. Existing criteria for selecting which auxiliary variables to use in regression estimation are extended to the case of two-stage sampling and applied to household surveys. In most household surveys, either one person or all people are selected from each selected household. More general designs are developed, in which the number of people selected is a function of the number of people in the household. The fact that the number of people in a household is small leads to some novel and efficient sample designs and estimators.
Available at: http://works.bepress.com/robert_clark/6/