Confidence Interval Estimation in R-DAS

Background— Roughly 25 years ago, the United States National Institute on Drug Abuse (US NIDA) initiated the creation of public use datasets for its National Household Survey on Drug Abuse, since renamed the National Survey on Drug Use and Health (NSDUH). The Substance Abuse and Mental Health Services Administration (SAMHSA), which assumed responsibility for the survey in 1992, has continued and expanded this effort to make the survey data available to researchers. During 2012, SAMHSA created a "Restricted-Use Data Analysis System" (R-DAS) to provide researchers with the capability to create tabulations using restricted NSDUH variables not otherwise available on the public-use files. Methods— This methods-focused article is intended to help potential users of R-DAS-like online data analysis systems by (i) clarifying statistical issues involving approximation of confidence intervals (CI), (ii) providing a way to estimate CI when tabular output is suppressed with an 'error message' based on confidentiality restrictions, and (iii) showing how to make pairwise comparisons of estimates not otherwise allowed. Results— For illustration purposes, some empirical estimates are presented on a topic of continuing public health concern in the US, namely, extra-medical use of pain relievers (generally opioids), where the drugs are being used to get high or otherwise outside the boundaries intended by prescribing clinicians. Conclusion— The R-DAS makes it possible to derive state-level estimates of male-female and age-related differences in incidence of extra-medical prescription pain reliever (EMPPR) use, not previously reported in peer-reviewed articles, and not available without the research approaches described here.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Background
In the background of this methodological article is a principle of data sharing expressed in the policies of United States agencies such as the National Institute on Drug Abuse and the Substance Abuse and Mental Health Services Administration (US NIDA; SAMHSA) when government funds have been used to support completion of public health research projects, particularly those on the scale of epidemiological field surveys. On occasion, public use files (PUF) are shared for flexible end-user analyses of individual-level 'microdata,' often made available as downloadable "flat file" raw data or as corresponding analysis-ready datasets for commercial software packages (e.g., .xls format for Microsoft Excel; .dta format for Stata). In some instances, agencies have commissioned the software programming required for "online analysis systems" so that potential users can log in to complete basic online analyses of the unweighted or weighted data. The "Restricted-use Data Analysis System" (R-DAS) and the "Survey Documentation and Analysis" (SDA) system for the National Surveys on Drug Use and Health (NSDUH) in the United States are the two online analysis systems most pertinent here, although many of this article's methodological topics pertain to PUF data analyses from complex epidemiological sample surveys generally.
As might be imagined, there is substantial potential for erroneous results when PUF analyses are conducted by investigators who lack familiarity with statistical methods for epidemiological field survey data and who might not have seen excellent textbooks on applied survey analysis (e.g., Heeringa et al., 2010). One frequent source of error involves statistical inference requiring use of analysis weights, for example, as required when the study design has more complexity than can be achieved by simple random sampling from a census roster of all study population members. As an example, analysis weights may be required because the estimated population parameters need to be adjusted for a potential violation of an "independent and identically distributed" (i.i.d.) assumption. In epidemiological field surveys, the i.i.d. assumption can be violated and analysis weights may be needed because of stratification and cluster sampling with unequal probabilities of selection, among other reasons. Several analysis weights may be encountered. For example, there may be a weight based on the inverse of the probability of selection at one or more stages of sampling (which we can call an 'IPS' weight or a 'base weight'). Sometimes there is a weight intended to make adjustments for missing data. In many surveys conducted since the 1940s, there is a 'poststratification adjustment factor' (PSAF) derived from survey research ideas introduced by Deming et al. (1940), who noted how survey estimates might be improved, after data collection, via post-survey stratifications and adjustments to marginal totals known from a more accurate source (e.g., to account for missing data, variations in survey participation levels, and other aspects of survey design).
To illustrate the idea of an 'IPS' or 'base' weight, consider a hypothetical simple one-stage survey set up such that all census-derived community dwelling units (DU) are included, but only one DU occupant is sampled for assessment. In a DU with one occupant, the selection probability is 1.0; the weight is the inverse of 1.0, which is 1.0. In a DU with just two occupants, the selection probability is 50%; the IPS or base weight is 1/(0.5) = 2.0. For a DU with three occupants, this weight is the inverse of 1/3, and so on.
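The dwelling-unit arithmetic above can be sketched in a few lines of Python. This is a minimal illustration of the base-weight idea only, not NSDUH's actual weighting code:

```python
# Sketch of the inverse-probability-of-selection ('IPS' or 'base') weight
# for the hypothetical one-stage survey described above: one occupant is
# sampled per dwelling unit (DU), so the selection probability for a
# sampled person is 1 / (number of occupants in the DU).
def base_weight(n_occupants: int) -> float:
    """Return the IPS/base weight for a DU with n_occupants residents."""
    if n_occupants < 1:
        raise ValueError("a dwelling unit must have at least one occupant")
    selection_probability = 1.0 / n_occupants
    return 1.0 / selection_probability  # the weight equals n_occupants

print(base_weight(1))  # one occupant  -> weight 1.0
print(base_weight(2))  # two occupants -> weight 2.0
print(base_weight(3))  # three occupants -> weight 3.0
```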
In contrast to what a research team might specify as IPS or base weights in advance of the study, the PSAF is created after survey fieldwork, based on a comparison with what is thought to be known about the study population (e.g., based on a 100% census sample). For example, the most recent decennial census of the non-institutionalized non-military US population age 12 years and older yields a male-female ratio of 151,781,326 to 156,964,212 (United States Census Bureau, 2011). Because not all eligible women participate in a survey (due to sample frame non-coverage or omission), when the survey-based number of females is compared to the known census count of women in a stratum of interest, the survey-based count often falls short of what the census states. Also, because sampled women often participate in surveys more than sampled men, the survey-based count of men can show an even greater shortfall when compared to known census counts. A post-survey stratification analysis displays this type of survey error in its comparison of the survey-based estimate for each population stratum relative to the known census population count. A PSAF involves a correction multiplier to bring survey counts up to census counts when a goal is to produce estimates for the total study population. Because this type of survey error often varies across cells in a multiway contingency table formed by census variables like sex, age, and race-ethnicity, many surveys create a PSAF as a multiplier for each cell of that multiway table.
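The PSAF logic can be sketched as follows for a single stratifier (sex). The census counts are the figures quoted above; the survey-weighted counts are invented for illustration and are not NSDUH estimates:

```python
# Hypothetical post-stratification adjustment factor (PSAF) computation:
# for each stratum, the PSAF is the known census count divided by the
# survey-weighted count, so that applying it as a multiplier brings the
# weighted survey total up to the census total for that stratum.
census_counts = {"male": 151_781_326, "female": 156_964_212}  # from the text
survey_weighted_counts = {"male": 140_000_000, "female": 152_000_000}  # hypothetical

psaf = {stratum: census_counts[stratum] / survey_weighted_counts[stratum]
        for stratum in census_counts}

# Applying the PSAF recovers the census total per stratum.
for stratum, factor in psaf.items():
    adjusted = survey_weighted_counts[stratum] * factor
    print(f"{stratum}: PSAF = {factor:.4f}, adjusted total = {adjusted:.0f}")
```

Note the male PSAF exceeds the female PSAF here, reflecting the greater male shortfall described in the text.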
Investigators unfamiliar with applied survey analysis principles may not appreciate that the PSAF multiplier can be made to serve multiple valuable functions, but it is not uniformly required, nor is it always useful. For example, the PSAF actually can produce misleading study estimates when the estimation task is focused on subgroups within a population, and it may be based on a fallible assumption about within-subgroup homogeneity. In contrast, when the estimation task is to produce estimated counts for the entire population, the use of a PSAF can be essential. For these reasons, 'best practices' for an optimal PUF for data sharing will include creation of separate variables and codebook documentation for the survey's IPS weight or weights as they might vary across one or more stages of sampling and for any PSAF created by the survey team, as well as a summary analysis weight that combines the IPS weight with the PSAF, along with any other adjustments judged to be useful by the PUF creators.
At present, the SAMHSA R-DAS and SDA provide only summary analysis weights. More precisely, the R-DAS provides a set of summary analysis weights that can be chosen by the end-user based on decisions about which single summary estimate is most useful. For example, for a relatively rare drug using behavior (e.g., injecting methamphetamine), a single estimate based on a summary analysis weight for 8 or 10 years of NSDUH data might be most useful. In contrast, recent summary estimates have been used to study US trends in incidence of crack-cocaine use, with a display of one trend line based on 1-year SDA summary analysis weights, and with another trend line based on 2-year R-DAS summary analysis weights (Parker and Anthony, 2014).
It also is noteworthy that the SDA gives investigators access to downloadable unweighted datasets and the corresponding summary analysis weight, allowing direct comparison of unweighted and weighted estimates. In contrast, the R-DAS is strictly for online tabular analyses; it yields weighted analyses with no straightforward way to know the actual unweighted number of study participants in the analysis sample. In circumstances of this type, one might ask why the R-DAS was created, and why investigators would choose to use R-DAS versus SDA. The answer rests on a consideration of survey years, as well as an issue of accessibility to NSDUH study participants and NSDUH variables.
The SDA datasets are from national field surveys of drug use conducted since the late 1970s, whereas the R-DAS are restricted to national surveys conducted only since 2002. For both SDA and R-DAS, confidentiality and privacy protections required removal of some participants from the survey datasets, but the participants removed from the SDA datasets are not always the same individuals who have been removed from the R-DAS datasets.
With respect to variables available for analysis, the R-DAS and the SDA portals give access to different selections. To illustrate, investigators interested in monthly or seasonal variations in newly incident drug use must turn to the SDA, whereas investigators interested in state or sub-state geographic variations must turn to the R-DAS. R-DAS includes variables for each US state, but no month/season variable; SDA includes month variables, but includes no state variable.
SDA estimates from each year can be combined in meta-analysis fashion, with each year's survey treated as an independent replication (DeAndrea et al., 2013). In some contexts, the SDA meta-analysis summary estimate has been found to perform as well as or better than an alternative approach that involves pooling data across years (DeAndrea et al., 2013). Similarly, a series of R-DAS 2-year estimates can be summarized in meta-analytic fashion (Mohammed and Anthony, 2014), and the R-DAS 8-year and 10-year summary analysis weights are available when the goal is to optimize estimates based on across-year pooling of R-DAS data (e.g., see Seedall and Anthony, 2013).
Indubitably valuable, the R-DAS remains a "work in progress" for which there is no extensive online documentation. Unlike SDA, with considerable online help support (Online Help for Analysis Programs-SDA 3.5) and an NSDUH statistical inference report that provides details about NSDUH statistical inference procedures (Aldworth et al., 2011), the FAQs page for R-DAS is more constrained (Help with the Restricted-use Data Analysis System (R-DAS)). For instance, R-DAS documentation does not state whether the 95% default confidence intervals for estimated percentages are produced using normal approximations to the binomial distribution (i.e., yielding standard symmetric CIs), or by converting a standard error and the estimate of each percentage to a natural logarithm scale (i.e., the logit transformation). Additionally, if the unweighted number of observations in any cell in the R-DAS table falls below some unstated threshold value, R-DAS delivers this system response: "Error Your analysis has returned one or more errors: To preserve confidentiality, tables cannot be displayed when the number of observations in any cell in the table is too low." That is, the output is suppressed due to confidentiality restrictions. Also, certain statistical procedures are not available with the current version of R-DAS. For example, R-DAS includes no approach for significance testing or derivation of p-values when the goal is statistical inference about pairwise comparisons of percentages.
Finally, via anonymous reviews of this article, we were encouraged to mention a new NSDUH "Data Portal" and an arrangement that allows investigators to apply for access to microdata files with fewer restrictions on variable selection. We have not yet been approved for this privilege, and cannot speak from experience, but it is possible that the value of approaches described in this article will be greater for R-DAS users than for users of the new data portal.

This Paper's Focus
With a methods focus, this paper seeks to fill some gaps in R-DAS documentation based on our own end-user experiences, and to provide a simple alternative to current R-DAS research approaches, with an illustration that sheds light on several facets of the epidemiology of extra-medical prescription pain reliever use in the US. Using R-DAS data to look state by state within the US, we estimate (1) male-female differences and (2) age-related differences in the risk of becoming a newly incident user of these drug compounds. This illustration clarifies state-by-state variations in the incidence of becoming a newly incident extra-medical user of prescription pain relievers (mainly opioid drugs). Here, our measure of incidence has a numerator based on the estimated number of newly incident users during a specified time interval, and its denominator is based on the estimated size of the 'at risk' population. That is, this denominator excludes individuals who had started extra-medical use in the past, and includes only those individuals 'at risk' of starting for the first time. Consistent with prior National Comorbidity Survey articles (e.g., Anthony et al., 1994), and many other more recent articles based on NSDUH data (e.g., Meier et al., 2012; DeAndrea et al., 2013; Seedall and Anthony, 2013), the term "extra-medical" refers to using the drug outside of the boundaries intended by the prescribing clinician (e.g., "to get high" or for other such feelings).
With this methods focus, this article has been written to help potential users of R-DAS and other online data analysis systems by (i) clarifying statistical issues involving approximation of confidence intervals (CI), (ii) providing a way to estimate CI when tabular output is suppressed with an 'error message' based on an R-DAS confidentiality restriction, and (iii) showing how to make pairwise comparisons of estimates not otherwise allowed in R-DAS. The specific substantive aims pertain to the state-specific incidence estimates just mentioned, including estimates for middle-aged and elderly US residents that otherwise are suppressed with an R-DAS confidentiality error message. We anticipate that many DAD readers will be interested in how this particular obstacle can be overcome.
Due to the methods focus of this article, we do not provide a detailed overview of prior important contributions in this substantive research domain. Fortunately, in addition to the background literature we have summarized in our own research group's past contributions of epidemiological evidence on prescription pain relievers, as well as male-female and age-related differences in incidence for extra-medical drug use (e.g., Meier et al., 2012; DeAndrea et al., 2013; Seedall and Anthony, 2013), there are new reviews from other research teams (e.g., Dowling et al., 2006; Martins et al., 2009; Manchikanti et al., 2012; Volkow et al., 2014). In addition to US estimates, there is a growing body of contributions from elsewhere, including Germany, the United Kingdom, Oceania, and Asia (e.g., Roxburgh et al., 2011; Schubert et al., 2013; Radbruch et al., 2013; Hawkes, 2013; Nicholas et al., 2013; Degenhardt et al., 2014).

The Epidemiological Field Surveys
The NSDUH have been described in many past articles of this journal, such as those cited above. There also is extensive online documentation about NSDUH study methods and statistical inference approaches (e.g., Aldworth et al., 2011). In brief, the NSDUH is conducted annually with field operations that involve multistage area sampling and recruitment. As for the materials for this study, each year from 2002 through 2011, the yield has been large-scale, nationally representative samples of community-dwelling (noninstitutionalized) U.S. citizens, age 12 and older, such that this survey no longer qualifies as a "household survey" per se, even though it recently has been characterized as such (https://apha.confex.com/apha/141am/webprogram/Paper280610.html, last accessed 9 May 2014). NSDUH participation levels at 70% or better generally have been achieved, and confidential audio computer-assisted self-interviewing has been used for data gathering. Examples of standardized survey items are available online (http://oas.samhsa.gov/nsduh/methods.cfm, last accessed 9 May 2014). Various government reports provide basic estimates for the US population as a whole and for subgroups judged to be especially important (e.g., overall male-female differences of the type reported here). These online reports provide details on potential limitations, as well as a comparison with alternative field survey estimates such as those from the Monitoring the Future school surveys (e.g., http://www.samhsa.gov/data/nsduh/2k11results/nsduhresults2011.htm). In our illustration, we describe male-female variation in incidence estimates for EMPPR use across 51 state-level jurisdictions (including the District of Columbia). Here, estimation is based entirely on the Restricted-use Data Analysis System of the pooled 2002-2009 NSDUH survey data; the 10-year analysis weights for 2002-2011 data were not yet available at the time this project was being conducted.
R-DAS involves use of correct analysis weights for the pooled incidence estimates, with corresponding 8-year poststratification adjustment factors built into the weights as described in Section 1.1. Here, R-DAS applies Taylor series linearization for estimation of variances under conditions of complex survey sampling. Once the correct pooled estimates are produced by R-DAS, various methods to summarize and compare incidence estimates may be employed, as illustrated below.

Confidence Intervals for Incidence Rates
In general epidemiology, an incidence rate for a disease in a specified population is defined as the ratio of the estimated number of newly incident cases arising during an interval of time to some measure of total or at-risk population size (or person-time experience). DAD readers interested in alternative specifications for incidence estimation may consult our Supplementary Material, Section A.1. R-DAS makes it possible to estimate the number of newly incident extra-medical drug users in a defined time interval, divided by the estimated size of the at-risk population under study, with exclusions and inclusions already specified in Section 1.2 (an example of a generic R-DAS table specification, as well as the specific R-DAS variables used for this article's estimates, can be found in the Supplementary Material, Section A.3). In many cases, the R-DAS-derived incidence estimates are small proportions (i.e., close to 0). For these small estimates, the standard and correctly implemented procedures for approximating 95% confidence intervals may yield a negative sign on the lower confidence bound. This negative value clearly falls outside a proportion's allowable range of (0, 1). In such cases, a logit transformation of estimated incidence rates provides mathematically acceptable results. The logit transformation of the incidence rate, p, is defined as

L = \ln\left(\frac{p}{1 - p}\right),

where ln denotes a natural logarithm. If \hat{p} is an estimate of the incidence rate and \hat{L} = \ln\left(\frac{\hat{p}}{1 - \hat{p}}\right) is the corresponding logit transformation, then an approximate 1 - α confidence interval for the incidence rate, p, is given by

\left( \frac{e^{C_L}}{1 + e^{C_L}}, \; \frac{e^{C_U}}{1 + e^{C_U}} \right), \quad \text{with } C_L, C_U = \hat{L} \mp z_{\alpha/2} \, \frac{\sqrt{\mathrm{Var}(\hat{p})}}{\hat{p}(1 - \hat{p})}, \tag{1}

where Var(\hat{p}) is the variance estimate of \hat{p} and z_{\alpha/2} is the (1 - α/2) percentile of the standard normal distribution (i.e., N(0, 1)). Via correspondence with the NSDUH R-DAS support group, we have learned that R-DAS uses the logit transformation to yield asymmetric confidence bounds that always fall within the (0, 1) interval.
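The logit-based interval just described can be sketched in Python using only the point estimate and its variance, as delivered in typical R-DAS output. The numerical example is hypothetical, chosen so that a symmetric interval would dip below zero:

```python
from math import exp, log, sqrt
from statistics import NormalDist

def logit_ci(p_hat, var_p, alpha=0.05):
    """Approximate (1 - alpha) CI for a small proportion via the logit
    transformation: symmetric on the logit scale, asymmetric and strictly
    inside (0, 1) back on the proportion scale."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    L = log(p_hat / (1 - p_hat))                  # logit of the point estimate
    se_L = sqrt(var_p) / (p_hat * (1 - p_hat))    # delta-method SE on logit scale
    inv_logit = lambda x: exp(x) / (1 + exp(x))
    return inv_logit(L - z * se_L), inv_logit(L + z * se_L)

# Hypothetical small incidence estimate: the naive symmetric ('Wald')
# interval has a negative lower bound; the logit interval does not.
p_hat, se = 0.002, 0.0015
print(p_hat - 1.96 * se)        # Wald lower bound is negative
print(logit_ci(p_hat, se**2))   # both logit bounds fall inside (0, 1)
```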

Statistical Significance of Differences
Customarily, the observed difference between estimates is evaluated in terms of a test statistic, with a p-value to help quantify departures from a null hypothesis of no difference. A test of the hypothesis H_0: p_1 = p_2 versus a two-sided alternative H_A: p_1 ≠ p_2 (i.e., a test for a statistically significant difference between two incidence estimates) currently is not implemented in R-DAS. Nonetheless, the contrast can be made with ease "by hand". Under the null hypothesis, a test can be based on the statistic

Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\mathrm{Var}(\hat{p}_1) + \mathrm{Var}(\hat{p}_2) - 2\,\mathrm{Cov}(\hat{p}_1, \hat{p}_2)}}, \tag{2}

where Cov(·, ·) is a covariance term. Although R-DAS is not set up to provide Cov(\hat{p}_1, \hat{p}_2), we work from an assumption that the covariance term is not appreciably different from zero, and generally can be ignored in approximation of the test statistic (Aldworth et al., 2011). We return to this issue in our discussion section.
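This "by hand" contrast can be sketched directly from the point estimates and standard errors shown in R-DAS tables, with the covariance defaulting to zero as described above. The estimates in the usage example are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def pairwise_z_test(p1, se1, p2, se2, cov=0.0):
    """Two-sided test of H0: p1 == p2 using the Z statistic described above,
    from two proportion estimates and their standard errors; cov defaults
    to zero, matching the assumption made in the text."""
    z = (p1 - p2) / sqrt(se1**2 + se2**2 - 2 * cov)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical female vs. male incidence estimates for one state.
z, p = pairwise_z_test(0.018, 0.003, 0.015, 0.0028)
print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```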

Control of Error When Multiple Tests Are Performed
Whenever multiple inter-related tests of statistical significance are performed (e.g., testing for male-female differences in each of 51 state-level jurisdictions), one needs to be cautious in presenting the results of the analysis due to the inflated probability of at least one false positive result (i.e., an erroneous conclusion of a statistically significant non-null difference even when the null is true). An elevated probability of at least one false positive result has been called a "multiple-testing problem": the more tests performed, the more false positive results are to be expected, as shown in numerical examples provided elsewhere (e.g., Rice, 1988). Some type of correction for multiple testing may be needed. For example, in application of the commonly used Bonferroni correction, a hypothesis is rejected only if the corresponding p-value is below the statistical threshold, α, divided by the number of tests. Interested readers can judge when and whether these corrections for multiple testing are needed, whether the Bonferroni correction is optimal, or whether alternatives to the Bonferroni approach might be useful (Cribbie, 2007).
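For the 51 state-level comparisons considered later, the Bonferroni rule amounts to a one-line threshold computation. The p-values below are hypothetical, for illustration only:

```python
# Bonferroni correction: reject a hypothesis only when its p-value falls
# below alpha divided by the number of tests performed.
alpha, m = 0.05, 51
threshold = alpha / m                     # about 0.00098
p_values = [0.004, 0.0005, 0.03, 0.20]    # hypothetical per-state p-values
rejected = [p < threshold for p in p_values]
print(threshold, rejected)
```

Note that 0.004, nominally "significant" at the 0.05 level, survives none of the 51-test correction, while 0.0005 does.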

Estimation of Suppressed Output
We have found that R-DAS estimates for extra-medical drug use often can be produced for the entire population of interest (e.g., males and females as an overall population of interest), and for sub-populations that include relatively large unweighted numerators and denominators (e.g., 12-44 year olds). Nevertheless, drug use estimates for other sub-populations often are suppressed due to the small-number confidentiality/privacy issue mentioned in Section 1.1 (e.g., when there is a small sample size for EMPPR users age 65 years and older). In this instance, when sub-populations can be defined to be mutually exclusive, it becomes possible for investigators to estimate the suppressed output "by hand" even when R-DAS will not produce the complementary tables.
To illustrate using an example of two mutually exclusive sub-populations, we let \hat{p} and \hat{N} denote the estimated percentage and the total weighted population size. Let \hat{p}_1 and \hat{N}_1 denote the estimated percentage and the weighted sub-population size for which the R-DAS delivers a table of estimates, and let \hat{p}_2 denote the estimated percentage for the sub-population with weighted size \hat{N}_2, for which the R-DAS output is suppressed. We can write \hat{p} as a weighted sum of \hat{p}_1 and \hat{p}_2,

\hat{p} = \frac{\hat{N}_1 \hat{p}_1 + \hat{N}_2 \hat{p}_2}{\hat{N}_1 + \hat{N}_2}, \tag{3}

so that the suppressed point estimate can be recovered as \hat{p}_2 = [(\hat{N}_1 + \hat{N}_2)\hat{p} - \hat{N}_1 \hat{p}_1] / \hat{N}_2. Then, treating the weighted sizes as fixed,

\mathrm{Var}(\hat{p}) = \frac{\hat{N}_1^2 \mathrm{Var}(\hat{p}_1) + \hat{N}_2^2 \mathrm{Var}(\hat{p}_2) + 2 \hat{N}_1 \hat{N}_2 \mathrm{Cov}(\hat{p}_1, \hat{p}_2)}{(\hat{N}_1 + \hat{N}_2)^2},

where Var is the variance and Cov is the covariance. Therefore,

\mathrm{Var}(\hat{p}_2) = \frac{(\hat{N}_1 + \hat{N}_2)^2 \mathrm{Var}(\hat{p}) - \hat{N}_1^2 \mathrm{Var}(\hat{p}_1) - 2 \hat{N}_1 \hat{N}_2 \mathrm{Cov}(\hat{p}_1, \hat{p}_2)}{\hat{N}_2^2}. \tag{4}

In practice, an R-DAS analyst generally must presume a zero covariance term, as discussed in Section 2.3. In Eq. (4), this zero covariance assumption may introduce some bias into the variance estimate for the suppressed output. In Section 3.3 we show that this bias tends to be minimal in this particular NSDUH estimation context. See the Supplementary Material, Section A.4, for additional discussion of the zero covariance assumption.
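Under the zero covariance assumption, the recovery of the suppressed sub-population estimate and its variance can be sketched as a short function. The round-trip check below uses invented numbers: two sub-populations are combined into a "total," and the second is then recovered as if its table had been suppressed:

```python
from math import sqrt

def recover_suppressed(p_total, var_total, p1, var1, n1, n2):
    """Recover the suppressed estimate p2 and its variance from the
    total-population table and the displayed sub-population table,
    for two mutually exclusive sub-populations with weighted sizes
    n1 and n2, assuming a zero covariance term (Eq. 4)."""
    n = n1 + n2
    p2 = (n * p_total - n1 * p1) / n2
    var2 = (n * n * var_total - n1 * n1 * var1) / (n2 * n2)
    return p2, var2

# Hypothetical round trip: build the 'total' from two known
# sub-populations, then recover the second as if suppressed.
n1, n2 = 8e7, 2e7
p1, var1 = 0.010, 0.001**2
p2_true, var2_true = 0.001, 0.0005**2
p_total = (n1 * p1 + n2 * p2_true) / (n1 + n2)
var_total = (n1**2 * var1 + n2**2 * var2_true) / (n1 + n2)**2  # zero covariance
p2, var2 = recover_suppressed(p_total, var_total, p1, var1, n1, n2)
print(p2, sqrt(var2))  # recovers the suppressed estimate and its SE
```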

Results
The results section has three parts: (3.1) Data Preparation, (3.2) Incidence Estimation for Males and Females, By State, and (3.3) State-by-State Variation in Age-Specific Incidence Estimates.

Data Preparation
The illustration of these statistical methods in this article involves estimation of incidence rates for extra-medical use of prescription-type pain relievers (mainly opioids) in the US, as an extension of prior reports (see Section 1.2). Via R-DAS, novel estimates for 2002 to 2009 are used to examine male-female differences by geographical location (state) and by age. At present, R-DAS does not provide a convenient way to transfer its output to a document or a spreadsheet. A current online recommendation (provided in Help with the Restricted-use Data Analysis System (R-DAS), 2013) is to copy and paste the output, but this output cannot be read into a statistical package for an immediate analysis. An alternative might be to enter the output manually, but this alternative is time consuming and creates openings for inadvertent transcription errors. We provide a methodological solution to this problem, with elimination of transcription errors, via a Python script (Python Software Foundation, 2013) for extraction of all relevant information from the R-DAS tabular output. Specifically, for each table, the script extracts all control variable names, as well as percentages, standard errors, and weighted sample sizes (except totals). We have shared a sample script online at http://www.epi.msu.edu/vsevoloz/scripts/, also available by request to O.A.V. The script can be modified quite readily to accommodate extraction of different information (e.g., R-DAS-approximated confidence bounds). The authors welcome requests for help with such modifications. (Supplementary material can be found by accessing the online version of this paper; please see Appendix A for more information.)

Vsevolozhskaya and Anthony, Drug Alcohol Depend. Author manuscript; available in PMC 2018 June 15.
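The flavor of such scripted extraction can be sketched with Python's standard-library HTML parser. The table markup below is an invented stand-in: the real R-DAS page layout differs, and the published script handles additional details such as control variable names:

```python
# Minimal sketch of pulling cell text out of saved R-DAS-style HTML output.
# The markup here is hypothetical; parsing rules for the real R-DAS pages
# would need to be adapted to their actual structure.
from html.parser import HTMLParser

class CellExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.rows, self.current = [], []
    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.in_cell = True
        elif tag == "tr":
            self.current = []
    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False
        elif tag == "tr" and self.current:
            self.rows.append(self.current)
    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.current.append(data.strip())

html = """<table>
<tr><th>State</th><th>Percent</th><th>SE</th></tr>
<tr><td>Utah</td><td>1.8</td><td>0.3</td></tr>
</table>"""
parser = CellExtractor()
parser.feed(html)
print(parser.rows)  # rows of cell text, ready for a statistical package
```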

Incidence Rate Estimation for Males & Females, By State
Combined 8-year R-DAS incidence estimates for EMPPR use, estimated for males and for females separately, in each of 51 state-level jurisdictions (hereinafter, "states") are summarized in Figures 1 and 2. The R-DAS tables from which Figures 1 and 2 were generated can be found at http://www.epi.msu.edu/vsevoloz/scripts/RDAS/rdas_incidents_gender.html or by accessing the Supplementary Material, Section A.3. As noted in Section 2.1, within these figures, the 48 "lower" states and the District of Columbia are shaded to reflect the size of each incidence estimate. Since the estimated standard errors are quite small relative to the incidence estimates themselves, shading based just on the estimates provides an adequate representation of state-level variation in the risk of becoming a newly incident EMPPR user. A complete list of numerical estimates, including Alaska and Hawaii, is presented in Supplementary Material Table 1 and shows very narrow 95% CI bounds.
As shown in Figures 1 and 2, the state of Utah has the top-ranked empirical estimates for becoming a newly incident EMPPR user, and this is seen for males as well as for females. For males, the estimate is 1.5% (95% CI = 1.1, 2.2); for females, it is 1.8% (95% CI = 1.3, 2.5). It is not our intention to claim statistically significant results for Utah. Moreover, in the current application, a rigorous statistical analysis would be hard to perform due to multiplicity concerns (i.e., adjusting for 51 pair-wise comparisons), as well as correlation among incidence rate estimates in nearby states. However, we still think it is epidemiologically noteworthy that Utah has come up as one of the top-ranked states for both males and females. Additionally, estimates for Wisconsin stood out, with high estimated rates for both females (1.5%; CI = 1.0, 2.1) and males (1.4%; CI = 1.0, 1.8). For many other states, a substantial male-female difference in estimated incidence rates can be seen.
To see whether females or males are at greater risk of becoming newly incident users at the state level, we conducted a statistical significance test (level of significance α = 0.05) for the difference in R-DAS incidence estimates based on Eq. (2). The results (with no adjustment for multiple testing and with the zero covariance assumption) are summarized in Fig. 3; states with statistically significant male-female differences (p < 0.05) are highlighted in color. Additionally, in Fig. 4, we provide 95% confidence intervals for the differences, as calculated by inverting the test statistic in Eq. (2).
For all highlighted states, female incidence estimates were larger than the corresponding male estimates. We note that if these state-specific estimated differences are regarded as exchangeable replications within a Bonferroni family of estimates, one might be led to the conclusion that females are not more likely than males to become newly incident extra-medical users of these pain relievers; none of the state-level p-values passed the Bonferroni-corrected threshold of 0.05/51.
Nonetheless, despite the fact that none of the state-level p-values passed the Bonferroni-corrected threshold, we think it can be argued that the Bonferroni correction is too stringent in this instance, and that our results support a general pattern of female excess in the risk of becoming an extra-medical user of these drugs, with the difference of these sex-specific risk estimates falling above the null value of 0.0 for 31 states and with only 18 states having point estimates below the null value (i.e., roughly 2:1 odds of female excess risk overall). See Supplementary Material Table 1 for the complete list of female and male incidence estimates; our discussion provides evidence of an overall female excess based on a Fisher combining test.
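The Fisher combining test mentioned here can be sketched as follows. It assumes independent one-sided p-values, an assumption the spatial correlation discussed in this section would strain; for an even number of degrees of freedom the chi-square tail probability has a closed form, so no external library is needed:

```python
from math import exp, log

def fisher_combined_p(p_values):
    """Fisher's method for combining k independent p-values:
    X2 = -2 * sum(ln p_i) follows a chi-square distribution with 2k
    degrees of freedom under the global null. For even df, the survival
    function is exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, computed here
    term by term."""
    k = len(p_values)
    x2 = -2 * sum(log(p) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= (x2 / 2) / i
        total += term
    return exp(-x2 / 2) * total

# Hypothetical per-state p-values: individually unimpressive, but their
# consistent direction yields a small combined p-value.
print(fisher_combined_p([0.04, 0.07, 0.11, 0.06, 0.09]))
```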
Our judgment that the Bonferroni correction is too stringent in the current analysis is based on an understanding of the Bonferroni correction as one designed to control the probability of at least one false finding among independent tests within a family of exchangeable tests. Accordingly, whenever tests are interdependent, the Bonferroni correction becomes too stringent; in the extreme opposite case of perfectly correlated tests, no correction is required. We do not judge that these state-specific comparisons of males and females are truly independent. Rather, we judge it reasonable to assume that there is some spatial correlation in incidence estimates among nearby states (Barrios et al., 2012), given considerations such as interstate drug trafficking patterns and shared methods covariation from several potential sources. Examples of 'shared methods covariation' involve the possibility that NSDUH uses the same field staff in adjacent states, or that there is across-state sharing of respondent tendencies to disclose or not disclose illegal or sensitive aspects of behavior. Section A.2 of the online Supplementary Material provides mathematical clarification of spatial correlation issues in relation to the possibly overly conservative Bonferroni correction.

State-by-State Variation in Age-Specific Incidence Estimates
It was not difficult to identify an illustration of potential substantive importance so as to foster the reader's understanding of an R-DAS 'error' message that is generated when tabular output has been suppressed due to a confidentiality restriction. For this illustration, we turned to the question of whether prevention programs for EMPPR users might be needed among older persons, or whether these programs might focus solely upon young persons, a topic addressed by Colliver and colleagues (Colliver et al., 2006). (Supplementary material can be found by accessing the online version of this paper; please see Appendix A for more information.)

Vsevolozhskaya and Anthony Page 11
Drug Alcohol Depend. Author manuscript; available in PMC 2018 June 15.

Fairly regular R-DAS system response error messages were generated when we tried to estimate state-specific incidence of EMPPR use among 45-106 year olds, which might lead an epidemiologist to think that there is good reason to believe that older US community residents are not becoming newly incident EMPPR users (R-DAS screenshots with our specifications for the numerator and the denominator for the age-stratified incidence estimation can be found in Supplementary Material Section A.3). That is, the epidemiologist might reason that the size of the 'at-risk' study population of 45-106 year olds is quite large, and therefore the confidentiality restriction must be due to the trivial size of each estimate's numerator (translated as "too few" newly incident users in that age range, so few that no R-DAS estimate can be produced).
To illustrate the utility of Eq. (4) as described in Section 2.4, we constructed R-DAS analyses to derive the estimates for older adults and their standard errors; the approach involved taking differences between the total population estimates and the estimates for the 12-44 year olds.
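A minimal sketch of this difference-based re-estimation follows. The function is our reconstruction of the idea behind Eq. (4), not the article's exact notation, and the numbers in the usage lines are hypothetical stand-ins for displayed R-DAS values:

```python
import math

def difference_estimate(est_total, se_total, est_subgroup, se_subgroup, cov=0.0):
    """Re-derive a suppressed quantity as the difference between a displayed
    total and a displayed subgroup estimate, with the standard error of the
    difference computed from the two displayed standard errors. Setting
    cov=0.0 mirrors the zero-covariance simplification discussed in the text."""
    diff = est_total - est_subgroup
    se = math.sqrt(se_total ** 2 + se_subgroup ** 2 - 2.0 * cov)
    return diff, se

# Hypothetical displayed values (illustrative only, not NSDUH output):
diff, se = difference_estimate(1.20, 0.05, 1.15, 0.04)
# diff is the re-derived remainder; se assumes Cov(p1, p2) = 0
```

With a truly zero covariance the standard error is exact; otherwise the Cov term contributes the small additional bias discussed in connection with Table 1.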
Figures 5 and 6 illustrate incidence estimates for the 12-44-year-olds and for the older adults. Additional numerical estimates for each state and age subgroup are provided in Supplementary Material Table 2. Among 12-44-year-olds, incidence estimates are noticeably larger than corresponding estimates for older adults. In addition, there are states with non-zero estimated values for older adults. That is, the estimates show non-zero risk of becoming a newly incident EMPPR user beyond age 44 years.
A close look at Fig. 6 shows that Tennessee and Minnesota have a markedly deeper shade of color as compared to other states (i.e., larger incidence estimates), but we do not claim that these two states are significantly different from other states (i.e., at p<0.05). Unlike percentages in Figures 1-5, for which the corresponding standard errors are trivial, the estimated percentages in Fig. 6 are small, but have corresponding standard errors that are relatively large (traceable to the relatively small numerators for newly incident users among the older adults).
Overall, a close inspection of Figures 5 and 6 discloses that some newly incident cases of EMPPR use in the US are arising after age 44 years. Nevertheless, we note that the main purpose of this analysis was to illustrate that, in certain situations, the R-DAS output suppressed due to confidentiality concerns can be re-estimated "by hand" using Eq. (4) with 0.0 for its covariance term, Cov(p1, p2).
To assess potential bias of the Eq. (4) estimators, we compared our results to R-DAS output available for large states that did not require Eq. (4). Some bias might be expected because the reduced number of significant digits displayed in R-DAS tables can introduce rounding error. Table 1 shows the estimated and the observed R-DAS output for these larger states. It may be somewhat reassuring that the observed bias in the incidence estimates, defined as the difference between the observed and the estimated values, is negligible (on the order of a thousandth of 1%). The apparent bias among standard errors is larger but still less than a tenth of a percent. This additional bias may be due to a non-zero covariance term involving the two proportions being compared. Nevertheless, based on the results in Table 1, the additional bias introduced by a nonzero covariance term might often turn out to be quite small. We conclude that the estimation procedure presented in Eq. (4) should work reasonably well in many contexts of estimation and inference from R-DAS output.

Discussion
The main focus of this article is methodological. We have shared some research ideas and approaches stimulated by our initial experiences using the SAMHSA R-DAS for estimation and inference. That is, once we had devised this set of research approaches, we had to decide whether to keep them to ourselves or to share them with others who might find them useful. We chose to share what we had learned and worked out, in the hope that we can create a community of scientists whose use of R-DAS will encourage the SAMHSA tradition of innovation in data sharing. In time, we hope that SAMHSA will commission refinements of the R-DAS as it has done with its SDA system, including options for use of the generalized linear model with link functions when R-DAS is tapped for more advanced epidemiological analyses.
Before any discussion of new substantive findings from this work, several study limitations deserve attention. For example, in contrast to medical examiner toxicological reports on overdose deaths, the NSDUH estimates are based on self-report data. When the goal is to study newly incident EMPPR use in national-scale surveys, however, there seems to be no logistically feasible alternative to self-report. Moreover, we acknowledge that there might be opioid-related sample attrition. For example, any individual who engages in extra-medical use of an opioid pain reliever for the first time and who dies of overdose on that first occasion of use cannot be captured in any cross-sectional national-scale survey of this type.
In Section 2.3's coverage of assessing statistical significance of observed differences in R-DAS estimates, we drew attention to the covariance term of the standard equation. We noted an assumption, namely, that for a great many R-DAS difference tests, if the subpopulations are defined as mutually exclusive and truly are independent, this covariance term should be equal to zero or negligibly different from zero. Although we do not now have access to NSDUH data that would allow us to probe for violations of this assumption, the SAMHSA research team and the R-DAS Help Desk staff do have access to all of the data required to investigate this important topic in a systematic fashion. We hope that this article will stimulate them to provide us with the data required to probe for violations of the zero covariance assumption, or to complete and publish a report of their own analyses on this topic.
Notwithstanding limitations such as these, the article draws attention to (1) the observed state-level variations in the incidence of extra-medical use of prescription pain relievers, as well as some state-level male-female differences that are consistent with recently published estimates for young people in the US as a whole (Seedall and Anthony, 2013), and (2) the observation that in the United States there is evidence that newly incident EMPPR use is occurring after age 45. Given this article's methods focus, our discussion of these observations necessarily is constrained. Readers are referred to other sources on the topic of male-female differences and age-related differences (e.g., Simoni-Wastila et al., 2004; Kelly et al., 2008; Seedall and Anthony, 2013; Colliver et al., 2006). It might be argued that in this study's estimates the preponderance of evidence actually is balanced against the idea of a male-female difference: among the 51 jurisdictions for which we produced estimates, there was a statistically significant (p < 0.05) female excess only in Nebraska, South Carolina, New Hampshire, Oklahoma, and New Mexico. Nonetheless, in order to substantiate our summary statements about the observed overall female excess, we employed the classical Fisher test (Fisher, 1925), combining the one-sided p-values (i.e., H_a: p_F > p_M) over the 51 states. The resulting p-value for the overall national over-representation of females was very small (p = 3.49×10⁻⁷). Thus, we conclude that during the years under study women in the US were over-represented among newly incident EMPPR users. That is, rather than dismiss these state-level differences via a Bonferroni correction, future investigators can ask whether there might be other evidence consistent with the observed female excess risk of starting EMPPR use.
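As a concrete illustration of the combining step, Fisher's method can be computed directly from a list of one-sided p-values. The sketch below is ours; the p-values in the usage line are hypothetical stand-ins, not the actual 51 state-level values:

```python
import math

def fisher_combined(pvalues):
    """Fisher's method: X^2 = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom under the joint null,
    assuming the k one-sided p-values are independent."""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    # df = 2k is even, so the chi-square survival function has the
    # closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(k):
        if i > 0:
            term *= half / i
        total += term
    return x, math.exp(-half) * total

# Hypothetical one-sided p-values (illustrative only):
stat, p_combined = fisher_combined([0.20, 0.04, 0.33, 0.07, 0.51])
```

Note that Fisher's method itself assumes independent p-values; under the spatial correlation discussed above, the combined p-value should be read as approximate.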
As for the observations about newly incident EMPPR use in middle and later adulthood, we note that Colliver and colleagues (Colliver et al., 2006) studied prevalence of EMPPR use among US residents age 50 years and older and made projections for the future, finding that the number of prevalent users might well double by the year 2020. These prior findings were in the background of our decision to contrast the 12-44 year olds with older adults (age 45-106 years) and our discovery that some older adults are becoming EMPPR users in later life, even though the original R-DAS tabular output would suggest that the numbers are too small for estimation at the state level. Accordingly, these estimates offer an epidemiological rationale for thinking through prevention of EMPPR use in each state's older adult population, although the incidence estimates for these older adults do tend to be smaller than the incidence estimates for 12-44 year olds. Indeed, some survey statisticians might insist that at least a few of our state-level estimates for 45-106 year olds have minimal scientific utility, in that the size of the standard errors relative to the point estimates sometimes is less than optimal, as shown in the online appendix.
We note that all of the study analyses were performed using the NSDUH R-DAS. In our judgment, R-DAS creates an unprecedented opportunity to study the distribution (and to some extent the dynamics) of extra-medical drug use in the United States. As such, an important project goal was to clarify current R-DAS research approaches. In addition, we have outlined an approach that can be used to estimate the R-DAS output for a population subgroup, for which the tabulated results ordinarily are suppressed due to confidentiality restrictions. This approach includes no inherent violations of confidentiality or privacy protection principles, although the approach is one that requires each end-user investigator to be vigilant about the possibility of inadvertent invasions of privacy and inadvertent disclosure of drug use and other sensitive behaviors of individual NSDUH participants. That is, the approach we outline is no substitute for each investigator and research team evaluating whether individual-level participant confidentiality or privacy violations might occur when our approach is applied; this judgment must be made on a case-by-case basis. The R-DAS system response of "error" when there is a confidentiality restriction or suppression of output must serve as a warning to investigators that they are reaching boundary lines of a territory where protection of privacy is crucial; special care and inspection of output are required. Consultation with the R-DAS Help Desk may be needed.
At the end of the day, United States federal law dictates specific roles for government staff who must manage confidentiality of data from surveys like NSDUH, as well as penalties for mis-steps, but we believe it is an ethical responsibility of each individual end-user investigator to join in the federal effort to protect study participant privacy, ensuring that the study data being reported cannot be used to identify or impose risks on individual subjects or population subgroups. End-user investigators have two basic options, the first of which involves becoming more informed about disclosure risk assessment at a basic level and applying suggested 'best practices'. For example, the "Rainbow Series" of reports published by the US National Center for Health Statistics now is available as a searchable online document file (http://www.cdc.gov/nchs/products/series.htm, last accessed 11 May 2014), as are online tutorials and other tools created by the Federal Committee on Statistical Methodology of the US Office of Management and Budget (http://www.fcsm.gov/committees/cdac/cdac.html, last accessed 9 May 2014). When these files are searched for terms such as 'disclosure analysis' or 'confidentiality,' the search results include many useful suggestions that any end-user investigator can apply before disseminating or publishing estimates from shared data sources. One suggestion is for an end-user to seek out examples of crosstabulations with table cells that contain information on very few respondents (often with a rule of thumb that a value should be suppressed when the cell count is below a threshold set variously at 4 < n < 21), much as the PUF creator might employ 'data coarsening' to preclude small cell counts. Another suggestion that requires no special statistical expertise is a 'k-anonymity' approach, characterized as 'the most rigorous standard' for evaluation of whether a microdata set or analysis is 'safe'.
That is, "a data set should be regarded as 'safe' if there are at least k (usually k = 3) records that are identical with respect to the set of key variables [that might be used to identify a study respondent indirectly]" (Steel, 2004).
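A k-anonymity check of this kind is straightforward to sketch in code. The record layout, variable names, and rows below are hypothetical illustrations (not NSDUH's actual structure); the rule follows the quoted standard of at least k identical records on the key variables:

```python
from collections import Counter

def min_key_class_size(records, key_vars):
    """Size of the smallest group of records that are identical on the
    key (quasi-identifier) variables."""
    counts = Counter(tuple(r[v] for v in key_vars) for r in records)
    return min(counts.values())

def is_k_anonymous(records, key_vars, k=3):
    """k-anonymity per the quoted standard: every combination of key
    variables that appears must appear at least k times."""
    return min_key_class_size(records, key_vars) >= k

# Hypothetical microdata rows (illustrative layout only):
rows = [
    {"state": "TN", "age_group": "45+", "sex": "F"},
    {"state": "TN", "age_group": "45+", "sex": "F"},
    {"state": "TN", "age_group": "45+", "sex": "F"},
    {"state": "TN", "age_group": "12-44", "sex": "M"},
]
# The lone 12-44/M record makes this set fail 3-anonymity on (age_group, sex).
```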
Here, R-DAS end-users might face a small obstacle because R-DAS does not yield unweighted cell counts. Nonetheless, in experience to date, the R-DAS Help Desk staff members have been willing to confirm whether R-DAS cell counts fall below investigator-specified thresholds and in some instances have provided the actual unweighted cell values. In addition, our research team has prepared formulae that can be used to derive an approximation of the unweighted cell counts.

Finally, we hope that this article might be of use to DAD journal readers from other countries, who can begin to make data sharing part and parcel of their public health research activities, if they have not done so already. Of course, efforts to protect study participant privacy always will be needed to ensure that the study data being reported cannot be used to identify individual subjects. Traditions of data sharing, with disclosure risk assessment and protection of confidentiality, have become a regular part of international symposia, roundtable discussions, and other activities of leading researchers in Australia, the US, and other countries. Interested DAD readers may wish to consult already published reports on this topic (e.g., Malin et al., 2013), including downloadable books made available free of charge (e.g., United States, 1985; United States, 2002; United States, 2004; United States, 2005; United States, 2007; United States, 2014).

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

•
The intention of our methodologically focused paper is to invite more research teams to make use of the "Restricted-Use Data Analysis System" (R-DAS).

•
We clarify issues associated with approximation of R-DAS confidence intervals.

•
We provide a way to estimate confidence intervals when R-DAS output is suppressed due to confidentiality concerns.

•
We show how to make pairwise comparisons of estimates that R-DAS does not otherwise allow.

Results of significance testing for differences between male and female EMPPR incidence rates. States with statistically significant differences are highlighted in color.

Table 1
Bias assessment in re-estimation of the suppressed R-DAS output.