Investigating participation in population-based cohort studies using paradata

Langeheine, Malte

doi:10.26092/elib/95

Citation link (DOI)

10.26092/elib/95

Publication date

2020-05-20

Authors

Supervisor

Reviewers

Abstract

This thesis consists of seven main chapters. The introduction (Chapter 1) places the topics under study within a general framework, the Total Study Error, which classifies errors by linking their sources to the steps of study design, conduct (data collection), and the estimation of prevalences, incidences, or associations. The thesis focuses on non-participation (also called nonresponse or, at follow-up, attrition) in population-based cohort studies at baseline or follow-up. It mainly investigates topics related to the Total Study Error component nonresponse error (Chapters 2 to 5) and, to a minor extent, the component measurement error (Chapter 6), and most of the analyses draw on paradata, i.e., information about the process of data collection (Chapters 3, 5, and 6). Chapters 2, 3, 5, and 6 are published peer-reviewed scientific papers; Chapter 4 is currently under review. Non-participation is not only a phenomenon of the past but also a source of error that future cohort studies will have to deal with. Hence, it is imperative to address nonresponse.
Attrition may lead to bias in epidemiological cohorts, since participants who are healthier and have a higher social position are less likely to drop out. In Chapter 2, we investigated possible selection effects regarding key exposures and outcomes in the IDEFICS/I.Family study, a large European cohort on the etiology of overweight, obesity and related disorders during childhood and adulthood. In particular, we investigated associations of attrition with sociodemographic variables, weight status, and study compliance, and assessed attrition across time with regard to children’s weight status as well as variations in attrition across participating countries. We investigated selection effects with regard to social position, adherence to key messages concerning a healthy lifestyle, and children’s weight status. Attrition was associated with a higher weight status of children, lower study compliance of children, older age, lower parental education, and parents’ migration background, consistently across time and participating countries. Although overweight children (odds ratio 1.17, 99% confidence interval 1.05-1.29) and obese children (odds ratio 1.18, 99% confidence interval 1.03-1.36) were more prone to drop out, attrition seemed to distort the distribution of children’s BMI only slightly, at the upper tail. Restricting the sample to subgroups with different attrition characteristics only marginally affected exposure-outcome associations.
Declining response proportions in population-based studies are often countered by extended recruitment efforts at baseline that may, however, result in higher attrition at a subsequent follow-up. In Chapter 3, we analyzed the effect of extended recruitment efforts on attrition at the first follow-up of the child cohort IDEFICS. We used paradata from the German IDEFICS cohort to quantify recruitment effort and to classify respondents as completing the recruitment early vs. late, separately for baseline and follow-up. Individuals who were late respondents at baseline and early respondents at the follow-up had a higher chance of attrition (odds ratio 1.65, 95% confidence interval 1.19, 2.28) than the other groups. An investigation of the reasons for non-participation revealed that members of this group were more likely to be unreachable by phone.
Although missing data are a major concern in epidemiology, they do not necessarily result in biased estimates. Analyzing only the individuals with complete data on the outcome, the exposure, and all explanatory variables is a very simple and therefore popular strategy for handling missing data, known as complete-case analysis. In a regression analysis, complete-case analysis is in general consistent when missingness is unrelated to the outcome given the explanatory variables included in the regression model. However, it does not appear to be widely appreciated that this validity of complete-case analysis critically hinges on the correct specification of the analysis model, and that a misspecified model might re-introduce bias. In Chapter 4, we illustrated with a simulation how different modeling choices can affect our conclusions even when a complete-case analysis is in principle valid. We based our study on empirical data from the IDEFICS study and simulated only the missingness mechanism, assuming different association strengths and different frequencies of missingness. In each scenario, we investigated the performance of three different analysis models using complete-case analysis as well as multiple imputation and inverse probability weighting as methods to correct for missing data. Our results suggest that model misspecification can lead to considerable bias when data contain missing values, even in a scenario where an ideal complete-case analysis is known to be consistent. This bias equally affects multiple imputation and, to a lesser extent and at the cost of precision, inverse probability weighting, which requires a correct weighting model. In our example, basic model diagnostics were sufficient to alert us to the misspecification of the simple analysis model with regard to the functional form of the exposure; this was detectable even for the most extreme missingness mechanisms.
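The point about model misspecification can be illustrated with a minimal sketch. It is not the simulation reported in Chapter 4, which uses empirical IDEFICS data, three analysis models, multiple imputation, and inverse probability weighting. Everything below is an assumption made for illustration: the data are synthetic, the outcome depends quadratically on the exposure, missingness depends on the exposure only, and the names `ols`, `fit`, and `simulate` are hypothetical. Under these assumptions a correctly specified complete-case model should recover the coefficients, while a model that wrongly assumes a linear exposure effect gives a slope that differs between the full data and the complete cases.

```python
import math
import random


def ols(columns, y):
    """Least-squares coefficients from the normal equations (Gauss-Jordan)."""
    k, n = len(columns), len(y)
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(columns[i][r] * columns[j][r] for r in range(n)) for j in range(k)]
        + [sum(columns[i][r] * y[r] for r in range(n))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting, then eliminate this column from all other rows
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def fit(x, y, quadratic):
    """Fit y ~ 1 + x (misspecified here) or y ~ 1 + x + x^2 (correct here)."""
    cols = [[1.0] * len(x), x]
    if quadratic:
        cols.append([xi * xi for xi in x])
    return ols(cols, y)


def simulate(n=20000, seed=1):
    rng = random.Random(seed)
    # Synthetic exposure and outcome; true model: y = 1 + 0.5*x + 1.0*x^2 + noise
    x = [rng.uniform(0.0, 2.0) for _ in range(n)]
    y = [1.0 + 0.5 * xi + xi * xi + rng.gauss(0.0, 0.5) for xi in x]
    # Assumed missingness mechanism: depends on the exposure only,
    # so the outcome is unrelated to missingness given x
    observed = [
        rng.random() < 1.0 / (1.0 + math.exp(-(2.0 - 2.0 * xi))) for xi in x
    ]
    x_cc = [xi for xi, o in zip(x, observed) if o]
    y_cc = [yi for yi, o in zip(y, observed) if o]
    return {
        "linear_full": fit(x, y, False),
        "linear_cc": fit(x_cc, y_cc, False),
        "quad_full": fit(x, y, True),
        "quad_cc": fit(x_cc, y_cc, True),
    }


if __name__ == "__main__":
    for name, coefs in simulate().items():
        print(name, [round(c, 2) for c in coefs])
```

In this setup the quadratic model should give similar coefficients on the full data and on the complete cases, whereas the slope of the linear model shifts because individuals with high exposure are under-represented among the complete cases.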
Another aspect of this thesis was whether we could enhance the response to study invitations (Chapter 5). We therefore conducted a trial embedded within the German National Cohort comparing the responses to study invitations sent in recycled envelopes of grey color vs. envelopes of white color. We analyzed paradata for reactions to the invitation letters by potential subjects, the duration between mailing date of the invitation and active responses, and study participation. Grey envelopes only slightly increased the chance of active responses (odds ratio 1.16, 95% confidence interval: 0.83, 1.62) to the invitation letter. Potential study subjects with German nationality (odds ratio 3.75, 95% confidence interval: 2.07, 7.66) and age groups above 50 years (50-59: odds ratio 1.78, 95% confidence interval: 1.19, 2.69; 60-69: odds ratio 2.25, 95% confidence interval: 1.48, 3.43) were more likely to actively respond to the invitation letter. The duration between mailing date of the invitation and active response was not associated with envelope color, sex, nationality, or age.
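For readers less familiar with the reported measures, the following sketch shows how an unadjusted odds ratio and a Wald 95% confidence interval can be computed from a 2x2 table. It is only an illustration: the counts are hypothetical and not taken from the trial, the function name `odds_ratio_ci` is an assumption, and the estimates in the abstract may well stem from regression models rather than from this simple formula.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: group 1 with response,  b: group 1 without response
    c: group 2 with response,  d: group 2 without response
    z: normal quantile (1.96 for an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the calculation
    or_, lo, hi = odds_ratio_ci(24, 76, 20, 80)
    print(f"OR = {or_:.2f}, 95% CI: {lo:.2f}, {hi:.2f}")
```

An interval that includes 1, as for the envelope color above (0.83, 1.62), is compatible with no difference between the groups.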
In Chapter 6, we touched upon a kind of participation mostly regarded as undesired. We analyzed factors associated with the presence of an intimate partner during face-to-face interviews using data from the first wave of the German Family Panel (pairfam). Although the intimate partner is most likely to be the third person present during the interview, an examination of the association of the relationship quality and the presence of the intimate partner is lacking in the literature. Our descriptive analysis revealed that an intimate partner was present in every seventh interview. The opportunity structure, such as the couple’s living arrangements or their employment status, had the greatest influence on the presence of both female and male partners while aspects of the relationship quality were to a minor extent associated with the partner’s presence.
Chapter 7 summarizes the main findings of this thesis in light of previous literature and discusses methodological considerations related to the scientific articles presented in this thesis. In conclusion, paradata are valuable for the investigation of nonresponse and attrition in cases where they capture all the information required to answer a given research question. Paradata are often considered only a 'by-product', but they provide considerable scientific benefit and should already be defined in the planning phase of a study.

Keywords

Nonresponse; Paradata; Cohort study

Institution

Universität Bremen

Faculty

Fachbereich 03: Mathematik/Informatik (FB 03)

Document type

Dissertation

Secondary publication

No

License

http://creativecommons.org/licenses/by/3.0/de/

Language

English

Files

Name

Druckbare RightsLink-Lizenz_paper1.pdf

Size

88.6 KB

Format

Adobe PDF

Checksum

(MD5):944d5629dc06c8e461b5b91c5afb255f

Name

Rahmenpapier_langeheine_200212_eingereicht_missing_data_überarbeitet-.pdf

Size

9.75 MB

Format

Adobe PDF

Checksum

(MD5):0fd8d830f77ec1e8170f80b71e1275ac