By Andrew Yuen, PhD student at UCL:
Kenneth Man and Anton Pottegård have recently published a paper outlining 10 practical tips for conducting multidatabase studies across countries. Bringing real-world data together can uncover insights that a single source cannot, such as rare outcomes, subgroup effects, and the influence of local clinical practices. Yet, this approach is complex, as differences in coding systems, reimbursement rules, data lags, and clinical cultures can quickly turn a clear protocol into a patchwork. Drawing on hands-on experience from multi-country projects, the paper provides a field-tested playbook to keep studies coordinated, credible, and comparable. “If you want to go fast, go alone. If you want to go far, go together.”
10 practical considerations
- Build a metadata table: Create a reusable, version-controlled overview of each dataset, documenting coverage period, inclusion/exclusion rules, care settings, coding systems, available domains (drugs, diagnoses, laboratory results), validation notes, and any study-specific caveats. Add short notes on the local healthcare context: guidelines, reimbursement quirks, and clinical pathways that could affect measurement.
- Create a master mapping table: Document the codes for exposures, outcomes, and covariates across systems (e.g., ICD and Read/SNOMED; ATC and BNF) and keep a changelog. Where one-to-one code maps do not exist, write concept algorithms and involve local clinicians. Publish these definitions with your paper to maximise transparency and reuse.
- Conduct data checks: Ask sites for crude counts of patients meeting key criteria, plus follow-up distributions (mean, median, interquartile range). These early feasibility checks can surface design breakers during the study design phase and save considerable time later.
- Understand exposure patterns: Once data checks are complete, the next step is to analyse exposure patterns, such as the number of exposed individuals, treatment switching, and treatment duration. Stratified counts across sites (e.g., by age or comorbidities) help ensure representativeness, while examining initiation, discontinuation, and persistence provides context for interpreting study results and understanding regional healthcare practices.
- Fully agreed protocol with data harmonisation plan: A well-defined, commonly agreed protocol is critical in multidatabase pharmacoepidemiology studies to ensure consistent study design and data harmonisation across sites. Beyond technical guidance, it establishes a shared understanding of study conduct and regional differences. Protocols should include detailed design, harmonisation, and analysis plans, ideally following templates like HARPER and registered in public platforms such as the EMA or OSF registries.
- Standardised flowcharts: Flowcharts are vital in multi-national/database studies to clearly show participant inclusion and exclusion, as cohort characteristics can vary widely across sites. Each data partner should provide a standardised flowchart that details exclusions at every stage, including reasons such as insufficient follow-up or unmet inclusion criteria.
- Design a “Table 1” that travels: Descriptive cohort data (often presented as Table 1) provide an overview of participant characteristics, enabling comparison across databases and highlighting potential sources of bias. Tables should show both counts and proportions for each variable in separate columns to support cross-site comparison and facilitate pooled analyses.
- Model diagnostics: Reviewing statistical model diagnostics, such as regression results and propensity score (PS) checks, is crucial in pharmacoepidemiology. Data partners should provide outputs, including covariate balance before and after PS adjustment, PS distribution plots, and regression outputs to ensure analyses are valid and confounding is well controlled across databases.
- Request failure plots (e.g., Kaplan-Meier or cumulative incidence curves): Failure plots, like Kaplan-Meier curves, are key for visualising time-to-event data in pharmacoepidemiology. Each database should provide these plots to check survival patterns and cumulative incidences, helping detect data issues or unexpected trends. Consistent patterns across databases strengthen confidence in pooled analyses.
- Quality control plan: A robust quality control plan is essential to ensure accurate transcription and consistent reporting of results across sites. Standardised reporting formats, site confirmation of data transfers, and collaborative review cycles help prevent discrepancies and ensure shared interpretation of findings. Transparency can be further strengthened by sharing analytical code and following FAIR principles to enhance reproducibility and cross-disciplinary learning.
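A master mapping table like the one described above is, at its simplest, a lookup from study concepts to the codes each coding system uses. A minimal sketch in Python follows; the concept names, structure, and codes are illustrative only, and a real mapping table would be version-controlled with a changelog as the paper recommends.

```python
# Hypothetical master mapping table: one entry per study concept,
# listing the codes used in each coding system. Codes are illustrative.
MAPPING = {
    "type_2_diabetes": {"ICD-10": ["E11"], "SNOMED": ["44054006"]},
    "metformin":       {"ATC": ["A10BA02"], "BNF": ["0601022B0"]},
}

def codes_for(concept, system):
    """Look up the codes for a concept in a given coding system.
    Returns an empty list when no mapping exists, which flags the
    concept for review with local clinicians."""
    return MAPPING.get(concept, {}).get(system, [])

print(codes_for("metformin", "ATC"))
print(codes_for("metformin", "Read"))  # no mapping -> needs review
```

Publishing a table like this alongside the paper, as the authors suggest, lets other teams reuse and audit the definitions.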
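The data checks described above (crude counts plus follow-up distributions) amount to a small per-site summary. Here is a rough sketch using only the Python standard library; the example follow-up times are hypothetical, and sites would run something equivalent on their own data.

```python
from statistics import median

def follow_up_summary(follow_up_days):
    """Summarise follow-up time for a site: crude count, mean,
    median, and interquartile range (quartiles taken as the medians
    of the lower and upper halves of the sorted data)."""
    xs = sorted(follow_up_days)
    n = len(xs)
    half = n // 2
    q1 = median(xs[:half])
    q3 = median(xs[half + (n % 2):])
    return {
        "n": n,
        "mean": sum(xs) / n,
        "median": median(xs),
        "iqr": (q1, q3),
    }

# Hypothetical follow-up times (days) for patients meeting key criteria
follow_up = [120, 365, 90, 730, 400, 55, 210]
print(follow_up_summary(follow_up))
```

Comparing these summaries across sites is often enough to spot the feasibility issues the paper warns about, such as one database with far shorter follow-up than the rest.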
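For the model diagnostics point, covariate balance before and after PS adjustment is commonly summarised with standardised mean differences (SMDs). A minimal sketch for a continuous covariate, with hypothetical data, is below; a common rule of thumb reads absolute SMDs above roughly 0.1 as imbalance.

```python
from statistics import mean, variance

def standardized_mean_difference(treated, control):
    """Standardised mean difference for a continuous covariate:
    difference in means divided by the pooled standard deviation.
    |SMD| > ~0.1 is commonly taken to suggest imbalance."""
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    return (mean(treated) - mean(control)) / pooled_sd

# Hypothetical covariate (e.g., age) in treated vs. control cohorts
age_treated = [64, 70, 58, 66, 72, 61]
age_control = [55, 60, 52, 58, 63, 50]
smd = standardized_mean_difference(age_treated, age_control)
print(round(smd, 2))  # well above 0.1: this covariate is imbalanced
```

Each data partner reporting SMDs in the same format makes it easy to check, across databases, that PS adjustment achieved comparable balance.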
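The failure plots requested from each database are built on Kaplan-Meier estimates. As a sketch of what sits behind those curves, here is a bare-bones Kaplan-Meier estimator in pure Python with hypothetical times and event indicators; real analyses would of course use an established survival library.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates.
    times:  follow-up time per subject
    events: 1 if the event occurred, 0 if censored
    Returns (event_time, survival_probability) pairs at event times."""
    data = sorted(zip(times, events))
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(e for tt, e in data if tt == t)
        n_at_risk = sum(1 for tt, _ in data if tt >= t)
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        i += sum(1 for tt, _ in data if tt == t)  # skip ties
    return curve

# Hypothetical time-to-event data from one database
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Plotting these step values per database side by side is what lets the team spot the data issues or unexpected trends the paper describes.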
Why are they important?
Multinational, multidatabase pharmacoepidemiology studies offer unique opportunities to evaluate drug safety and effectiveness across diverse populations. Still, they face challenges such as varied coding systems, diverse healthcare settings, and administrative hurdles. Success depends on careful planning, strong communication, rigorous harmonisation, and systematic quality checks to avoid pitfalls and ensure consistent implementation across sites. These studies are important because, despite their complexity, they can generate robust, generalisable evidence that informs safer prescribing and better healthcare practices worldwide.
Go far, together.
Link to original article:
https://onlinelibrary.wiley.com/doi/10.1002/pds.70203