Open Science: Preregistrations & Tools

TIMESPAN is committed to transparency, reproducibility, and responsible sharing of knowledge. This page highlights two key elements of our open science approach:

Preregistered Protocols – outlining our study designs before data analysis begins
Open Tools & Code – enabling reuse and review of our models and methods

Preregistered Protocols

To ensure research transparency and reduce bias, several TIMESPAN studies have been preregistered on the Open Science Framework (OSF), a public platform. You can read more about preregistration in this dedicated blog post. These registrations detail study objectives, methods, and analysis plans prior to data collection or analysis.

Study title	Date registered	Link
ADHD treatment discontinuation across the lifespan: a multi-national study	December 2022	https://osf.io/py4s7
How does ADHD/ADHD medication influence medication persistence for pharmacological treatment of hypertension	May 2024	https://osf.io/s93cq
How does ADHD/ADHD medication influence medication persistence for pharmacological treatment of type 2 diabetes	May 2024	https://osf.io/mu4nw
Clinical modifiers of ADHD treatment discontinuation across the lifespan: a multi-national study	June 2024	https://osf.io/q6eah
ADHD treatment discontinuation after a cardiovascular disease diagnosis: A multi-national study	April 2025	https://osf.io/jcqhw
ADHD/ADHD medication and risk of major adverse cardiac events and all-cause mortality in adults with type 2 diabetes: a multi-national study	February 2025	https://osf.io/fza9b
ADHD/ADHD medication and risk of major adverse cardiovascular events and all-cause mortality in adults initiating pharmacotherapy for hypertension without established cardiovascular disease	February 2025	http://osf.io/rhu6j

Open Tools and Code

In addition to sharing study protocols, TIMESPAN develops and publishes tools to foster reproducibility, particularly in the area of machine learning and pharmacoepidemiology.

WP6 Deep Learning Models

As part of Work Package 6, we created deep learning neural network (DLNN) models.

D6.1 – Data Structure DLNNs

The main objective of D6.1 is to create innovative data structure DLNNs to predict cardiometabolic outcomes and treatment discontinuation using registry and clinical data. Our machine learning and deep learning framework for this objective is complete, and the code is freely available in a GitHub repository.

All code is written in Python, using the Scikit-learn (Pedregosa et al., 2012), Keras (Charles, 2013) and TensorFlow (Abadi et al., 2016; GoogleResearch, 2015) libraries.

Briefly, this repository contains the following files:

Reading input tabular data (including generating training, validation and testing subsets, scaling features, and binarizing targets, i.e. our outcomes of interest such as cardiometabolic diagnoses or events)
PCA feature reduction: a commonly used feature reduction and engineering method
Commonly used Scikit-learn models (including ensemble models) for tabular data
Scikit-learn model hyperparameter search (covering a wide range of models and hyperparameters, and all of the commonly used search algorithms)
Multilayer perceptron (MLP) model: A neural network model suitable for tabular data
Hyperopt search for MLP: Hyperparameter search algorithm for the MLP models using Hyperopt (http://hyperopt.github.io/hyperopt/)
Ensemble-MLP model: generates an ensemble of MLP models and stabilized predictions
Seq2Seq model with GRUs (Dey & Salem, 2017; Wu et al., 2016): a longitudinal neural network model that takes time-series data as input and predicts future events or event series (Y. Zhang-James, Hess, et al., 2021)
Feature importance analysis: a collection of methods to examine and extract feature importance scores for various models
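
As an illustration of the first item in the list above, the following is a minimal, dependency-free Python sketch of the data preparation steps: splitting rows into training, validation and testing subsets, scaling features, and binarizing a target. It is not the code from the TIMESPAN repository, which uses Scikit-learn; the function names, split fractions, threshold and toy data are all assumptions made for illustration.

```python
import random
import statistics


def split_indices(n_rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle row indices and cut them into train/validation/test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(n_rows * test_frac)
    n_val = int(n_rows * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


def fit_scaler(rows):
    """Compute per-column mean and standard deviation from the given rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns, whose standard deviation is zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def scale(rows, means, stds):
    """Standardize each column using previously fitted means and deviations."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def binarize(values, threshold):
    """Map each value to 1 if it reaches the threshold, else 0."""
    return [1 if v >= threshold else 0 for v in values]


# Toy data (hypothetical): two numeric features and an event count per person.
features = [[float(i), float(i % 7)] for i in range(20)]
event_counts = [i % 5 for i in range(20)]

train_idx, val_idx, test_idx = split_indices(len(features))

# Fit the scaler on the training subset only, then reuse it everywhere.
means, stds = fit_scaler([features[i] for i in train_idx])
X_train = scale([features[i] for i in train_idx], means, stds)
X_val = scale([features[i] for i in val_idx], means, stds)
X_test = scale([features[i] for i in test_idx], means, stds)

# Binary outcome: 1 if the person had at least one event.
y = binarize(event_counts, threshold=1)
y_train = [y[i] for i in train_idx]

print(len(X_train), len(X_val), len(X_test))
```

Fitting the scaler on the training subset only, and then applying it to the validation and testing subsets, avoids leaking information from held-out data into model training.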

For more information, you can access our public deliverable report on D6.1 here.

D6.2 – Genomic DLNNs

The main objective of D6.2 is to create innovative deep learning neural networks (DLNNs) using convolutional layers for genomic data. Our machine learning and deep learning framework for this objective is now complete, and the code is freely available in a GitHub repository.

All code is written in Python and R, using the Scikit-learn (Pedregosa et al., 2012), Keras (Charles, 2013) and TensorFlow (Abadi et al., 2016; GoogleResearch, 2015) libraries.

Briefly, this repository contains the following files:

Generation of context informed data matrix: Adding genomic annotations
Generation of context informed data matrix: Correlation finding
Generation of context informed data matrix: Creating Genomic input for CNN
Matched pairing
Genomic CNN including Keras-Tuner search for selecting optimal hyperparameters
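
This page does not describe how the "Matched pairing" step is implemented. Purely as an illustration, and assuming it refers to pairing each case with a similar control, the sketch below shows one common approach: greedy nearest-neighbor matching on age, with exact matching on sex. The function name, record keys, caliper and toy records are all hypothetical and are not taken from the TIMESPAN repository.

```python
def match_pairs(cases, controls, caliper=5):
    """Greedily pair each case with the closest unused control of the same sex.

    Each record is a dict with the hypothetical keys "id", "sex" and "age".
    A control is eligible only if its age is within `caliper` years of the case.
    """
    unused = list(controls)
    pairs = []
    for case in cases:
        eligible = [
            c for c in unused
            if c["sex"] == case["sex"] and abs(c["age"] - case["age"]) <= caliper
        ]
        if not eligible:
            continue  # leave the case unmatched rather than force a poor match
        best = min(eligible, key=lambda c: abs(c["age"] - case["age"]))
        unused.remove(best)  # each control is used at most once
        pairs.append((case["id"], best["id"]))
    return pairs


# Toy records (hypothetical), for illustration only.
cases = [
    {"id": "case1", "sex": "F", "age": 34},
    {"id": "case2", "sex": "M", "age": 51},
    {"id": "case3", "sex": "F", "age": 70},
]
controls = [
    {"id": "ctrl1", "sex": "F", "age": 36},
    {"id": "ctrl2", "sex": "M", "age": 49},
    {"id": "ctrl3", "sex": "F", "age": 33},
    {"id": "ctrl4", "sex": "M", "age": 80},
]

pairs = match_pairs(cases, controls)
print(pairs)
```

Greedy matching is simple but order-dependent: a case processed early can take a control that would have been a better fit for a later case, so the result may differ from an optimal matching.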

For more information, you can access our public deliverable report on D6.2 here.

Why This Matters

By preregistering studies and publishing code and tools, we aim to:

Increase research transparency and reproducibility
Support collaboration and knowledge transfer
Enable other researchers and stakeholders to build upon TIMESPAN’s work