Open-access LEVANTE infrastructure

LEVANTE infrastructure has two key goals: 1) developing open-access, shared assessment measures, and 2) building the LEVANTE Data Repository (LDR), an open-access repository that will hold not only data but also related resources, from analysis pipelines to materials that support ethical approvals. Both goals are detailed below.

Goal 1: Measures

Create and disseminate an open-source, permissively licensed set of developmental measures appropriate for assessing learning variability in children ages 2-12. The set will include direct child assessments as well as child-report and primary-caregiver-report measures, all with good psychometric properties and evidence for measurement validity, and will be internationalized across multiple languages.

  1. Curate, implement, and internationalize measures in an open web platform that families and assessors can access on both tablets and web browsers.
  2. Estimate psychometric properties of tasks across populations via pilot testing, with a focus on identifying a core set of measures whose psychometric properties are good enough to support comparisons across cultures and ages.
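Which psychometric properties will be estimated is not specified here. As one illustration, internal-consistency reliability is often summarized with Cronbach's alpha, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The Python sketch below computes it from a respondents-by-items score matrix; the function name and input layout are assumptions for illustration, not part of the LEVANTE platform.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a score matrix (rows = respondents, columns = items).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Hypothetical helper for illustration; not part of any LEVANTE codebase.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Variance of each item (column) across respondents
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of each respondent's total score
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Two items that rank three children identically give alpha = 1.0
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
```

In practice, reliability would be estimated separately for each language and age group, since the goal is comparability across cultures and ages.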

Goal 2: Data sharing

Create the architecture for disseminating a large, well-curated, open longitudinal dataset, collected with these measures from children ages 2-12, that characterizes developmental variability within individuals, within groups, and across contexts.

  1. Establish repository architecture and governance documents, including a framework for ethical approval, privacy, and data sharing that allows for wide reuse of the resulting dataset.
  2. Create import pipelines that ingest and validate data into the repository, so that data flow directly from the measurement platform to an open repository where they can be accessed and reused.

Description

Data collection using ROAD

Partner sites (i.e., labs) will design and manage studies by selecting tasks in the Rapid Online Assessment Dashboard (ROAD) experiment platform. Individual ROAD-hosted tasks will be implemented in jsPsych, a widely used JavaScript library for building browser-based experiments. This framework allows flexible design of internationalized tasks that run in both browser and tablet formats.

ROAD includes the Dashboard, a research management interface that will allow labs to customize how the LEVANTE core measure set is packaged into groups of assessments for distribution to participants, caregivers, and/or school personnel. These groups can be shared with families via links (e.g., a link to a set of parent questionnaires or to direct assessments for a child) or administered directly to a participant on a tablet or computer in an assessment center or classroom. Data from these tasks will be stored in the ROAD database (DB), a Firestore database hosted on Google Cloud.
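The packaging step can be pictured as a small data model. The Python sketch below is an illustration under assumptions, not ROAD's actual schema: the class names, task IDs, respondent categories, and link format (including the example.org base URL) are all hypothetical. It groups core tasks by respondent, checks that a group only uses tasks from the core set, and builds a share link that carries a random token rather than any participant information.

```python
import secrets
from dataclasses import dataclass, field
from enum import Enum


class Respondent(Enum):
    """Who completes an assessment group (hypothetical categories)."""
    CHILD = "child"          # direct child assessment
    CAREGIVER = "caregiver"  # caregiver-report questionnaires
    SCHOOL = "school"        # school personnel reports


@dataclass
class AssessmentGroup:
    """A lab-defined bundle of core tasks (hypothetical model, not ROAD's schema)."""
    name: str
    respondent: Respondent
    task_ids: list[str] = field(default_factory=list)

    def check_against(self, core_tasks: set[str]) -> None:
        # Labs may only package tasks drawn from the core measure set
        unknown = set(self.task_ids) - core_tasks
        if unknown:
            raise ValueError(f"not in core set: {sorted(unknown)}")


def share_link(base_url: str, group: AssessmentGroup) -> str:
    # Random, unguessable token; the link itself encodes no participant data
    return f"{base_url}/{group.respondent.value}/{secrets.token_urlsafe(16)}"


CORE = {"vocab", "matrix-reasoning", "parent-survey"}  # hypothetical task IDs
survey = AssessmentGroup("Parent questionnaires", Respondent.CAREGIVER, ["parent-survey"])
survey.check_against(CORE)
print(share_link("https://example.org/road", survey))
```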

Diagram of data collection and storage in the LEVANTE Framework.

Privacy controls 

The ROAD interface will never receive identifiable or quasi-identifiable data from participants. The only demographic data used in ROAD will be a rounded age value, too coarse to allow re-identification; age at this precision is still necessary to ensure proper test administration and scoring.
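The precision of the rounded age is not specified here. As a minimal sketch, assuming the site computes age locally from a birth date it never uploads and reports it floored to half-year bins (the 0.5-year step and the function name are assumptions), the conversion could look like this in Python:

```python
import math
from datetime import date


def rounded_age(birth_date: date, test_date: date, step_years: float = 0.5) -> float:
    """Age at testing, floored to a coarse bin (0.5 years by default).

    Hypothetical helper: only the returned value would leave the site;
    the birth date itself stays local. The bin width is an assumption.
    """
    days = (test_date - birth_date).days
    if days < 0:
        raise ValueError("test date precedes birth date")
    years = days / 365.25  # average year length, accounting for leap years
    # Flooring (not rounding to nearest) never overstates a child's age
    return math.floor(years / step_years) * step_years


print(rounded_age(date(2018, 1, 1), date(2023, 3, 1)))  # 5.0
print(rounded_age(date(2018, 1, 1), date(2023, 8, 1)))  # 5.5
```

A wider bin lowers re-identification risk but also the precision available for age-normed scoring, so the step would need to balance the two.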

To avoid transferring private or otherwise sensitive information, labs will collect some quasi-identifiable information locally, including demographic data, data about the research site, and geographic data. Labs will also enter addresses into an independently provided geographic data dashboard to obtain geographic data (e.g., population density) from relevant GIS databases.

These data will be merged into a single spreadsheet and processed through the MinBlur algorithm, and all geographic features will be downsampled to the level of granularity appropriate for that location (e.g., the first three digits of the ZIP code in the US). This processing will be performed by a script run locally on each participating site’s own computing resources; the LEVANTE data coordinating center will have no access to these data at any time. The process will ensure k-anonymity with k = 5 for the relevant population: no combination of quasi-identifier values in the dataset will correspond to fewer than five individuals.
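The MinBlur algorithm itself is not described here, so the Python sketch below does not implement it. It only illustrates two related ideas: coarsening a US ZIP code to its first three digits, and checking the k = 5 condition by counting how many records share each combination of quasi-identifier values. The column names and helper functions are hypothetical. Because the check counts records within the dataset rather than individuals in the relevant population, it is a simplification of the population-level guarantee described above.

```python
from collections import Counter

K = 5  # minimum group size from the text (k = 5)


def coarsen_zip(zip_code: str, keep: int = 3) -> str:
    """Keep only the leading digits of a US ZIP code, e.g. '94305' -> '943'."""
    return zip_code.strip()[:keep]


def k_anonymity_violations(rows: list[dict], quasi_ids: list[str], k: int = K) -> dict:
    """Return each quasi-identifier combination shared by fewer than k rows."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical site data: five children with ZIP 94305 and age 5.0,
# plus one child who is alone with ZIP 10001 and age 7.5.
rows = [{"zip": "94305", "age": 5.0} for _ in range(5)]
rows.append({"zip": "10001", "age": 7.5})
for row in rows:
    row["zip3"] = coarsen_zip(row.pop("zip"))

print(k_anonymity_violations(rows, ["zip3", "age"]))  # {('100', 7.5): 1}
```

A record flagged this way would need further coarsening or suppression before the dataset could be uploaded.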

After this processing, the dataset will contain no identifiers or quasi-identifiers, and it will be uploaded directly to the data validation interface rather than through ROAD.

Data transfer and validation

A suite of data processing pipelines will transfer task data into the LEVANTE Data Repository (LDR). These pipelines will extract data from ROAD and from any external task platforms, validate and transform the data, and aggregate them into the LDR. Validation will ensure that each dataset adheres to a predetermined schema of variable names and data types, and will exclude data points with missing or disallowed values. The transfer process will also create a LEVANTE Unique Identifier (LUID) for each participant and confirm that any other identifiers or identifiable information have been removed.
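The validation and LUID steps can be sketched in Python as follows. Everything here is hypothetical: the schema (variable names, types, and allowed-value rules), the set of forbidden identifier fields, the participant_id field used to link sessions, and the LUID format are placeholders, not the actual LEVANTE pipeline. Rows with missing variables, wrong types, or disallowed values are dropped; any row carrying an identifier field halts ingestion; and each source participant ID maps to one stable LUID so that longitudinal sessions stay linked.

```python
import uuid
from typing import Any, Callable

# Hypothetical schema: variable name -> (required type, allowed-value rule)
SCHEMA: dict[str, tuple[type, Callable[[Any], bool]]] = {
    "task_id": (str, lambda v: v != ""),
    "trial": (int, lambda v: v >= 1),
    "rt_ms": (float, lambda v: v >= 0),
}
# Hypothetical fields that must never reach the repository
IDENTIFIER_FIELDS = {"name", "email", "birth_date", "address"}


def is_valid(row: dict) -> bool:
    # Exact type match, so e.g. a bool is not accepted where an int is required
    for name, (typ, allowed) in SCHEMA.items():
        value = row.get(name)
        if value is None or type(value) is not typ or not allowed(value):
            return False
    return True


def assign_luid(source_id: str, registry: dict[str, str]) -> str:
    # The same source ID always maps to the same LUID across sessions
    if source_id not in registry:
        registry[source_id] = f"LUID-{uuid.uuid4().hex}"
    return registry[source_id]


def ingest(rows: list[dict], registry: dict[str, str]) -> list[dict]:
    clean = []
    for row in rows:
        if IDENTIFIER_FIELDS.intersection(row):
            raise ValueError("identifiable field present; refusing to ingest")
        source_id = row.get("participant_id")
        if not source_id or not is_valid(row):
            continue  # exclude data points with missing or disallowed values
        record = {name: row[name] for name in SCHEMA}  # drops the source ID
        record["luid"] = assign_luid(source_id, registry)
        clean.append(record)
    return clean


batch = [
    {"participant_id": "site1-007", "task_id": "vocab", "trial": 1, "rt_ms": 812.0},
    {"participant_id": "site1-007", "task_id": "vocab", "trial": 2, "rt_ms": -5.0},  # disallowed
    {"participant_id": "site1-007", "task_id": "vocab", "trial": 3},  # missing rt_ms
]
print(len(ingest(batch, {})))  # 1
```

Halting on identifier fields, rather than silently dropping them, surfaces upstream privacy failures instead of hiding them; that choice is itself an assumption of this sketch.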

LEVANTE Data Repository (LDR) and Data Access

The LEVANTE Data Repository will be built on the Redivis data management platform. Access to the LDR will be provided through an Application Programming Interface (API) that wraps the general Redivis API to allow fine-grained access control over different datasets. Token authentication will give users different levels of data access (see LEVANTE Data Repository: contribution and access for more details).
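The tiered access idea can be sketched in Python under stated assumptions: the access tiers, dataset names, and TokenStore class are hypothetical; tokens are stored only as SHA-256 hashes; and the call that would forward an authorized request to the Redivis API is deliberately omitted rather than guessed.

```python
import hashlib
import secrets
from enum import IntEnum
from typing import Optional


class Access(IntEnum):
    """Hypothetical access tiers; a higher value grants more."""
    PUBLIC = 0
    REGISTERED = 1
    RESTRICTED = 2


# Hypothetical minimum tier required for each LDR dataset
DATASET_TIER = {
    "task_summaries": Access.PUBLIC,
    "trial_level": Access.REGISTERED,
    "site_metadata": Access.RESTRICTED,
}


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()


class TokenStore:
    """Maps hashed tokens to tiers so raw tokens are never stored."""

    def __init__(self) -> None:
        self._tiers: dict[str, Access] = {}

    def issue(self, tier: Access) -> str:
        token = secrets.token_urlsafe(32)
        self._tiers[_digest(token)] = tier
        return token

    def tier(self, token: str) -> Optional[Access]:
        return self._tiers.get(_digest(token))


def authorize(store: TokenStore, token: str, dataset: str) -> bool:
    # Deny unknown tokens and unknown datasets; otherwise compare tiers.
    # A real wrapper would forward authorized requests to the Redivis API here.
    tier = store.tier(token)
    required = DATASET_TIER.get(dataset)
    return tier is not None and required is not None and tier >= required


store = TokenStore()
token = store.issue(Access.REGISTERED)
print(authorize(store, token, "trial_level"))    # True
print(authorize(store, token, "site_metadata"))  # False
```

Ordered tiers keep the check to a single comparison; finer-grained, per-dataset grants would need a set of permissions per token instead.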