Psychometric goals of pilot studies

1) Developmental growth within and across sites. The goal is to select measures that show strong evidence of developmental change both within and across individuals and for which we observe neither floor nor ceiling effects across the full age range. As a minimum threshold, we suggest correlations between raw scores and age-in-months of r > .3.

2) Within-test reliability. We will compute Cronbach’s alpha for caregiver report questionnaires (or omega for measures with weighted scoring). All caregiver report measures should achieve alpha > .7 within site for at least two of the three sites. If one site shows low alpha, we will evaluate potential adaptation or translation issues.

3) Test-retest reliability. The Germany test site will administer a subset of measures a second time after a delay of between one week and one month. All measures should achieve test-retest reliability > .60 for inclusion in the final battery.

4) Construct validity. For the behavioral assessment constructs for which we have three or more measures, we will fit a within-site latent-variable model. We will assess loadings on each task as well as error variance.

5) Measurement invariance. We will assess the degree of invariance in the factor structures of our instruments’ loadings across age and culture, both within-construct (across different measures of the same construct) and, where appropriate, within-measure (across different items in the same measure). We do not establish strict cutoffs for invariance, but we note that different degrees of invariance will lead to different constraints on the interpretation of the models proposed in (2) LEVANTE Scientific Aims.

6) External validity. The Colombia and Canada test sites will likely have access to school grades for individual children, and the Canada site may also have access to standardized tests. We will look for correlations of r > .6 between latent constructs derived from task clusters and the corresponding site-specific measures.

7) Test bias. Within those measures that make use of multiple items with different content, we will explore culture- and gender-specific item effects. We will use differential item functioning (DIF) models to ask about the presence of items that do not appear to function appropriately across these grouping factors.
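As an illustration of the numeric thresholds in goals 1 to 3, the following is a minimal sketch of how the age correlation, Cronbach's alpha, and test-retest checks could be computed. It is not the project's analysis code. The function names, the constant names, and the choice of a Pearson correlation for test-retest reliability are assumptions made for this example; the text above does not specify which test-retest coefficient will be used.

```python
import numpy as np

# Thresholds taken from goals 1-3 above; the constant names are hypothetical.
MIN_AGE_R = 0.30      # goal 1: raw score vs. age-in-months
MIN_ALPHA = 0.70      # goal 2: Cronbach's alpha within site
MIN_RETEST_R = 0.60   # goal 3: test-retest reliability


def pearson_r(x, y):
    """Pearson correlation between two equal-length 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


def cronbach_alpha(items):
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed score
    return float(k / (k - 1) * (1.0 - item_vars.sum() / total_var))


def shows_developmental_growth(age_months, raw_scores):
    """Goal 1 check: raw scores correlate with age above the threshold."""
    return pearson_r(age_months, raw_scores) > MIN_AGE_R


def is_internally_consistent(items):
    """Goal 2 check for a single site: alpha above the threshold."""
    return cronbach_alpha(items) > MIN_ALPHA


def is_stable_over_time(time1_scores, time2_scores):
    """Goal 3 check, assuming Pearson r as the test-retest coefficient."""
    return pearson_r(time1_scores, time2_scores) > MIN_RETEST_R
```

In practice, the alpha check would be run once per site and a measure retained if it passes in at least two of the three sites. Omega, the latent-variable models, and the DIF analyses in the later goals need a dedicated psychometrics package and are not sketched here.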