An audit of the data structures of four international clinical trial registries

Authors
Ko H1, Hunter K1, Vu T1, Suasa R1, Smith E1, Zhang L1, Askie L1
1NHMRC Clinical Trials Centre, University of Sydney, Australia
Abstract
Background: Clinical trial registries (CTRs) are increasingly used by systematic reviewers to identify ongoing and completed clinical trials. CTRs hold trial information on at least 20 unique data items, i.e. the WHO Trial Registration Data Set (TRDS). Although CTRs comply with the TRDS, they operate independently of each other, so content may be inconsistent between registries. This presents difficulties when extracting data from multiple CTRs.
Objectives: To characterize the content of the data fields in the 20 TRDS items across different CTRs, and to identify data fields where extracting consistent information was complex.
Methods: Data fields from 4 CTRs (ANZCTR, ClinicalTrials.gov, ISRCTN, EuCTR) were audited in December 2014 as part of a larger project analysing country-specific clinical trial activity. The 20 TRDS items were assessed for data quality and for how CTRs varied in the number of data items per data field, the aggregation of data items, and data item formats.
Results: All 4 CTRs collected the 20 TRDS items, but the content and number of data fields used to collect these items were not consistent across CTRs. Some TRDS items required only a single data field, e.g. registration number. The more information-intensive items often had multiple associated data items or were aggregated into single data items, e.g. the intervention description, sponsor, and funder items differ across CTRs. Data items were presented in different formats, e.g. one data item might combine categorical, binary, and free-text data. Some non-TRDS data items were present in some CTRs but not others. Some CTRs have coding options for certain data items that are not directly translatable to other CTRs, e.g. intervention condition codes.
Conclusions: Extracting consistent data across multiple CTRs is a complicated process for users. Because CTRs vary in content, where information is kept, number of fields, types of data, and how information is coded, accurate information extraction requires data field matching, formatting, cleaning, and recoding. Further standardization of CTR data is needed so that CTR data can be utilised more easily and more fully.
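As an illustration only, the four steps named in the conclusions (field matching, formatting, cleaning, and recoding) might be sketched in Python as below. This is not the procedure used in the audit. The registry names, field names, date formats, and status codes are all hypothetical placeholders chosen for the example and do not reflect the actual export schema of any CTR.

```python
from datetime import datetime

# Everything in these tables is a hypothetical placeholder, not the
# real field names, date formats, or codes of any registry.
FIELD_MAP = {  # field matching: registry-specific field -> common field
    "registry_a": {"trial_id": "registration_number",
                   "sponsor_name": "primary_sponsor",
                   "reg_date": "registration_date",
                   "status": "recruitment_status"},
    "registry_b": {"id": "registration_number",
                   "lead_sponsor": "primary_sponsor",
                   "first_registered": "registration_date",
                   "recruitment": "recruitment_status"},
}
DATE_FORMAT = {"registry_a": "%d/%m/%Y", "registry_b": "%Y-%m-%d"}
STATUS_RECODE = {  # recoding: registry-specific code -> common code
    "registry_a": {"open": "recruiting", "closed": "not recruiting"},
    "registry_b": {"recruiting": "recruiting", "completed": "not recruiting"},
}


def harmonise(record, registry):
    """Map one registry record (a dict of strings) onto the common fields."""
    out = {}
    for source_field, common_field in FIELD_MAP[registry].items():
        value = record.get(source_field)
        if isinstance(value, str):
            # cleaning: trim whitespace and treat empty strings as missing
            value = value.strip() or None
        out[common_field] = value

    if out["registration_date"] is not None:
        # formatting: convert each registry's date style to ISO 8601
        parsed = datetime.strptime(out["registration_date"], DATE_FORMAT[registry])
        out["registration_date"] = parsed.date().isoformat()

    if out["recruitment_status"] is not None:
        # recoding: codes with no common equivalent are flagged, not guessed
        key = out["recruitment_status"].lower()
        out["recruitment_status"] = STATUS_RECODE[registry].get(key, "unmapped")
    return out


if __name__ == "__main__":
    print(harmonise({"trial_id": " A-001 ", "sponsor_name": "Example University",
                     "reg_date": "05/12/2014", "status": "Open"}, "registry_a"))
    print(harmonise({"id": "B-002", "lead_sponsor": "",
                     "first_registered": "2014-12-05",
                     "recruitment": "Suspended"}, "registry_b"))
```

Flagging unrecognised codes as "unmapped" rather than forcing them into a category reflects the finding that some coding options are not directly translatable between CTRs. In practice such values would need manual review.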