Amalgamating individual patient data for meta-analysis: An IT perspective

Authors
Hilken N1, Middleton L1, Champaneria R1, Daniels J1
1University of Birmingham, UK
Abstract
Background: Individual patient data (IPD) meta-analysis requires raw data from authors of primary studies, which can be provided in various formats. An IPD meta-analysis to evaluate the relative effectiveness of hysterectomy, endometrial ablative techniques and Mirena for heavy menstrual bleeding illustrates potential problems in collating disparate data sets.

Objectives: To describe the data management and synthesis aspects of an individual patient data meta-analysis from an IT perspective.

Methods: We requested IPD from 30 RCT authors and received responses from 17. Data were transformed into Third Normal Form (3NF) for each study, and translated to use homogeneous naming and coding conventions. A data import specification was created in Microsoft Excel, and VBA macros generated the necessary SQL scripts from it. This method provided a single authoritative source recording all data transformations. All import code was generated directly from the specification itself, quickly and repeatably.
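As an illustration of specification-driven code generation, the following is a minimal sketch in Python rather than the VBA used in the study. The specification rows, the table name code_map, the column names and the helper functions are all hypothetical and stand in for the spreadsheet; the original macros are not reproduced here.

```python
import sqlite3

# Hypothetical specification rows, standing in for the Excel sheet:
# (study, source_column, common_variable, source_code, common_code)
SPEC = [
    ("study_a", "satisf", "satisfaction", "1", "very_satisfied"),
    ("study_a", "satisf", "satisfaction", "2", "satisfied"),
    ("study_b", "sat_score", "satisfaction", "VS", "very_satisfied"),
    ("study_b", "sat_score", "satisfaction", "S", "satisfied"),
]


def sql_literal(value):
    """Quote a value as an SQL string literal, doubling embedded quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def generate_script(spec):
    """Build one SQL script (table definition plus inserts) from the spec."""
    statements = [
        "CREATE TABLE code_map ("
        " study TEXT NOT NULL,"
        " source_column TEXT NOT NULL,"
        " common_variable TEXT NOT NULL,"
        " source_code TEXT NOT NULL,"
        " common_code TEXT NOT NULL,"
        " PRIMARY KEY (study, source_column, source_code));"
    ]
    for row in spec:
        values = ", ".join(sql_literal(v) for v in row)
        statements.append(f"INSERT INTO code_map VALUES ({values});")
    return "\n".join(statements)


# Regenerating the script after any change to SPEC keeps the
# specification as the only place where transformations are recorded.
script = generate_script(SPEC)

# Run the generated script against an in-memory database to check it.
conn = sqlite3.connect(":memory:")
conn.executescript(script)
row = conn.execute(
    "SELECT common_code FROM code_map WHERE study = ? AND source_code = ?",
    ("study_b", "VS"),
).fetchone()
print(row[0])
```

Because the script is always derived from the specification, a correction to a mapping is made once in the specification and the import is simply regenerated and rerun.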

Results: IPD from 17 studies were received over a 16-month period as spreadsheets, databases, SAS files, SPSS files or on paper, mostly lacking descriptive meta-data. The specification spreadsheet defined all coded lists for discrete variables, and the mappings from each data set to a common coding convention. Because the import code was generated automatically, staff without programming skills could collaborate on the import specification. The VBA macros generated the SQL scripts for creating tables, relationships and views, and for inserting and checking data, as well as the SAS files needed to consume the data.

Recommendations: An IPD meta-analysis is a substantial undertaking, particularly where non-standard ordinal variables such as 'satisfaction' are included. We recommend the creation of a master relational database, with automated data import to maintain data integrity and quality. Trialists should also consider further use of their data, apply accepted data standards, use relational databases, document datasets thoroughly, and ensure that meta-data are stored along with the data in the long term.
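A master relational database with automated import checks might be sketched as follows, using Python and SQLite purely for illustration. The table and view names (code_map, raw_outcome, outcome, unmapped), the function build_master and the sample rows are assumptions made for this example, not the schema used in the study.

```python
import sqlite3


def build_master(code_map, raw_rows):
    """Load mappings and raw study data, exposing harmonised and unmapped views."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE code_map (
            study TEXT NOT NULL,
            source_code TEXT NOT NULL,
            common_code TEXT NOT NULL,
            PRIMARY KEY (study, source_code)
        );
        CREATE TABLE raw_outcome (
            study TEXT NOT NULL,
            patient_id TEXT NOT NULL,
            source_code TEXT NOT NULL,
            PRIMARY KEY (study, patient_id)
        );
        -- Harmonised data: each study's codes translated to the common convention.
        CREATE VIEW outcome AS
            SELECT r.study, r.patient_id, m.common_code
            FROM raw_outcome r
            JOIN code_map m
              ON m.study = r.study AND m.source_code = r.source_code;
        -- Integrity check: raw values with no mapping in the specification.
        CREATE VIEW unmapped AS
            SELECT r.study, r.patient_id, r.source_code
            FROM raw_outcome r
            LEFT JOIN code_map m
              ON m.study = r.study AND m.source_code = r.source_code
            WHERE m.common_code IS NULL;
        """
    )
    conn.executemany("INSERT INTO code_map VALUES (?, ?, ?)", code_map)
    conn.executemany("INSERT INTO raw_outcome VALUES (?, ?, ?)", raw_rows)
    conn.commit()
    return conn


# Hypothetical sample data: study_b contains a code ("9") with no mapping.
conn = build_master(
    code_map=[
        ("study_a", "1", "very_satisfied"),
        ("study_b", "VS", "very_satisfied"),
    ],
    raw_rows=[
        ("study_a", "p1", "1"),
        ("study_b", "p1", "VS"),
        ("study_b", "p2", "9"),
    ],
)
problems = conn.execute(
    "SELECT study, patient_id, source_code FROM unmapped"
).fetchall()
print(problems)
```

Keeping the raw values in their own table and deriving the harmonised data through a view means unmapped or miscoded values surface as rows in the check view instead of being silently dropped or altered.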