Abstract
Background:
The HarmoniSR project was set up in 2013 by a group of Cochrane information specialists to generate guidance on the standardisation of data through an open, transparent and inclusive consensus process. Initially, HarmoniSR was concerned with standardising records in the Cochrane Register of Studies (CRS). A separate project was undertaken by Metaxis to identify errors in the Cochrane Central Register of Controlled Trials (CENTRAL) so that they could be corrected automatically or passed to Cochrane Review Groups for manual correction. As part of that project, the values of all fields in CENTRAL records were extracted and analysed. This provided a rich data source from which the HarmoniSR project could formulate its initial recommendations.
Objectives:
To report the key findings of the CENTRAL Cleanup project and the HarmoniSR work to date.
Methods:
Values in all published fields in all CENTRAL records on 23 January 2014 were extracted and distilled via pattern matching and textual analysis into exemplars of different types of variation within each field. Those exemplars were then used a) to re-populate the CENTRAL fields with corrected values where possible, and b) to inform priority setting of the work to be covered by the HarmoniSR project.
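The distillation step described above can be illustrated with a minimal sketch. It is not the project's actual code: the function names `shape` and `exemplars`, the rule of collapsing digit and letter runs into placeholder characters, and the sample values for a hypothetical "pages" field are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def shape(value: str) -> str:
    """Reduce a field value to a coarse pattern (assumed rule, for illustration):
    each run of digits becomes '9', each run of letters becomes 'A'."""
    s = re.sub(r"[0-9]+", "9", value.strip())
    s = re.sub(r"[A-Za-z]+", "A", s)
    return re.sub(r"\s+", " ", s)


def exemplars(values):
    """Group values by shape and return {shape: (count, first value seen)}."""
    groups = defaultdict(lambda: [0, None])
    for v in values:
        entry = groups[shape(v)]
        entry[0] += 1
        if entry[1] is None:
            entry[1] = v  # keep the first value as the exemplar for this shape
    return {k: (count, ex) for k, (count, ex) in groups.items()}


# Invented sample values for a hypothetical "pages" field, not CENTRAL data
sample = ["123-130", "45-52", "pp. 7-9", "S12", "123-130 "]
summary = exemplars(sample)

# List the patterns from most to least frequent, each with one exemplar
for pattern, (count, example) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{pattern!r}: {count} value(s), e.g. {example!r}")
```

Grouping by shape rather than by exact value keeps the list of variants short enough to review by hand, and each shape can then be mapped either to an automatic correction or to a manual check.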
Results:
A total of 453,380 records in CENTRAL were changed as a result of the CENTRAL Cleanup project. The exemplar patterns were analysed and recommendations were made for standardising each published field in CENTRAL.
Conclusions:
The CENTRAL Cleanup project found a large number of systematic differences among field data in CENTRAL records. Some of those differences were errors (i.e. the values were incorrect), some reflected the different representational styles of different source databases, and some represented differences in coding used by Trials Search Co-ordinators. The exemplars of the different classes of variation allowed efficient programming of global clean-up routines and facilitated the development of rules and guidelines for future data entry.