A multifile search of five bibliographic databases for stroke trials

Thomas B, Gubitz G, McInnes A, Krabshuis J, Counsell C
Introduction/Objective: Electronic searching of the many available bibliographic databases is an important means of identifying trials, offering quick access to many thousands of references. However, such searching is time-consuming and can be expensive: complex database-specific search strategies must be developed, many irrelevant references must be screened, and the costs of access and downloading must be taken into account. The absolute and relative yields of trials from each database need to be established if Review Groups are to use their limited resources efficiently. We performed a complete retrospective search of five major databases to investigate the number of stroke trials identified in each, the degree of duplication across databases, and the number of trials on our existing specialised register that were not identified in any of the databases.
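
The three quantities named above (yield per database, duplication across databases, and register trials missed by every database) can be computed with simple set operations once each database's results have been screened. The sketch below is illustrative only: the function name `database_yields` and the trial identifiers are hypothetical, and "relative yield" is assumed here to mean each database's share of all relevant trials retrieved by any database.

```python
# Hypothetical sketch: yield of each database from screened search results.
# Assumption: "relative" = share of all relevant trials retrieved by any database.

def database_yields(found: dict[str, set[str]], register: set[str]) -> dict:
    all_found = set().union(*found.values())  # relevant trials found anywhere
    report = {}
    for db, trials in found.items():
        # Trials retrieved by all the other databases combined
        others = set().union(*(t for d, t in found.items() if d != db))
        report[db] = {
            "absolute": len(trials),
            "relative": len(trials) / len(all_found) if all_found else 0.0,
            "unique": len(trials - others),  # retrieved by this database alone
        }
    # Register trials that no database search retrieved
    report["register_not_found"] = len(register - all_found)
    return report
```

With invented inputs such as `{"MEDLINE": {"t1", "t2", "t3"}, "EMBASE": {"t2", "t3", "t4"}, "BIOSIS": {"t5"}}` and a register of `{"t1", "t2", "t6", "t7"}`, each database contributes one unique trial and two register trials are missed by all three. A database with a low unique yield adds little once the others have been searched.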

Methods: Separate detailed search strategies, combining index terms and free-text terms, were developed for MEDLINE, EMBASE, BIOSIS, DERWENT Drug File and SCISEARCH. To limit costs, we searched the years of MEDLINE, EMBASE, and BIOSIS that were available free of charge through the Edinburgh University network. We screened the title and abstract of each reference retrieved and obtained a paper copy of each report of a possibly relevant stroke trial. We then performed a multifile search of the remaining years of these databases, together with DERWENT Drug File and SCISEARCH, using a commercial server (STN). This allowed up to 30,000 references to be de-duplicated simultaneously across the different databases, and all references already identified in our initial searches of MEDLINE, EMBASE, and BIOSIS to be excluded. On STN we screened the fields of each reference that could be displayed free of charge (title, keywords), purchased the full downloaded record of any possibly relevant stroke trial, and obtained a paper copy of those with relevant abstracts.
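
The multifile step combines two operations: merging results from several databases so that each reference is kept once, and excluding references already screened in the earlier free searches. The sketch below illustrates that logic only; it is not STN's algorithm, which the abstract does not describe. The `Record` type, the `match_key` rule (whitespace-collapsed, lower-cased title plus year), the database priority order, and the example records are all hypothetical simplifications.

```python
# Hypothetical sketch of multifile de-duplication with exclusion of earlier results.
# NOT STN's method: records are matched on a crude (title, year) key.
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    database: str
    title: str
    year: int


def match_key(rec: Record) -> tuple[str, int]:
    # Lower-case the title and collapse runs of whitespace, then pair with year
    return (" ".join(rec.title.lower().split()), rec.year)


def multifile_dedup(results_by_db: dict[str, list[Record]],
                    priority: list[str],
                    already_screened: set[tuple[str, int]]) -> list[Record]:
    seen = set(already_screened)  # keys from the earlier free searches are excluded
    kept = []
    for db in priority:  # a higher-priority database claims a shared record first
        for rec in results_by_db.get(db, []):
            key = match_key(rec)
            if key not in seen:
                seen.add(key)
                kept.append(rec)
    return kept
```

Seeding `seen` with the keys from earlier searches means a single pass both removes cross-database duplicates and drops anything already screened, which mirrors the two purposes described above. Only the records in `kept` would then need their free fields screened.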

Results: Our free searches of MEDLINE, EMBASE, and BIOSIS generated 61,646 possibly relevant references, among which we expected considerable duplication. Because reference formats vary across databases and time periods, we developed a duplicate detection algorithm, which reduced these to 47,312 unique references. We estimate that fewer than 10% of these will describe relevant stroke trials. The multifile search on STN generated 10,273 additional unique references, of which 1326 were downloaded. We will present the number of relevant references identified in each database and the degree of duplication.
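
For scale, duplicate detection removed 61,646 - 47,312 = 14,334 records, about 23% of the free-search total, and fewer than 10% of 47,312 unique references means fewer than about 4,731 relevant ones. The abstract does not describe how its duplicate detection algorithm works. The sketch below assumes one common heuristic for format-tolerant matching: two records are treated as duplicates when their normalised first-author surname, four-digit year, and first page all agree. The function names and example records are hypothetical.

```python
# Hypothetical sketch of format-tolerant duplicate detection (not the authors' algorithm).
# Assumption: duplicates share normalised first-author surname, year, and first page.
import re
import unicodedata


def normalise(text: str) -> str:
    # Strip accents, lower-case, and keep only ASCII letters and digits
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]", "", text.lower())


def dedup_key(author: str, year: str, pages: str) -> tuple[str, str, str]:
    # Surname is the text before the first comma or space ("Smith J", "SMITH, J.")
    surname = normalise(re.split(r"[,\s]", author.strip(), maxsplit=1)[0])
    year4 = re.search(r"\d{4}", year)           # "1997", "1997 Mar", "(1997)"
    first_page = re.match(r"\D*(\d+)", pages)   # "123-9", "pp. 123-129"
    return (surname,
            year4.group(0) if year4 else "",
            first_page.group(1) if first_page else "")


def unique_records(records: list[dict]) -> list[dict]:
    # Keep the first record seen for each key, preserving input order
    seen, unique = set(), []
    for rec in records:
        key = dedup_key(rec["author"], rec["year"], rec["pages"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Under this rule `{"author": "Smith J", "year": "1997", "pages": "123-9"}` and `{"author": "SMITH, J.", "year": "1997 Mar", "pages": "pp. 123-129"}` collapse to one record, as do "Müller K" and "Muller, K." with the same year and first page. The key ignores journal and title, so unrelated papers can occasionally collide, and spellings such as "Mueller" versus "Müller" are not matched; a real algorithm would need to trade off such false matches and misses.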

Discussion: Considerable cost and effort are required to search multiple databases systematically for stroke trials. Calculating the yield of each database will allow a more focussed approach in the future. Commercial searching, with its ability to re-run searches automatically and to de-duplicate results across databases and against previous searches, may be a valuable option for prospective searching.