Bruegel datasets

Remerge: regression-based record linkage with an application to PATSTAT

We extend the information content of PATSTAT by linking it to Amadeus, a large company database that includes financial information. Patent microdata is thereby linked to firms' financial performance.

Last update: 28 September 2014


Record linkage algorithms typically find matches by comparing records on the fields they share. However, PATSTAT shares very little information with company databases. We introduce REMERGE: a flexible, open-source algorithm that allows PATSTAT, the worldwide patent database, to be intelligently linked with company databases, without limiting the comparisons to the shared fields.
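For readers unfamiliar with the conventional approach mentioned above, the following is a minimal sketch of record linkage on shared fields. It is not REMERGE and is not taken from the working paper. The field names (`id`, `name`, `country`), the list of legal suffixes, and the 0.85 threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "gmbh", "sa", "inc", "plc", "ag"}


def normalize(name: str) -> str:
    """Lower-case a name, replace punctuation with spaces, drop legal suffixes."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(tok for tok in cleaned.split() if tok not in LEGAL_SUFFIXES)


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link(applicants, companies, threshold=0.85):
    """Pair each applicant with its most similar same-country company, if any.

    Only the two shared fields (name and country) are compared, which is
    exactly the limitation of the conventional approach.
    """
    matches = []
    for app in applicants:
        best, best_score = None, threshold
        for comp in companies:
            if app["country"] != comp["country"]:
                continue  # require agreement on the shared country field
            score = similarity(app["name"], comp["name"])
            if score >= best_score:
                best, best_score = comp, score
        if best is not None:
            matches.append((app["id"], best["id"], round(best_score, 3)))
    return matches


# Made-up example records, for illustration only.
applicants = [
    {"id": "P1", "name": "Acme Widgets Ltd.", "country": "GB"},
    {"id": "P2", "name": "Nordic Pumps AG", "country": "DE"},
]
companies = [
    {"id": "C1", "name": "ACME WIDGETS LIMITED", "country": "GB"},
    {"id": "C2", "name": "Nordic Pumps", "country": "SE"},
]
print(link(applicants, companies))
```

In this sketch, a record pair can only be matched on what both sources record. When the sources share little beyond a name and a country, as with PATSTAT and company databases, such comparisons carry limited evidence.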

The results of this matching application can be used to improve research into the economics of innovation. The algorithm could also be adapted to similar problems. We provide a description of our algorithm, together with details on coverage by country and by sector, performance measures, and suggestions for future research. We also show results from an additional application of REMERGE to the European Commission’s Tenders Electronic Daily database.

Click here to download the related Working Paper.

Data Policy: This page provides a number of Bruegel datasets for public use. Users can freely use our data in its unchanged form or after any transformation for any purpose and can freely distribute it, provided that proper attribution is made to the source, but not in any way that suggests that Bruegel endorses the user or their use of the data.