Template-Type: ReDIF-Paper 1.0
Series: Tinbergen Institute Discussion Papers
Creation-Date: 2023-10-12
Number: 23-055/VIII
Author-Name: Leon Bremer
Author-Workplace-Name: Vrije Universiteit Amsterdam
Title: Fuzzy firm name matching: Merging Amadeus firm data to PATSTAT
Abstract: When merging firms across large databases in the absence of common identifiers, text algorithms can help. I propose a high-performance fuzzy firm name matching algorithm that uses existing computational methods and works even under hardware restrictions. The algorithm consists of four steps, namely (1) cleaning, (2) similarity scoring, (3) a decision rule based on supervised machine learning, and (4) group identification using community detection. The algorithm is applied to merging firms in the Amadeus Financials and Subsidiaries databases, containing firm-level business and ownership information, to applicants in PATSTAT, a worldwide patent database. In this application, the algorithm vastly outperforms an exact string match, increasing the number of matched firms in the Amadeus Financials (Subsidiaries) database by 116% (160%). 53% (74%) of this improvement is due to cleaning, and another 41% (50%) improvement is due to similarity matching. 18.1% of all patent applications since 1950 are matched to firms in the Amadeus databases, compared to 2.6% for an exact name match.
Classification-JEL: C81, C88, O34
Keywords: Fuzzy name matching, supervised machine learning, name disambiguation, patents
File-URL: https://papers.tinbergen.nl/23055.pdf
File-Format: application/pdf
File-Size: 750.433 bytes
Handle: RePEc:tin:wpaper:20230055