The Webis Patent Retrieval Corpus 2012 (Webis-PRA-12) is a corpus for studying the impact of misspelled company names on patent retrieval.
The corpus contains 14,189 distinct company names extracted from 2,132,825 patents granted by the United States Patent and Trademark Office (USPTO) between 2001 and 2010.
Benno Stein, Dennis Hoppe, and Tim Gollub. The Impact of Spelling Errors on Patent Search. In Walter Daelemans, editor, 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), pages 570-579, April 2012. Association for Computational Linguistics. ISBN 978-1-937284-19-0.