Dataset Open Access

Webis Patent Retrieval Corpus 2012 (Webis-PRA-12)

Gollub, Tim; Hoppe, Dennis; Stein, Benno

The Webis Patent Retrieval Corpus 2012 (Webis-PRA-12) is a corpus for studying the impact of misspelled company names on patent retrieval.

The corpus contains 14,189 distinct company names extracted from 2,132,825 patents granted by the United States Patent and Trademark Office (USPTO) between 2001 and 2010.

Files (902.3 kB)
  • Benno Stein, Dennis Hoppe, and Tim Gollub. The Impact of Spelling Errors on Patent Search. In Walter Daelemans, editor, 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), pages 570-579, April 2012. Association for Computational Linguistics. ISBN 978-1-937284-19-0
