Dataset Open Access

Webis Patent Retrieval Corpus 2012 (Webis-PRA-12)

Gollub, Tim; Hoppe, Dennis; Stein, Benno

The Webis Patent Retrieval Corpus 2012 (Webis-PRA-12) is a corpus for studying the impact of misspelled company names on patent retrieval.

The corpus contains 14,189 distinct company names extracted from 2,132,825 patents granted by the United States Patent and Trademark Office (USPTO) between 2001 and 2010.

Files (902.3 kB)
Name: corpus-webis-pra-12.zip
Size: 902.3 kB
MD5: 490e583f4746c661796705b344c1afa9
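
The MD5 checksum listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch, assuming the archive has been saved under its listed name in the current working directory; the helper names `md5_of_file` and `verify` are hypothetical and not part of the corpus distribution.

```python
import hashlib
from pathlib import Path

# Checksum as listed on this page for corpus-webis-pra-12.zip.
EXPECTED_MD5 = "490e583f4746c661796705b344c1afa9"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the archive was saved.
    archive = Path("corpus-webis-pra-12.zip")
    if archive.exists():
        print("checksum OK" if verify(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.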
  • Benno Stein, Dennis Hoppe, and Tim Gollub. The Impact of Spelling Errors on Patent Search. In Walter Daelemans, editor, 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), pages 570-579, April 2012. Association for Computational Linguistics. ISBN 978-1-937284-19-0.
