Published April 27, 2012 | Version v1
Dataset Open

Webis Patent Retrieval Corpus 2012 (Webis-PRA-12)

  • 1. Bauhaus-Universität Weimar

Description

The Webis Patent Retrieval Corpus 2012 (Webis-PRA-12) is a corpus for studying the impact of misspelled company names on patent retrieval.

The corpus contains 14,189 distinct company names extracted from 2,132,825 patents granted by the United States Patent and Trademark Office (USPTO) between 2001 and 2010.

Files

corpus-webis-pra-12.zip (902.3 kB)
md5:490e583f4746c661796705b344c1afa9
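
The MD5 checksum listed for corpus-webis-pra-12.zip can be used to confirm that a download is intact. The following is a minimal sketch in Python using only the standard library; the function names `md5_of_file` and `verify_archive` and the local path are illustrative assumptions, not part of the corpus distribution.

```python
import hashlib
from pathlib import Path

# Checksum as listed for corpus-webis-pra-12.zip on this record.
EXPECTED_MD5 = "490e583f4746c661796705b344c1afa9"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location: the archive saved in the current working directory.
    archive = Path("corpus-webis-pra-12.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.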

Additional details

References

  • Benno Stein, Dennis Hoppe, and Tim Gollub. The Impact of Spelling Errors on Patent Search. In Walter Daelemans, editors, 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), pages 570-579, April 2012. Association for Computational Linguistics. ISBN 978-1-937284-19-0