Published May 10, 2021 | Version v1.2.1
Software Open

WikiVector: Tools for encoding Wikipedia articles as vectors

  • 1. The University of Texas at Austin
  • 2. University of California, Berkeley
  • 3. University of California, Los Angeles

Description

This version switches to a modern build system and adds a utility for stripping tags that WikiExtractor may leave behind when processing older Wikipedia dumps.

  • The build system is defined by the pyproject.toml file.
  • Most setup options are now set in setup.cfg.
  • The wiki_remove_tags script strips some common leftover tags from extracted Wikipedia text.
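
To illustrate the kind of cleanup described in the last item, here is a minimal Python sketch of regex-based tag stripping. It is not the wiki_remove_tags implementation: the pattern, the `strip_tags` name, and the choice to remove every tag-like span are assumptions made for this example, whereas the actual script targets only some common tags.

```python
import re

# Hypothetical pattern (an assumption for this sketch): matches simple
# opening, closing, and self-closing tags such as <br>, </ref>, or
# <ref name="x"/>. A "<" that is not followed by a letter is left alone.
TAG_PATTERN = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(?:\s[^<>]*)?/?>")


def strip_tags(text: str) -> str:
    """Remove tag-like spans, then collapse runs of spaces or tabs."""
    without_tags = TAG_PATTERN.sub("", text)
    return re.sub(r"[ \t]{2,}", " ", without_tags)


if __name__ == "__main__":
    sample = 'Austin is the capital<ref name="a"/> of Texas.<br>'
    print(strip_tags(sample))
```

Because this sketch removes any tag-like span indiscriminately, it would also delete markup that is meant to stay in the extracted files, so a real cleanup pass would likely restrict the pattern to a known list of tag names.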

Files

mortonne/wikivector-v1.2.1.zip (24.0 kB)
md5:c20d3bc47b92a65fb1c4e288f2556176
