Published September 29, 2007 | Version 6976
Journal article Open

The Influence of Preprocessing Parameters on Text Categorization

Description

Text categorization (the assignment of natural-language texts to predefined categories) is an important and extensively studied problem in Machine Learning. Popular current techniques for this task combine many preprocessing and learning algorithms, many of which in turn require tuning of nontrivial internal parameters. Although partial studies are available, many authors fail to report the parameter values they use in their experiments, or the reasons why these values were chosen over others. The goal of this work is therefore to provide a more thorough comparison of preprocessing parameters and their mutual influence, and to report interesting observations and results.

Files

6976.pdf (1.4 MB)
md5:f250a1ba1331854e71f84c29a6911c0d