Record ID: 4477881
DOI: 10.5281/zenodo.4477881
OAI identifier: oai:zenodo.org:4477881
Community: user-galaxy-training
Title: Sentiment analysis in Galaxy with IMDB movie review dataset
Creator: Kaivan Kamali (Penn State University)
Publisher: Zenodo
Publication date: 2021-01-28
Language: eng
Resource type: info:eu-repo/semantics/other
Access rights: info:eu-repo/semantics/openAccess (open)
License: Creative Commons Attribution 4.0 International (cc-by-4.0, spdx), https://creativecommons.org/licenses/by/4.0/legalcode
Keywords: IMDB; Sentiment Analysis; Movie reviews
Related identifier: isVersionOf doi 10.5281/zenodo.4477880
Record timestamp: 20220804083747.0

Description:
IMDB movie review sentiment classification dataset (Maas et al., 2011). For more information please refer to: https://ai.stanford.edu/~amaas/data/sentiment/

The IMDB dataset was modified as follows to prepare it for use in a Galaxy Training Tutorial (https://training.galaxyproject.org/):
- The top 50 words are excluded (mostly stop words).
- The next 10,000 top words are included.
- Reviews are limited to 500 words (longer reviews are trimmed and shorter reviews are padded).
- 25,000 reviews each are used for training and for testing.
- Files are in TSV (tab-separated values) format to be consumed by Galaxy (www.usegalaxy.org).

References:
Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011). For more information please refer to: https://ai.stanford.edu/~amaas/data/sentiment/

Files (size in bytes, MD5 checksum, URL):
225000     md5:f9e351def7454dddd0def2bfcd7c6955  https://zenodo.org/records/4477881/files/y_test.tsv
225000     md5:12703b87d2ad9f12b0639e97a2de4c9a  https://zenodo.org/records/4477881/files/y_train.tsv
118634408  md5:694514b7585fe9ca816b433f2611ae41  https://zenodo.org/records/4477881/files/X_test.tsv
118817859  md5:98ea1599b7942709af37ca5a6aff07d4  https://zenodo.org/records/4477881/files/X_train.tsv
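
Illustrative sketch of the preprocessing:
The record describes three preprocessing steps: excluding the top 50 words, keeping the next 10,000, and forcing every review to 500 words by trimming or padding. The following Python sketch shows one way such steps could be implemented. It is not the code used to build the dataset. The function names, the frequency-based ranking, the padding value 0, dropping out-of-vocabulary words, and trimming and padding at the end of a review are all assumptions made for illustration; the record does not specify any of them.

```python
from collections import Counter

PAD_ID = 0  # assumption: 0 marks padding; the record does not state the value


def build_vocab(tokenized_reviews, skip_top=50, vocab_size=10_000):
    """Map words to integer ids, skipping the `skip_top` highest-ranked words.

    Assumption: "top" words are ranked by how often they occur in the corpus.
    """
    counts = Counter(word for review in tokenized_reviews for word in review)
    ranked = [word for word, _ in counts.most_common()]
    kept = ranked[skip_top:skip_top + vocab_size]
    # Ids start at 1 so that 0 stays free for padding.
    return {word: idx for idx, word in enumerate(kept, start=1)}


def encode(review, vocab):
    """Replace words by ids; words outside the vocabulary are dropped (assumption)."""
    return [vocab[word] for word in review if word in vocab]


def pad_or_trim(ids, max_len=500):
    """Force a sequence to exactly `max_len` entries."""
    trimmed = ids[:max_len]  # assumption: long reviews keep their beginning
    return trimmed + [PAD_ID] * (max_len - len(trimmed))  # assumption: pad at the end


# Tiny demonstration with reduced limits so the effect is visible.
reviews = [
    "the film was the best film".split(),
    "the plot was weak".split(),
]
vocab = build_vocab(reviews, skip_top=1, vocab_size=3)
rows = [pad_or_trim(encode(review, vocab), max_len=4) for review in reviews]

# One tab-separated line per review, similar in spirit to a TSV row.
for row in rows:
    print("\t".join(map(str, row)))
```

With the defaults (`skip_top=50`, `vocab_size=10_000`, `max_len=500`) the same functions would apply the limits quoted in the description; the demonstration uses smaller values only to keep the output readable.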