Arabica: A Python package for exploratory analysis of text data
- 1. Zeppelin University in Friedrichshafen, Germany
- 2. Brno University of Technology, Department of Radio Electronics, Czech Republic
Description
Research meta-data is typically recorded as a time series with cross-sectional dimensions (e.g., article title, journal, volume, issue, authors' names, and affiliations) and a time dimension (e.g., publication date). Meta-datasets provide valuable insights into the research trends in a particular field of science. Meta-analysis (a group of methods for analyzing research meta-data) currently has no text-analytics implementation in any programming language. This package aims to fill that gap. Arabica offers descriptive analytics, visualization, sentiment classification, and structural break analysis for exploratory analysis of research meta-datasets in an easy-to-use Python implementation.
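To make the data shape concrete, the sketch below counts bigram frequencies per year in a handful of dated article titles, using only the Python standard library. It is an illustration of time-series n-gram counting in general, not Arabica's API: the records, the stopword list, and the helper names `bigrams` and `yearly_bigram_counts` are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical meta-data records: (publication date, article title).
records = [
    (date(2020, 3, 1), "Deep learning for text classification"),
    (date(2020, 9, 15), "Text classification with deep learning"),
    (date(2021, 5, 20), "Sentiment analysis of financial news"),
    (date(2021, 11, 2), "Financial news and sentiment analysis"),
]

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"for", "with", "of", "and"}


def bigrams(title):
    """Lower-case the title, drop stopwords, and return adjacent word pairs."""
    tokens = [t for t in title.lower().split() if t not in STOPWORDS]
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def yearly_bigram_counts(rows):
    """Aggregate bigram counts by publication year."""
    counts = defaultdict(Counter)
    for published, title in rows:
        counts[published.year].update(bigrams(title))
    return counts


counts = yearly_bigram_counts(records)
for year in sorted(counts):
    # Show the two most frequent bigrams in each year.
    print(year, counts[year].most_common(2))
```

Note that the bigrams here are formed after stopword removal, so words that were separated by a stopword in the title become adjacent. Whether that is desirable depends on the analysis.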
The package is organized into three main modules: (1) the descriptive and time-series n-gram analysis module provides a frequency summarization of the key topics in the meta-dataset, (2) the visualization module displays key-term frequencies in a heatmap, line plot, and word cloud, and (3) the sentiment and structural breakpoint analysis module evaluates sentiment from research article titles and identifies turning points in the sentiment of published research. The third module uses VADER [@Hutto:2014] and FinVADER [@finvader:2023], an updated model with financial lexicons, to classify sentiment. The clustering-based Fisher-Jenks algorithm [@Jenks:1977] finds break points in the data.
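The criterion behind natural-breaks classification can be illustrated on a small scale: sort the values and choose the class boundary that minimizes the total within-class sum of squared deviations. The sketch below brute-forces the two-class case only. It is not the package's implementation, it does not use the dynamic programming that the Fisher-Jenks algorithm relies on for an arbitrary number of classes, and the sentiment values and function names are hypothetical.

```python
def within_class_ssd(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def two_class_break(values):
    """Return the largest value of the lower class under the best two-class split.

    The best split is the one that minimizes the combined within-class
    sum of squared deviations over the sorted values.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    best_cost, best_idx = None, None
    for i in range(1, len(ordered)):
        cost = within_class_ssd(ordered[:i]) + within_class_ssd(ordered[i:])
        if best_cost is None or cost < best_cost:
            best_cost, best_idx = cost, i
    return ordered[best_idx - 1]


# Hypothetical mean sentiment scores per period, on a [-1, 1] scale.
sentiment = [0.42, 0.38, 0.45, 0.40, -0.10, -0.15, -0.08]
break_value = two_class_break(sentiment)
print(break_value)
```

Periods whose score lies at or below the returned value fall into the lower class. In a time series, the dates at which the class membership changes are the candidate turning points.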
The package is also useful more generally for exploratory analysis of time-series text datasets, mainly in the social sciences. In business economics, it supports customer satisfaction measurement through the analysis of product reviews. In political science and behavioral economics, it enables detailed text mining of social media interactions. Similarly, in finance, it simplifies sentiment analysis of news headlines.
Files
(87.5 kB)
| MD5 checksum | Size |
|---|---|
| md5:726c1d36a00344e805d4ca89f5ddd105 | 105 Bytes |
| md5:fb546d6bd1cae455bba6041db784be40 | 12.0 kB |
| md5:dc47ca8dc719a3702b68f6e3666926c5 | 31.6 kB |
| md5:00aa3360a20881e44cbec077b742db26 | 1.7 kB |
| md5:58488bad14c1c36071ee5f3119d95cd7 | 146 Bytes |
| md5:27bc74b2e6f04b827804af59739ebd20 | 8.5 kB |
| md5:68b329da9893e34099c7d8ad5cb9c940 | 1 Byte |
| md5:ef2a56328eefe7ce834797f36a756ea8 | 2.5 kB |
| md5:86d3f3a95c324c9479bd8986968f4327 | 11.4 kB |
| md5:e25a5f38226a18011b6b4daafa6d35eb | 8.4 kB |
| md5:6199fd211119cc30c40209e702765852 | 595 Bytes |
| md5:82606901db8dadf1789ac13d423395b5 | 7.5 kB |
| md5:b1f4da906392d983f952bb2da41defe6 | 234 Bytes |
| md5:bdc95706195fc44fe3c554fc0a4c9868 | 533 Bytes |
| md5:d19a419ea98ae93d7a7b66b189aa5052 | 42 Bytes |
| md5:858055aef73d8a278474340940841a80 | 1.3 kB |
| md5:48a484b377b0931a337c2facd8cf2d3a | 409 Bytes |
| md5:f360ccc00a12b3faae007bd66a77daf4 | 447 Bytes |
| md5:ae2dc86e0b8c8d983f4937b6f132ee11 | 8 Bytes |
Additional details
Software
- Repository URL
- https://github.com/PetrKorab/Arabica
- Programming language
- Python
- Development Status
- Active