Published March 24, 2024 | Version 1.7.1
Software | Open access

Arabica: A Python package for exploratory analysis of text data

  • 1. Zeppelin University in Friedrichshafen, Germany
  • 2. Brno University of Technology, Department of Radio Electronics, Czech Republic

Description

Research meta-data is typically recorded as a time series with a cross-sectional dimension (e.g., article title, journal, volume, issue, authors' names, and affiliations) and a time dimension (e.g., publication date). Such meta-datasets provide valuable insights into research trends in a particular field of science. However, meta-analysis (a group of methods for analyzing research meta-data) currently lacks text-analytics implementations in the main programming languages used for data analysis. This package aims to fill that gap. Arabica offers descriptive analytics, visualization, sentiment classification, and structural break analysis for exploratory analysis of research meta-datasets in an easy-to-use Python implementation.
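As a rough illustration of this panel structure and of the kind of descriptive summary involved, the pure-Python sketch below counts the most frequent bigrams in article titles for each publication year. It does not use Arabica's own API: the toy records, the stopword list, and the helper names `tokenize`, `ngrams`, and `ngram_freq_by_period` are hypothetical, and dates are assumed to be ISO-formatted strings.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy meta-dataset: a cross-sectional field (title) plus a time field (ISO date).
records = [
    {"title": "Deep learning for text classification", "date": "2021-03-15"},
    {"title": "Text classification with transformers", "date": "2021-11-02"},
    {"title": "Sentiment analysis of news headlines", "date": "2022-05-20"},
    {"title": "Sentiment analysis with lexicons", "date": "2022-08-09"},
]

# Minimal illustrative stopword list (an assumption, not Arabica's list).
STOPWORDS = {"a", "an", "and", "for", "of", "the", "with"}


def tokenize(text):
    """Lower-case the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_freq_by_period(records, n=2, top=3):
    """Count n-grams per publication year and keep the `top` most frequent."""
    counts = defaultdict(Counter)
    for rec in records:
        year = rec["date"][:4]  # aggregate the time dimension to years
        counts[year].update(ngrams(tokenize(rec["title"]), n))
    return {year: c.most_common(top) for year, c in sorted(counts.items())}


for year, top_bigrams in ngram_freq_by_period(records).items():
    print(year, top_bigrams)
```

Slicing `date[:7]` instead of `date[:4]` would give monthly rather than yearly frequencies under the same ISO-date assumption.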

The package consists of three main modules: (1) descriptive and time-series n-gram analysis, which summarizes the frequencies of key topics in the meta-dataset; (2) visualization, which displays key-term frequencies as a heatmap, line plot, or word cloud; and (3) sentiment and structural breakpoint analysis, which evaluates the sentiment of research article titles and identifies turning points in the sentiment of published research. Sentiment is classified with VADER [@Hutto:2014] and FinVADER [@finvader:2023], a version of VADER updated with financial lexicons. The clustering-based Fisher-Jenks algorithm [@Jenks:1977] identifies the breakpoints in the data.
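The breakpoint step can be sketched with Fisher's exact optimal-grouping dynamic program, the optimization on which Jenks natural breaks is built. The sketch below applies it to a time-ordered series, so segment boundaries read as candidate turning points; classic Jenks natural breaks runs the same procedure on sorted values instead, and this sketch makes no claim about which variant Arabica uses internally. The yearly scores are hypothetical stand-ins for mean VADER compound scores (which lie in [-1, 1]), and `fisher_breaks` and `segment_sse` are hypothetical helpers, not part of Arabica.

```python
from itertools import accumulate


def segment_sse(prefix, prefix_sq, i, j):
    """Sum of squared deviations from the mean for values[i:j] (half-open)."""
    total = prefix[j] - prefix[i]
    total_sq = prefix_sq[j] - prefix_sq[i]
    return total_sq - total * total / (j - i)


def fisher_breaks(values, k):
    """Split an ordered sequence into k contiguous segments with minimal total SSE.

    Returns the start indices of segments 2..k (the k - 1 break positions).
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    # Prefix sums make each segment's SSE an O(1) lookup.
    prefix = [0.0] + list(accumulate(values))
    prefix_sq = [0.0] + list(accumulate(v * v for v in values))

    inf = float("inf")
    # cost[m][j]: minimal SSE for values[:j] split into m non-empty segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    # back[m][j]: start index of the m-th segment in that optimal split.
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                candidate = cost[m - 1][i] + segment_sse(prefix, prefix_sq, i, j)
                if candidate < cost[m][j]:
                    cost[m][j] = candidate
                    back[m][j] = i

    # Walk the back-pointers from the full series to recover the breaks.
    breaks, j = [], n
    for m in range(k, 1, -1):
        j = back[m][j]
        breaks.append(j)
    return breaks[::-1]


# Hypothetical yearly mean sentiment scores (stand-ins for VADER compound scores).
years = list(range(2015, 2023))
scores = [0.10, 0.12, 0.11, 0.35, 0.33, 0.36, -0.05, -0.02]
print([years[i] for i in fisher_breaks(scores, 3)])  # [2018, 2021]
```

The triple loop costs O(k·n²) time, which is adequate for a few dozen periods; asking for k segments always yields k - 1 breaks, so the number of turning points is a modelling choice rather than something the algorithm infers.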

The package is also useful more generally for exploratory analysis of time-series text datasets, mainly in the social sciences. In business economics, it can improve customer-satisfaction measurement through the analysis of product reviews. In political science and behavioral economics, it enables detailed text mining of social media interactions. Similarly, in finance, it simplifies sentiment analysis of financial news headlines.

Files (87.5 kB)

MD5 checksum                        Size
726c1d36a00344e805d4ca89f5ddd105    105 Bytes
fb546d6bd1cae455bba6041db784be40    12.0 kB
dc47ca8dc719a3702b68f6e3666926c5    31.6 kB
00aa3360a20881e44cbec077b742db26    1.7 kB
58488bad14c1c36071ee5f3119d95cd7    146 Bytes
27bc74b2e6f04b827804af59739ebd20    8.5 kB
68b329da9893e34099c7d8ad5cb9c940    1 Byte
ef2a56328eefe7ce834797f36a756ea8    2.5 kB
86d3f3a95c324c9479bd8986968f4327    11.4 kB
e25a5f38226a18011b6b4daafa6d35eb    8.4 kB
6199fd211119cc30c40209e702765852    595 Bytes
82606901db8dadf1789ac13d423395b5    7.5 kB
b1f4da906392d983f952bb2da41defe6    234 Bytes
bdc95706195fc44fe3c554fc0a4c9868    533 Bytes
d19a419ea98ae93d7a7b66b189aa5052    42 Bytes
858055aef73d8a278474340940841a80    1.3 kB
48a484b377b0931a337c2facd8cf2d3a    409 Bytes
f360ccc00a12b3faae007bd66a77daf4    447 Bytes
ae2dc86e0b8c8d983f4937b6f132ee11    8 Bytes

Additional details

Software

Repository URL: https://github.com/PetrKorab/Arabica
Programming language: Python
Development status: Active