Published February 4, 2022 | Version V1.0
Project deliverable Open

Policy Cloud D4.4 REUSABLE MODEL & ANALYTICAL TOOLS: SOFTWARE PROTOTYPE 2

Description

This document is the second software demonstrator deliverable of PolicyCLOUD, delivered at M22 of the project (October 2021), and is intended for the reviewers of the software deliverables.

This deliverable provides a second, substantially upgraded description of the software demonstration for the components of the Integrated Data Acquisition and Analytics (DAA) Layer, which provides the analytical capabilities of the PolicyCLOUD platform. The components include the DAA API Gateway (responsible for the overall orchestration and the layer API); the built-in analytical tools for Data cleaning and interoperability, Situational Knowledge, Opinion Mining & Sentiment Analysis, and Social Dynamics & Behavioural Data analysis; and the Operational Data Repository.

The DAA API Gateway and the built-in analytical tools for Data cleaning and interoperability, Situational Knowledge, and Opinion Mining & Sentiment Analysis have been integrated with almost all the use cases, at least in standalone mode. In addition, full integration for two use cases was demonstrated during the review of June 2021.

The advanced Social Dynamics & Behavioural Data analysis component is fully operational in standalone mode and will be integrated with the PolicyCLOUD framework this coming year.

In terms of infrastructure, the Operational Data Repository has been enhanced through the adoption of the “seamless” technology (section 4.4 of [34]), which offers a two-tier storage architecture: the LeanXcale relational database is the first tier, which has been used until now, and the Object Storage is the second tier. This novel architecture presents single logical datasets to users, which can be explored with SQL.
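The idea of a single logical dataset over two storage tiers can be illustrated with a minimal sketch. It does not use LeanXcale or an object store: it assumes Python's built-in sqlite3 module as a stand-in, and the names hot_tier, cold_tier, dataset, build_demo_connection and count_since are hypothetical, chosen only for illustration. Two tables play the role of the two tiers, and a view lets a user query them as one dataset with plain SQL.

```python
import sqlite3


def build_demo_connection() -> sqlite3.Connection:
    """Create an in-memory database that mimics two storage tiers.

    Assumption: two ordinary tables stand in for the relational tier
    (hot_tier) and the object-storage tier (cold_tier). The real
    platform uses different technology for each tier.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE hot_tier  (id INTEGER PRIMARY KEY, created TEXT, body TEXT);
        CREATE TABLE cold_tier (id INTEGER PRIMARY KEY, created TEXT, body TEXT);

        -- The view is the single logical dataset exposed to users.
        CREATE VIEW dataset AS
            SELECT id, created, body FROM hot_tier
            UNION ALL
            SELECT id, created, body FROM cold_tier;
        """
    )
    # Older rows live in the second tier, recent rows in the first tier.
    conn.executemany(
        "INSERT INTO cold_tier VALUES (?, ?, ?)",
        [(1, "2020-03-01", "archived record A"),
         (2, "2020-11-15", "archived record B")],
    )
    conn.executemany(
        "INSERT INTO hot_tier VALUES (?, ?, ?)",
        [(3, "2021-09-30", "recent record C"),
         (4, "2021-10-05", "recent record D")],
    )
    conn.commit()
    return conn


def count_since(conn: sqlite3.Connection, since: str) -> int:
    """Count rows in the logical dataset created on or after `since`.

    The query names only the view, so the caller never needs to know
    which tier holds a given row.
    """
    row = conn.execute(
        "SELECT COUNT(*) FROM dataset WHERE created >= ?", (since,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    connection = build_demo_connection()
    print(count_since(connection, "2020-06-01"))
```

In this sketch the query text is the same whether the matching rows sit in one tier or both, which is the property the description attributes to the two-tier architecture.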

Notes

This deliverable has been submitted to the EC but is not yet approved.

Files

PolicyCLOUD_D4.4_Reusable Model and Analytical Tools Software Prototype 2_v1.0.pdf

Additional details

Funding

PolicyCLOUD – Policy Management through technologies across the complete data lifecycle on cloud environments. 870675
European Commission

References

  • PolicyCLOUD. D4.1 - Reusable Model & Analytical Tools: Design and Open Specification 1. Biran Ofer. 2020.
  • PolicyCLOUD. D2.2 CONCEPTUAL MODEL & REFERENCE ARCHITECTURE. 2020.
  • PolicyCLOUD. D6.3 - Use Case Scenarios Definition & Design. Sancho Javier. 2020.
  • Wikidata, https://www.wikidata.org/wiki/Wikidata:Main_Page
  • Xin J., Afrasiabi C., Lelong S., Adesara J., Tsueng G., Su A. I., and Wu C., Cross-linking BioThings APIs through JSON-LD to facilitate knowledge exploration, BMC bioinformatics, 19(1), 30, 2018.
  • Kubernetes, https://kubernetes.io
  • Kafka, https://kafka.apache.org
  • OpenWhisk, https://OpenWhisk.apache.org
  • Python, https://www.python.org
  • spaCy, https://spacy.io/
  • SPARQL, https://www.w3.org/TR/rdf-sparql-query
  • Jupyter, https://jupyter.org/
  • Anaconda, https://www.anaconda.com/
  • PIP Installer https://pip.pypa.io/en/stable/
  • L. Zhao, L. Li, X. Zheng, and J. Zhang, "A BERT based sentiment analysis and key entity detection approach for online financial texts", in 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD), pp. 1233-1238. IEEE, 2021.
  • C. Sun, L. Huang, and X. Qiu, "Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence", arXiv preprint arXiv:1903.09588, 2019.
  • T. Nasukawa, and J. Yi., "Sentiment analysis: Capturing favorability using natural language processing", in Proceedings of the 2nd international conference on Knowledge capture, pp. 70-77, October 2003.
  • K. Dave, S. Lawrence, and D.M. Pennock, "Mining the peanut gallery: Opinion extraction and semantic classification of product reviews", in Proceedings of the 12th international conference on World Wide Web, pp. 519-528, May 2003.
  • B. Pang, and L. Lee, "Seeing Stars: Exploiting class relationships for sentiment categorization with respect to rating scales", in Proceedings of the 43rd annual meeting on association for computational linguistics, Association for Computational Linguistics, pp. 115-124. 2005.
  • P.D. Turney, "Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews", in Proceedings of the 40th annual meeting on association for computational linguistics, Association for Computational Linguistics, pp. 417-424, July 2002.
  • B. Pang, L. Lee, and S. Vaithyanathan. "Thumbs up? Sentiment classification using Machine Learning Techniques", in Proceedings of the ACL-02 conference on empirical methods in natural language processing, Association for Computational Linguistics. pp. 79–86. July 2002.
  • R. Moraes, J.F. Valiati, and W.P. Neto, "Document-level sentiment classification: an empirical comparison between SVM and ANN". Expert Systems with Applications, vol. 40, no. 2, 2013.
  • W. C. F. Mariel, S. Mariyah, and S. Pramana, "Sentiment analysis: a comparison of deep learning neural network algorithm with SVM and Naïve Bayes for Indonesian text", in Journal of Physics: Conference Series, vol. 971, no. 1, p. 012049. IOP Publishing, March 2018.
  • R. Johnson, and T. Zhang, "Effective use of word order for text categorization with convolutional neural networks", in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2015), 2015.
  • A. Vaswani et al, "Attention is all you need", in Advances in neural information processing systems, pp. 5998-6008, 2017.
  • Z. Yang, D. Yang, C. Dyer, X. He, A.J. Smola, and E.H. Hovy. "Hierarchical attention networks for document classification", in Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, pp. 1480-1489, June 2016.
  • X. Li, L. Bing, W. Zhang, and W. Lam, "Exploiting BERT for end-to-end aspect-based sentiment analysis", arXiv preprint arXiv:1910.00883, 2019.
  • Pramanik, Jitendra & Samal, Abhaya Kumar & Sahoo, Kabita & Pani, Dr. Subhendu. Exploratory Data Analysis using Python. 8. 4727-4735, 2019.
  • Visualizing Categorical Distributions. Retrieved October 13, 2021, https://inferentialthinking.com/chapters/07/1/Visualizing_Categorical_Distributions.html
  • JSA And UC Claimants In Camden, https://opendata.camden.gov.uk/People-Places/JSA-And-UC-Claimants-In-Camden/x2rm-3zds
  • Techniques and Applications for Sentiment Analysis. (April 2013). Retrieved from https://airtonbjunior.github.io/mestrado/seminars/presentations/2/Presentation2AirtonV1.pdf
  • Moilanen, Karo and Stephen G. Pulman. "Multi-entity Sentiment Scoring." RANLP (2009).
  • Colm, Colm. Multi-entity sentiment analysis using entity-level feature extraction and word embeddings approach. 733-740, 2017.
  • PolicyCLOUD. D4.3 - Reusable Model & Analytical Tools: Design and Open Specification 2. Biran Ofer. 2021.
  • Logstash, https://www.elastic.co/logstash/
  • PolicyCLOUD. D4.2 - Reusable Model & Analytical Tools: Software Prototype 1.
  • Pandas, https://pandas.pydata.org/
  • Docker, https://www.docker.com/
  • Jelodar, H., Wang, Y., Yuan, C., Feng, X., Jiang, X., Li, Y., & Zhao, L. Latent Dirichlet allocation (LDA) and topic modeling: models, applications, a survey. Multimedia Tools and Applications, 78(11), 15169-15211, 2019.
  • Kim, S., Park, H., & Lee, J. Word2vec-based latent semantic analysis (W2V-LSA) for topic modeling: A study on blockchain technology trend analysis. Expert Systems with Applications, 152, 113401, 2020.
  • Shimosaka, M., Tsukiji, T., Tominaga, S., & Tsubouchi, K. Coupled hierarchical Dirichlet process mixtures for simultaneous clustering and topic modeling. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 230-246). Springer, 2016.
  • Syed, S., & Spruit, M. Full-text or abstract? examining topic coherence scores using latent dirichlet allocation. In 2017 IEEE International conference on data science and advanced analytics (DSAA) (pp. 165-174). IEEE, 2017.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL, pages 4171–4186, 2019.
  • Mao, R., Lin, C., & Guerin, F. Combining Pre-trained Word Embeddings and Linguistic Features for Sequential Metaphor Identification. arXiv preprint arXiv:2104.03285, 2021.
  • Hu, M., Peng, Y., Huang, Z., Li, D., & Lv, Y. Open-domain targeted sentiment analysis via span-based extraction and classification. arXiv preprint arXiv:1906.03820, 2019.
  • Hutto, C. and Eric Gilbert. "VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text." ICWSM, 2014.
  • Textblob Documentation. Release 0.16.0. Retrieved October 22, 2021, from https://buildmedia.readthedocs.org/media/pdf/textblob/latest/textblob.pdf