Published June 28, 2021 | Version v0.5.3
Software Open

kevinlu1248/pyate: Python Automated Term Extraction

Creators

  • 1. @hackcouver

Description

PyATE is a Python implementation of term extraction algorithms such as C-Value, Basic, Combo Basic, Weirdness and Term Extractor, all built on spaCy POS tagging.

PyATE extracts term candidates from natural-language text and assigns each candidate a "termhood", a float that can be interpreted as the relative confidence that the candidate is a term.
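To make the idea of a termhood score concrete, the following is a minimal sketch of the C-value formula from Frantzi et al. (1998), applied to candidate frequencies that have already been counted. It is not PyATE's implementation: the function name c_value, the tuple-of-words representation and the example frequencies are assumptions made for illustration, and candidate extraction via POS patterns is skipped entirely.

```python
import math


def c_value(freqs):
    """Score candidates with the C-value formula (illustrative sketch only).

    freqs maps each candidate, given as a tuple of words, to its corpus
    frequency. Returns a dict mapping each candidate to its score.
    """

    def is_nested_in(shorter, longer):
        # True if `shorter` occurs as a contiguous run of words inside `longer`.
        m, n = len(shorter), len(longer)
        return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))

    scores = {}
    for term, freq in freqs.items():
        # Frequencies of the longer candidates that contain this one.
        containing = [f for other, f in freqs.items() if is_nested_in(term, other)]
        if containing:
            # Discount by the average frequency of the containing candidates.
            adjusted = freq - sum(containing) / len(containing)
        else:
            adjusted = freq
        # log2 of the length in words; this is 0 for single-word candidates,
        # since the original method targets multi-word terms.
        scores[term] = math.log2(len(term)) * adjusted
    return scores


# Hypothetical frequencies, invented for this example.
example = {
    ("floating", "point", "arithmetic"): 4,
    ("point", "arithmetic"): 5,
    ("floating", "point"): 10,
}
for term, score in sorted(c_value(example).items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 3))
```

In this toy input, "point arithmetic" almost never appears outside "floating point arithmetic", so its score drops well below that of "floating point", which occurs often on its own.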

A distinction needs to be made between "keywords" and "terms". Terms are technical, usually specific to a domain, and often include jargon, while keywords mainly serve to categorize documents. Formally, "a term is the designation of a defined concept in a special language by a linguistic expression. A term may consist of one or more words." (ISO 1087)

Source code can be found at https://github.com/kevinlu1248/pyate. Documentation can be found at https://kevinlu1248.github.io/pyate/ and a web app demonstrating the algorithms can be found at https://pyate-demo.herokuapp.com/.

Notes

To download and use PyATE, install it with pip (pip install pyate) rather than downloading this archive. This release exists only to upload the project to Zenodo and generate a DOI so that PyATE can be cited.

Files

kevinlu1248/pyate-v0.5.3.zip

Files (16.6 MB)

md5:da541d96720bdaa07924e942631f7dc4
16.6 MB

Additional details

Related works

References

  • Frantzi K.T., Ananiadou S., Tsujii J. (1998) The C-value/NC-value Method of Automatic Recognition for Multi-word Terms. In: Nikolaou C., Stephanidis C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 1998. Lecture Notes in Computer Science, vol 1513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49653-X_35
  • Bordea G., Buitelaar P., Polajnar T. (2013) Domain-independent term extraction through domain modelling. In: The 10th International Conference on Terminology and Artificial Intelligence (TIA 2013), Paris, France.
  • Astrakhantsev, N. (2018). ATR4S: toolkit with state-of-the-art automatic terms recognition methods in Scala. Language Resources and Evaluation, 52(3), 853-872.
  • Sclano F., Velardi P. (2007) TermExtractor: a Web Application to Learn the Shared Terminology of Emergent Web Communities. In: Gonçalves R.J., Müller J.P., Mertins K., Zelm M. (eds) Enterprise Interoperability II. Springer, London. https://doi.org/10.1007/978-1-84628-858-6_32
  • Zhang, Z., Iria, J., Brewster, C., & Ciravegna, F. (2008). A Comparative Evaluation of Term Recognition Algorithms. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08). European Language Resources Association (ELRA).
  • Navigli, R., & Velardi, P. (2004). Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites. Computational Linguistics, 30(2), 151–179.
  • Zhang, Z., Gao, J., & Ciravegna, F. (2018). Semre-rank: Improving automatic term extraction by incorporating semantic relatedness with personalised pagerank. ACM Transactions on Knowledge Discovery from Data (TKDD), 12(5), 1-41.
  • Ahrenberg, L. (2009). Term Extraction: A Review. Retrieved from https://www.ida.liu.se/~larah03/Publications/tereview_v2.pdf
  • Honnibal, M., & Montani, I. (2017). spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing.