kevinlu1248/pyate: Python Automated Term Extraction

doi:10.5281/zenodo.5039290

Published June 28, 2021 | Version v0.5.3

Software Open

Lu, Kevin¹

1. @hackcouver

PyATE is a Python implementation of term extraction algorithms such as C-Value, Basic, Combo Basic, Weirdness and Term Extractor using spaCy POS tagging.

PyATE extracts term candidates from natural-language text and assigns each candidate a "termhood", a float that can be interpreted as the relative confidence that the candidate is a term.
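To make the idea of a termhood score concrete, here is a minimal standard-library sketch of the C-value measure from Frantzi et al. (1998), one of the algorithms listed above. It is not PyATE's implementation: the function names `c_value` and `contains` are hypothetical, the candidate counts are made-up illustration data, and the spaCy POS-based candidate extraction step is skipped by assuming candidates have already been counted.

```python
import math


def contains(longer, shorter):
    """Return True if `shorter` occurs as a contiguous token run in `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(candidate_counts):
    """Map each candidate (a tuple of tokens) to a C-value termhood score.

    Simplified reading of the formula: a candidate's frequency is reduced by
    the mean frequency of the longer candidates that contain it, then weighted
    by log2 of its length in tokens (so one-word candidates score 0).
    """
    scores = {}
    for cand, freq in candidate_counts.items():
        # Frequencies of longer candidates in which `cand` is nested.
        nesting = [
            f for other, f in candidate_counts.items()
            if len(other) > len(cand) and contains(other, cand)
        ]
        adjusted = freq - sum(nesting) / len(nesting) if nesting else freq
        scores[cand] = math.log2(len(cand)) * adjusted
    return scores


# Hypothetical counts for two multi-word candidates.
counts = {
    ("soft", "contact", "lens"): 3,
    ("contact", "lens"): 5,
}
scores = c_value(counts)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy input, "contact lens" is penalized because three of its five occurrences are inside the longer candidate "soft contact lens", so the longer candidate ranks first. The scores are only meaningful relative to one another, which matches the reading of termhood as a relative confidence.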

A distinction needs to be made between "keywords" and "terms". Terms are technical, often specific to a domain, and often include jargon, while keywords mainly serve to categorize documents. Formally, "a term is the designation of a defined concept in a special language by a linguistic expression. A term may consist of one or more words." (ISO 1087)

Source code can be found at https://github.com/kevinlu1248/pyate. Documentation can be found at https://kevinlu1248.github.io/pyate/ and a web app demonstrating the algorithms can be found at https://pyate-demo.herokuapp.com/.

Notes

To download and use PyATE, install it with pip rather than from this archive. This release exists to upload the project to Zenodo and generate a DOI so that PyATE can be cited.

Files

Files (16.6 MB)

Name	Size
kevinlu1248/pyate-v0.5.3.zip (md5:da541d96720bdaa07924e942631f7dc4)	16.6 MB

Additional details

Is supplement to: https://github.com/kevinlu1248/pyate/tree/v0.5.3 (URL)

Frantzi K.T., Ananiadou S., Tsujii J. (1998) The C-value/NC-value Method of Automatic Recognition for Multi-word Terms. In: Nikolaou C., Stephanidis C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 1998. Lecture Notes in Computer Science, vol 1513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49653-X_35
Georgeta Bordea, Paul Buitelaar and Tamara Polajnar (2013) Domain-independent term extraction through domain modelling. In: The 10th International Conference on Terminology and Artificial Intelligence (TIA 2013), Paris, France.
Astrakhantsev, N. (2018). ATR4S: toolkit with state-of-the-art automatic terms recognition methods in Scala. Language Resources and Evaluation, 52(3), 853-872.
Sclano F., Velardi P. (2007) TermExtractor: a Web Application to Learn the Shared Terminology of Emergent Web Communities. In: Gonçalves R.J., Müller J.P., Mertins K., Zelm M. (eds) Enterprise Interoperability II. Springer, London. https://doi.org/10.1007/978-1-84628-858-6_32
Zhang, F. (2008). A Comparative Evaluation of Term Recognition Algorithms. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08). European Language Resources Association (ELRA).
Navigli, R., & Velardi, P. (2004). Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites. Comput. Linguist., 30(2), 151–179.
Zhang, Z., Gao, J., & Ciravegna, F. (2018). Semre-rank: Improving automatic term extraction by incorporating semantic relatedness with personalised pagerank. ACM Transactions on Knowledge Discovery from Data (TKDD), 12(5), 1-41.
Lars Ahrenberg (2009) Term Extraction: A Review. Retrieved from https://www.ida.liu.se/~larah03/Publications/tereview_v2.pdf
Honnibal, M., & Montani, I. (2017). spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing.

	All versions	This version
Views	814	808
Downloads	17	17
Data volume	281.4 MB	281.4 MB
