Published April 26, 2021 | Version v0.7.0
Software Open

MaartenGr/BERTopic: Major Release v0.7

  • 1. Van Spaendonck

Description

The two main features of this release are (semi-)supervised topic modeling and several new embedding backends that can be used instead of Flair and SentenceTransformers.

Highlights:

  • (semi-)supervised topic modeling by leveraging supervised options in UMAP
    • model.fit(docs, y=target_classes)
  • Backends:
    • Added spaCy, Gensim, and USE (TFHub)
    • Use a different backend for document embeddings and word embeddings
    • Create your own backends with bertopic.backend.BaseEmbedder
    • See the BERTopic documentation for an overview of all new backends
  • Calculate and visualize topics per class
    • Calculate: topics_per_class = topic_model.topics_per_class(docs, topics, classes)
    • Visualize: topic_model.visualize_topics_per_class(topics_per_class)
  • Several tutorials were updated and added:
    • Topic Modeling with BERTopic
    • (Custom) Embedding Models in BERTopic
    • Advanced Customization in BERTopic
    • (semi-)Supervised Topic Modeling with BERTopic
    • Dynamic Topic Modeling with Trump's Tweets
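
For the (semi-)supervised option listed above, here is a minimal sketch of preparing the `y` argument when only some documents carry a label. It assumes unlabeled documents are marked with -1, which is the convention UMAP uses for semi-supervised targets. The helper `encode_targets` and the sample labels are hypothetical and only for illustration; they are not part of BERTopic.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def encode_targets(labels: Sequence[Optional[str]]) -> Tuple[List[int], Dict[str, int]]:
    """Map class names to integer ids; None (no label) becomes -1.

    Hypothetical helper: the -1 marker for unlabeled documents is an
    assumption based on UMAP's semi-supervised convention.
    """
    mapping: Dict[str, int] = {}
    encoded: List[int] = []
    for label in labels:
        if label is None:
            encoded.append(-1)
        else:
            # Assign the next free id the first time a class name is seen.
            encoded.append(mapping.setdefault(label, len(mapping)))
    return encoded, mapping


# One entry per document, in the same order as the documents themselves.
labels = ["sports", None, "politics", "sports", None]
target_classes, class_ids = encode_targets(labels)
print(target_classes)  # [0, -1, 1, 0, -1]
print(class_ids)       # {'sports': 0, 'politics': 1}
```

The resulting list could then be passed as in the highlight above, model.fit(docs, y=target_classes), with docs in the same order as the labels.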

Fixes:

  • Fixed issues with the Torch requirement
  • Prevent saving term frequency matrix in CTFIDF class
  • Fixed DTM not working when reducing topics (#96)
  • Moved visualization dependencies to base BERTopic
    • pip install bertopic[visualization] becomes pip install bertopic
  • Allow precomputed embeddings in bertopic.find_topics() (#79):
    model = BERTopic(embedding_model=my_embedding_model)
    model.fit(docs, my_precomputed_embeddings)
    model.find_topics(search_term)
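
As a rough illustration of why an embedding model is still passed when the document embeddings are precomputed: a topic search has to embed the search term itself before it can be compared with the topics. The sketch below is a conceptual stand-in, not BERTopic's implementation. `toy_embed`, `rank_topics`, and the 2-d vectors are hypothetical, and cosine similarity is assumed as the comparison.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_topics(
    search_term: str,
    embed: Callable[[str], Vector],
    topic_vectors: Dict[int, Vector],
    top_n: int = 2,
) -> List[Tuple[int, float]]:
    """Embed the search term, then rank topics by similarity to it."""
    query = embed(search_term)
    scored = [(topic, cosine(query, vec)) for topic, vec in topic_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical stand-in for a real embedding model: a fixed lookup table.
TOY_VOCAB: Dict[str, Vector] = {"engine": (0.9, 0.1), "election": (0.1, 0.9)}


def toy_embed(text: str) -> Vector:
    """Return a toy 2-d vector; unknown terms map to the zero vector."""
    return TOY_VOCAB.get(text.lower(), (0.0, 0.0))


# Hypothetical topic vectors, keyed by topic id.
topic_vectors: Dict[int, Vector] = {0: (1.0, 0.1), 1: (0.1, 1.0)}

ranked = rank_topics("engine", toy_embed, topic_vectors)
print(ranked[0][0])  # 0: the topic whose vector is closest to "engine"
```

Without an embedding function, the search term could not be mapped into the same space as the topics, which is why the snippet above keeps embedding_model even though the document embeddings are supplied directly.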

Files

MaartenGr/BERTopic-v0.7.0.zip (6.1 MB)
md5:b245d5a13e401c8100e5d0fed5e2cc20
