MaartenGr/BERTopic: v0.10.0

Maarten Grootendorst; Nils Reimers; hp0404

doi:10.5281/zenodo.6507568

Published April 30, 2022 | Version v0.10.0

Software Open

1. IKNL
2. Huggingface

Highlights

Use any dimensionality reduction technique instead of UMAP:

from bertopic import BERTopic
from sklearn.decomposition import PCA

# The parameter is still called umap_model, but it accepts other reducers such as PCA
dim_model = PCA(n_components=5)
topic_model = BERTopic(umap_model=dim_model)
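
To make the swap above more concrete, here is a minimal sketch of what a replacement reducer does with document embeddings. It assumes that BERTopic only needs the model to expose `fit` and `transform`. The random `embeddings` array and its size (100 documents, 384 dimensions) are made up for illustration and do not come from the release notes.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for document embeddings: 100 documents, 384 dimensions (made-up sizes).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 384))

# The two calls a UMAP replacement is assumed to need: fit, then transform.
dim_model = PCA(n_components=5)
dim_model.fit(embeddings)
reduced = dim_model.transform(embeddings)

print(reduced.shape)  # one 5-dimensional vector per document
```

Any reducer offering the same two methods could presumably be passed in the same way.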

Use any clustering technique instead of HDBSCAN:

from bertopic import BERTopic
from sklearn.cluster import KMeans

# The parameter is still called hdbscan_model, but it accepts other clusterers such as KMeans
cluster_model = KMeans(n_clusters=50)
topic_model = BERTopic(hdbscan_model=cluster_model)
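
One practical difference from HDBSCAN is worth illustrating. HDBSCAN can label points as noise (`-1`), whereas k-means assigns every point to a cluster. The sketch below shows this on toy 2-D data. The blob centers, sizes, and `n_clusters=3` are illustrative choices, and the assumption that BERTopic reads the cluster assignments from `labels_` is not stated in the release notes.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three well-separated toy blobs of 20 points each (illustrative data only).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

cluster_model = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_model.fit(points)
labels = cluster_model.labels_

# Every point receives one of the cluster ids; k-means has no noise label.
print(sorted(set(labels.tolist())))
```

With a clusterer like this, every document ends up in some topic, so the number of clusters has to be chosen with that in mind.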

Documentation

Added a CountVectorizer page with tips and tricks on how to create topic representations that fit your use case

Added pages on how to use other dimensionality reduction and clustering algorithms

Additional instructions on how to reduce outliers in the FAQ:

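import numpy as np

# Assign each document to its most probable topic, or to -1 (outlier)
# if no topic probability reaches the threshold
probability_threshold = 0.01
new_topics = [np.argmax(prob) if max(prob) >= probability_threshold else -1 for prob in probs]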
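
A small worked example may help show what the thresholding does. It assumes `probs` is a 2-D array with one row per document and one column per topic. The three rows below are made-up numbers, not model output.

```python
import numpy as np

# Hypothetical document-topic probabilities: 3 documents, 3 topics.
probs = np.array([
    [0.005, 0.002, 0.001],  # no topic reaches the threshold
    [0.100, 0.700, 0.050],  # topic 1 is the most probable
    [0.020, 0.010, 0.300],  # topic 2 is the most probable
])

probability_threshold = 0.01
new_topics = [
    int(np.argmax(prob)) if prob.max() >= probability_threshold else -1
    for prob in probs
]

print(new_topics)  # [-1, 1, 2]
```

Lowering the threshold moves more documents out of the outlier topic, at the cost of less certain assignments.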

Fixes

Fixed None being returned for probabilities when transforming unseen documents
Replaced all instances of arg: with Arguments: for consistency
Before saving a fitted BERTopic instance, we remove the stop words stored on the fitted CountVectorizer model (scikit-learn's stop_words_ attribute), as this collection can get quite large: if min_df is set to a value larger than 1, every word dropped by that threshold ends up in it
Set "hdbscan>=0.8.28" to prevent numpy issues
- Although this was already fixed by the new release of HDBSCAN, it is technically still possible to install 0.8.27 with BERTopic which leads to these numpy issues
Update gensim dependency to >=4.0.0 (#371)
Fix topic 0 not appearing in visualizations (#472)
Fix #506
Fix #429

Files

MaartenGr/BERTopic-v0.10.0.zip

Files (6.2 MB)

Name	Size
MaartenGr/BERTopic-v0.10.0.zip (md5:db73a29436eccc8caef7b936982a9e3e)	6.2 MB

Additional details

Is supplement to: https://github.com/MaartenGr/BERTopic/tree/v0.10.0 (URL)

	All versions	This version
Views	12,432	586
Downloads	1,033	21
Data volume	5.6 GB	130.0 MB
