Published June 2, 2026 | Version v0.1.0
Software Open

reemmarji-03/nlp-pilot: v0.1.0

Authors/Creators

Description

NLP Pilot v0.1.0

What's Included:

1. Document Tools

Upload and inspect text corpora in PDF, DOCX, TXT, and CSV formats. Explore word frequency distributions, run sentiment analysis and named-entity recognition, view data quality statistics, and export document-level reports as PDF or JSON.
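As a rough illustration of what a word frequency distribution looks like, the sketch below counts tokens with the Python standard library. It is not NLP Pilot's code: the function name `word_frequencies` and the simple regex tokenizer are assumptions made for this example.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Return the top_n most frequent lowercase word tokens in text.

    Hypothetical helper for illustration; the tokenizer is a plain regex,
    not the one used by NLP Pilot.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)


sample = "The cat sat on the mat. The mat was warm."
print(word_frequencies(sample, top_n=2))  # [('the', 3), ('mat', 2)]
```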

2. Preprocessing Pipeline

Build ordered text-cleaning pipelines interactively. Available operations include lowercasing, URL and mention removal, contraction expansion, numeric normalization, stopword filtering, stemming, lemmatization, length-based token filtering, and regex-based replacements. A live preview reflects changes on the document in real time.
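The idea of an ordered cleaning pipeline can be sketched as a list of functions applied in sequence. This is a minimal standard-library illustration, not the project's implementation: all function names below (`run_pipeline`, `remove_urls`, and so on) and the `<num>` placeholder are assumptions made for the example.

```python
import re


# Each step is a plain function from str to str (hypothetical names).
def lowercase(text):
    return text.lower()


def remove_urls(text):
    return re.sub(r"https?://\S+", "", text)


def remove_mentions(text):
    return re.sub(r"@\w+", "", text)


def normalize_numbers(text):
    # Replace integers and decimals with a placeholder token.
    return re.sub(r"\d+(?:\.\d+)?", "<num>", text)


def collapse_whitespace(text):
    return re.sub(r"\s+", " ", text).strip()


def run_pipeline(text, steps):
    """Apply the steps in the given order and return the cleaned text."""
    for step in steps:
        text = step(text)
    return text


steps = [remove_urls, remove_mentions, lowercase,
         normalize_numbers, collapse_whitespace]
raw = "Thanks @ana! See https://example.com for 3 NEW tips"
print(run_pipeline(raw, steps))  # thanks ! see for <num> new tips
```

Order matters in a pipeline like this one: running `normalize_numbers` before `remove_urls`, for instance, would rewrite digits inside URLs before they are stripped.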

3. Vectorization

Convert processed text into numerical representations using Bag-of-Words, word-level TF-IDF, character-level TF-IDF, or transformer-based sentence embeddings (via distilbert-base-uncased and sentence-transformers). Inspect feature spaces and visualize embedding distributions using PCA.
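To make word-level TF-IDF concrete, here is a small sketch that computes it by hand with the textbook formula tf(t, d) * ln(N / df(t)). It is an illustration under stated assumptions, not the tool's code: the function name `tfidf` and the whitespace tokenizer are hypothetical, and libraries such as scikit-learn use a smoothed IDF by default, so their values will differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Uses unsmoothed TF-IDF: (count / doc length) * ln(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = ["good movie", "bad movie", "long movie plot"]
for doc, vector in zip(docs, tfidf(docs)):
    print(doc, "->", vector)
```

A term that occurs in every document ("movie" here) receives weight zero, which is the behavior that distinguishes TF-IDF from raw Bag-of-Words counts.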

4. Prediction and Topic Modeling

Train and evaluate supervised classifiers and regressors (Logistic Regression, Linear SVM, Random Forest, Linear Regression) with configurable train/test splits, k-fold cross-validation, and automatic model selection. Visualize K-Means clustering in static or animated mode. Run BERTopic for unsupervised topic discovery with an interactive term bar chart.
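For readers unfamiliar with k-fold cross-validation, the following sketch shows how the index splits are formed: the data is shuffled once, divided into k folds, and each fold serves as the test set exactly once. This is a plain-Python illustration, not how NLP Pilot performs the split; the name `k_fold_indices` and its parameters are assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # A seeded shuffle keeps the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for fold, (train, test) in enumerate(k_fold_indices(10, 5, seed=0)):
    print(f"fold {fold}: train={len(train)} test={len(test)}")
```

A model would be fitted on each `train` split and scored on the matching `test` split, with the k scores averaged to estimate generalization performance.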

5. Retrieval-Augmented Generation (RAG)

Index uploaded documents with FAISS using all-MiniLM-L6-v2 embeddings and query them through a chat interface. Supports fully local operation via Ollama or cloud-based inference via OpenAI and Anthropic APIs.
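The retrieve-then-generate flow can be sketched without any external services. In the example below, a toy bag-of-words vector stands in for the all-MiniLM-L6-v2 embeddings, and a linear scan with cosine similarity stands in for the FAISS index. Every name (`embed`, `retrieve`, `build_prompt`) is hypothetical, and no language model is called; the sketch only shows how retrieved chunks end up in the prompt.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a sentence-embedding model: a bag-of-words vector.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (linear scan, not FAISS)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query, context_chunks):
    """Assemble the text that would be sent to a local or cloud model."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


chunks = [
    "FAISS stores dense vectors",
    "Ollama runs models locally",
    "Stopwords are removed during preprocessing",
]
question = "which tool runs models locally"
print(build_prompt(question, retrieve(question, chunks, k=1)))
```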

6. Agent Lab

Express high-level analytical goals in natural language. A LangGraph-powered agent guides the user through preprocessing, vectorization, and modeling step by step, invoking the corresponding module functions automatically while allowing the user to customize or override any stage.

Extensibility

Register custom scikit-learn-compatible estimators through the GUI or programmatically via core.model_registry.register_model.

Reproducibility

A global random seed is applied across all stochastic components. A reproducible capsule generator exports environment details, package versions, the random seed, and a pip freeze snapshot for archival and sharing.

Requirements

Python 3.11

Tested on Windows and Linux

No API key required for the core pipeline; fully local operation is supported via Ollama
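As an illustration of the reproducibility capsule described above, the sketch below collects comparable information with the standard library only. It is not the project's capsule generator: `build_capsule` and the dictionary layout are assumptions, `importlib.metadata` is used as a stand-in for a real `pip freeze` call, and only Python's built-in `random` module is seeded here (NumPy, PyTorch, and similar libraries would need their own seeding).

```python
import json
import platform
import random
import sys
from importlib import metadata


def build_capsule(seed):
    """Seed the stdlib RNG and return a JSON-serializable environment record."""
    random.seed(seed)
    packages = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") if dist.metadata else None
        if name:
            # Same "name==version" shape that pip freeze prints.
            packages.append(f"{name}=={dist.version}")
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "seed": seed,
        "packages": sorted(packages),
    }


capsule = build_capsule(seed=42)
print(json.dumps(capsule, indent=2)[:300])
```

Writing the resulting dictionary to a JSON file next to the analysis outputs gives a record that can be archived or shared alongside the results.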

Installation

git clone https://github.com/reemmarji-03/nlp-pilot.git
cd nlp-pilot
pip install -r requirements.txt
python main.py

Full installation instructions, Conda environment setup, and usage guidelines are available in the README.

Files

reemmarji-03/nlp-pilot-v0.1.0.zip

Size: 104.0 kB
md5:d578c9b201e73659b0a06bf12a1d4654
