reemmarji-03/nlp-pilot: v0.1.0
Description
NLP Pilot v0.1.0
What's Included:
1. Document Tools
Upload and inspect text corpora in PDF, DOCX, TXT, and CSV formats. Explore word frequency distributions, run sentiment analysis and named-entity recognition, view data quality statistics, and export document-level reports as PDF or JSON.
2. Preprocessing Pipeline
Build ordered text-cleaning pipelines interactively. Available operations include lowercasing, URL and mention removal, contraction expansion, numeric normalization, stopword filtering, stemming, lemmatization, length-based token filtering, and regex-based replacements. A live preview reflects changes on the document in real time.
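The idea of an ordered pipeline can be illustrated with a minimal sketch using only the standard library. The function names and regular expressions below are illustrative assumptions, not the project's actual implementation; the point is only that each step is a string-to-string function and that the order of steps matters.

```python
import re
from typing import Callable, List

# Each step maps a string to a string; the pipeline applies them in order.
Step = Callable[[str], str]

def lowercase(text: str) -> str:
    return text.lower()

def remove_urls(text: str) -> str:
    # Hypothetical pattern: drops http(s) links and bare www. addresses.
    return re.sub(r"https?://\S+|www\.\S+", " ", text)

def remove_mentions(text: str) -> str:
    return re.sub(r"@\w+", " ", text)

def normalize_numbers(text: str) -> str:
    # Replaces integers and decimals with a placeholder token.
    return re.sub(r"\d+(?:\.\d+)?", "<num>", text)

def collapse_whitespace(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()

def run_pipeline(text: str, steps: List[Step]) -> str:
    for step in steps:
        text = step(text)
    return text

steps = [lowercase, remove_urls, remove_mentions, normalize_numbers, collapse_whitespace]
preview = run_pipeline("Check https://example.com NOW @bob, only 3.5 left!", steps)
print(preview)
```

Reordering the list changes the result: running `normalize_numbers` before `remove_urls`, for instance, would rewrite digits inside a URL before the URL is stripped.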
3. Vectorization
Convert processed text into numerical representations using Bag-of-Words, word-level TF-IDF, character-level TF-IDF, or transformer-based sentence embeddings (via distilbert-base-uncased and sentence-transformers). Inspect feature spaces and visualize embedding distributions using PCA.
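To make the difference between word-level and character-level TF-IDF concrete, here is a small standard-library sketch. It uses the plain `tf * log(N / df)` weighting, which is an assumption for illustration; the application may rely on a library implementation with different smoothing and normalization. The helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def char_ngrams(text: str, n: int) -> List[str]:
    # Character-level features: every contiguous substring of length n.
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Word-level: each document is a list of word tokens.
word_docs = [["cats", "chase", "mice"], ["dogs", "chase", "cats"], ["mice", "eat", "cheese"]]
word_vectors = tfidf(word_docs)

# Character-level: each document is a list of character trigrams.
char_vectors = tfidf([char_ngrams(" ".join(doc), 3) for doc in word_docs])
```

Terms that appear in fewer documents receive a higher weight, so in the third document `eat` outweighs `mice`, which also occurs in the first.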
4. Prediction and Topic Modeling
Train and evaluate supervised classifiers and regressors (Logistic Regression, Linear SVM, Random Forest, Linear Regression) with configurable train/test splits, k-fold cross-validation, and automatic model selection. Visualize K-Means clustering in static or animated mode. Run BERTopic for unsupervised topic discovery with an interactive term bar chart.
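As a rough illustration of seeded k-fold cross-validation, the sketch below generates train/test index splits with the standard library. The function name and default seed are assumptions; the application itself presumably delegates this to its modeling library.

```python
import random
from typing import Iterator, List, Tuple

def k_fold_indices(
    n_samples: int, k: int, seed: int = 42
) -> Iterator[Tuple[List[int], List[int]]]:
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(indices)
    # Distribute the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=3, seed=0))
for train, test in folds:
    print(len(train), len(test))
```

Every sample lands in exactly one test fold, and rerunning with the same seed yields the same splits.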
5. Retrieval-Augmented Generation (RAG)
Index uploaded documents with FAISS using all-MiniLM-L6-v2 embeddings and query them through a chat interface. Supports fully local operation via Ollama or cloud-based inference via OpenAI and Anthropic APIs.
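The retrieval step of a RAG workflow can be sketched without any external dependency. The example below is a brute-force cosine-similarity search standing in for a FAISS index, and the three-dimensional vectors are made-up toy values standing in for real sentence embeddings; all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(
    query: Sequence[float],
    index: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[float, str]]:
    # Score every chunk against the query and keep the k most similar.
    scored = [(cosine(query, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy index: (text chunk, toy embedding). Real embeddings have hundreds of dimensions.
index = [
    ("Invoices are due in 30 days.", [0.9, 0.1, 0.0]),
    ("The office closes at 6 pm.", [0.1, 0.9, 0.1]),
    ("Late payments incur a fee.", [0.8, 0.2, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]  # toy embedding of a billing-related question
hits = top_k(query_vec, index, k=2)

# The retrieved chunks would be placed in the prompt sent to the language model.
context = "\n".join(chunk for _, chunk in hits)
```

A vector index replaces the linear scan with a faster search structure, but the contract is the same: embed the query, retrieve the nearest chunks, and pass them to the model as context.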
6. Agent Lab
Express high-level analytical goals in natural language. A LangGraph-powered agent guides the user through preprocessing, vectorization, and modeling step by step, invoking the corresponding module functions automatically while allowing the user to customize or override any stage.
Extensibility
Register custom scikit-learn-compatible estimators through the GUI or programmatically via core.model_registry.register_model.
Reproducibility
A global random seed is applied across all stochastic components. A reproducible capsule generator exports environment details, package versions, the random seed, and a pip freeze snapshot for archival and sharing.
Requirements
- Python 3.11
- Tested on Windows and Linux
- No API key required for the core pipeline; fully local operation supported via Ollama
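For readers curious what a reproducibility capsule of the kind mentioned under Reproducibility might contain, here is a minimal sketch. The function names and field names are assumptions rather than the project's actual output format, and `importlib.metadata` is used here as an approximation of a pip freeze snapshot.

```python
import json
import platform
from importlib import metadata
from typing import List

def installed_packages() -> List[str]:
    # Approximates `pip freeze` by listing installed distributions as name==version.
    pins = set()
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
            version = dist.version
        except Exception:
            # Skip distributions whose metadata cannot be read.
            continue
        if name:
            pins.add(f"{name}=={version}")
    return sorted(pins)

def build_capsule(seed: int) -> dict:
    # Hypothetical capsule layout: interpreter, OS, seed, and package pins.
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "seed": seed,
        "packages": installed_packages(),
    }

capsule = build_capsule(seed=42)
capsule_json = json.dumps(capsule, indent=2)
```

Writing `capsule_json` to a file alongside the results gives a record of the environment and seed that produced them.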
Installation
git clone https://github.com/reemmarji-03/nlp-pilot.git
cd nlp-pilot
pip install -r requirements.txt
python main.py
Full installation instructions, Conda environment setup, and usage guidelines are available in the README.
Files
| Name | Size |
|---|---|
| reemmarji-03/nlp-pilot-v0.1.0.zip (md5:d578c9b201e73659b0a06bf12a1d4654) | 104.0 kB |
Additional details
Related works
- Is supplement to
- Software: https://github.com/reemmarji-03/nlp-pilot/tree/v0.1.0 (URL)
Software
- Repository URL
- https://github.com/reemmarji-03/nlp-pilot