Interactive dashboard that pulls live news or Reddit posts, runs NLP preprocessing and sentiment analysis (Hugging Face distilbert-base-uncased-finetuned-sst-2-english and TextBlob), and optionally generates short market summaries with GPT‑2. Built with Dash + Plotly.
Source file in this repo:
`Real-Time Market Sentiment Dashboard - M_Analyse_Sources_P.py` (you can rename it to `app.py`).
- Data sources: NewsAPI (`/v2/everything`) or Reddit (via PRAW).
- NLP:
  - Tokenization, stopword removal, lemmatization (NLTK).
  - Transformer-based sentiment (DistilBERT SST‑2) via `transformers.pipeline`.
  - TextBlob polarity as a second baseline.
  - Optional short generation/summaries with GPT‑2 (`text-generation` pipeline).
- UI (Dash):
  - Inputs: query, date range, language, page size, source (news/reddit).
  - Outputs: two histograms (Transformers vs TextBlob), generated text area.
- Runtime: CPU by default, optional GPU with PyTorch CUDA.
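The two histograms compare DistilBERT's discrete POSITIVE/NEGATIVE labels against TextBlob's continuous polarity in [-1, 1]. If you want to read both on a common label axis, a small mapping helps; the 0.1 threshold below is an illustrative choice, not a value taken from the script:

```python
def polarity_to_label(polarity: float, threshold: float = 0.1) -> str:
    """Map TextBlob's continuous polarity onto DistilBERT-style labels.

    The threshold is a hypothetical cutoff for calling a score neutral.
    """
    if polarity > threshold:
        return "POSITIVE"
    if polarity < -threshold:
        return "NEGATIVE"
    return "NEUTRAL"
```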
```
Real-Time-AI-Powered-Market-Sentiment-Dashboard/
├─ Real-Time Market Sentiment Dashboard - M_Analyse_Sources_P.py   # main app
└─ README.md
```
- Python ≥ 3.9
- Packages: dash, plotly, requests, nltk, torch, transformers, textblob, python-dotenv, praw
Install (example):

```shell
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate
pip install dash plotly requests nltk torch transformers textblob python-dotenv praw
```

On first run, NLTK may fetch corpora at runtime. You can pre-download them to avoid the startup cost:
```python
import nltk
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")
nltk.download("averaged_perceptron_tagger")
nltk.download("maxent_ne_chunker")
nltk.download("words")
nltk.download("omw-1.4")
```

Create a `.env` file in the project root:

```
NEWS_API_KEY=your_newsapi_key
REDDIT_CLIENT_ID=your_reddit_app_client_id
REDDIT_CLIENT_SECRET=your_reddit_app_client_secret
```
Notes:
- NewsAPI: create an API key at newsapi.org and respect their rate limits/terms.
- Reddit: create a Reddit application (script) to obtain credentials (PRAW). The app uses the subreddit name typed into the query box when Source=Reddit.
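At runtime these values are read from the environment (python-dotenv loads `.env` into `os.environ`). A minimal sketch of assembling the `/v2/everything` parameters from those variables (`build_news_payload` is an illustrative helper, not a function defined in the script):

```python
import os

# load_dotenv() from python-dotenv would populate os.environ from .env;
# here the variables are read directly.
def build_news_payload(query, date_from, date_to, language="en", page_size=10):
    """Assemble the query-string parameters for NewsAPI /v2/everything."""
    return {
        "q": query,
        "from": date_from,
        "to": date_to,
        "language": language,
        "pageSize": page_size,
        "apiKey": os.environ.get("NEWS_API_KEY", ""),
    }
```

Keeping the key out of the code path this way also makes it harder to log accidentally.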
On Windows (the filename contains spaces):

```shell
python "Real-Time Market Sentiment Dashboard - M_Analyse_Sources_P.py"
```

Or rename it to `app.py` and run:

```shell
python app.py
```

The server binds to port 8051 by default:

- Open http://127.0.0.1:8051/ in your browser.
- Fetch content
  - News: `everything?q=<query>&from=<date>&to=<date>&language=<lang>&apiKey=$NEWS_API_KEY`.
  - Reddit: fetch top/hot posts from the given subreddit; use `title + selftext` as content.
- Preprocess
  - Lowercase, strip punctuation, keep alphabetic tokens, remove stopwords, lemmatize.
- Sentiment
  - Transformers `sentiment-analysis` pipeline (DistilBERT SST‑2).
  - TextBlob polarity in parallel.
- Visualization
  - Two histograms (Transformers labels, TextBlob polarity).
- Optional generation
  - GPT‑2 `text-generation` pipeline processes the concatenated article text to produce a short market write‑up.
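The preprocessing step can be sketched as follows. To keep the example self-contained, a tiny inline stopword set stands in for NLTK's full English list, and the lemmatization pass (WordNetLemmatizer in the app) is only noted in a comment:

```python
import re

# Tiny illustrative subset; the app uses NLTK's full English stopword list.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on"}

def preprocess(text: str) -> list[str]:
    """Lowercase, keep alphabetic tokens only, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]
    # The app additionally lemmatizes each surviving token
    # with nltk.stem.WordNetLemmatizer before scoring.
```

For example, `preprocess("The market is up, and tech stocks rally!")` keeps only the content words `["market", "up", "tech", "stocks", "rally"]`.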
To use a GPU with pipelines, construct them with a device index:

```python
from transformers import pipeline

sentiment_model = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=0,  # CUDA:0; use -1 for CPU
)
text_generator = pipeline(
    "text-generation",
    model="gpt2",
    device=0,  # or -1 for CPU
)
```

Avoid calling `.to(...)` on a pipeline object. If you need manual control, move the underlying model to CUDA and pass it into a new pipeline.
- Overwrites Reddit results

  After selecting `source='reddit'`, the script calls `get_news_articles(...)` again unconditionally, replacing the Reddit data. Remove the second call so the branch result is preserved.

  ```python
  if source == "news":
      articles = get_news_articles(...)
  elif source == "reddit":
      posts = get_reddit_posts(query, limit=int(page_size))
      articles = [{"content": p["text"], "title": p["title"]} for p in posts]
      # DO NOT call get_news_articles again here
  ```

- Status code check inverted

  `get_news_articles` prints an error when `status_code == 200`. It should return the articles on 200 and only print on errors.

  ```python
  resp = requests.get(url, params=payload)
  if resp.status_code == 200:
      return resp.json().get("articles", [])
  else:
      print(f"Couldn't retrieve articles. HTTP {resp.status_code}")
      return [{"content": "", "title": "Error"}]
  ```

- Generated text variable

  The code builds `generated_texts` (a list) but then calls `process_generated_text(generated_text)` with an undefined variable. Either process each chunk and join the results, or generate once.

  ```python
  generated = text_generator(input_text, max_length=512, num_return_sequences=1)
  processed_text = process_generated_text(generated)
  ```

- Null content fields

  Some NewsAPI articles have `content=None`. Guard the concatenation:

  ```python
  article_texts = [a.get("content") or a.get("description") or "" for a in articles]
  concatenated_text = " ".join(article_texts)
  ```

- API key leakage

  The script does `print(API_KEY)`. Remove any logging of secrets.

- NLTK downloads at startup

  Repeated `nltk.download(...)` calls slow startup. Consider moving the downloads to a setup step or guarding them.
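The guard can be sketched generically. `ensure_resources` is a hypothetical helper (not in the script); it takes `nltk.data.find` and `nltk.download` as parameters so each corpus is fetched only when missing:

```python
def ensure_resources(resources, find, download):
    """Fetch each missing resource exactly once.

    resources: (lookup_path, download_name) pairs such as
      ("tokenizers/punkt", "punkt"); find/download are meant to be
      nltk.data.find and nltk.download.
    Returns the names that actually had to be downloaded.
    """
    fetched = []
    for path, name in resources:
        try:
            find(path)  # raises LookupError when the corpus is absent
        except LookupError:
            download(name)
            fetched.append(name)
    return fetched
```

Called at startup with `ensure_resources([("tokenizers/punkt", "punkt"), ("corpora/stopwords", "stopwords")], nltk.data.find, nltk.download)`, it skips corpora already on disk instead of re-running every download on each launch.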
A corrected `get_news_articles` for reference:

```python
def get_news_articles(q, from_param, to, language="en", page_size=10):
    url = "https://newsapi.org/v2/everything"
    payload = {"q": q, "from": from_param, "to": to, "language": language, "apiKey": API_KEY, "pageSize": page_size}
    r = requests.get(url, params=payload)
    if r.status_code == 200:
        return r.json().get("articles", [])
    print(f"NewsAPI error: HTTP {r.status_code} — {r.text[:200]}")
    return [{"content": "", "title": "Error"}]
```

- Hugging Face Transformers and TextBlob for NLP
- Dash and Plotly for the web UI
- NewsAPI and Reddit (PRAW) for data access