Real-Time AI-Powered Market Sentiment Dashboard (Dash + Transformers)

Interactive dashboard that pulls live news or Reddit posts, runs NLP preprocessing and sentiment analysis (Hugging Face distilbert-base-uncased-finetuned-sst-2-english and TextBlob), and optionally generates short market summaries with GPT‑2. Built with Dash + Plotly.

Source file in this repo: Real-Time Market Sentiment Dashboard - M_Analyse_Sources_P.py (you can rename it to app.py).


Overview

  • Data sources: NewsAPI (/v2/everything) or Reddit (via PRAW).
  • NLP:
    • Tokenization, stopword removal, lemmatization (NLTK).
    • Transformer-based sentiment (DistilBERT SST‑2) via transformers.pipeline.
    • TextBlob polarity as a second baseline.
    • Optional short generation/summaries with GPT‑2 (text-generation pipeline).
  • UI (Dash):
    • Inputs: query, date range, language, page size, source (news/reddit).
    • Outputs: two histograms (Transformers vs TextBlob), generated text area.
  • Runtime: CPU by default, optional GPU with PyTorch CUDA.

Project Structure

Real-Time-AI-Powered-Market-Sentiment-Dashboard/
├─ Real-Time Market Sentiment Dashboard - M_Analyse_Sources_P.py   # main app
└─ README.md

Requirements

  • Python ≥ 3.9
  • Packages: dash, plotly, requests, nltk, torch, transformers, textblob, python-dotenv, praw

Install (example):

python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate

pip install dash plotly requests nltk torch transformers textblob python-dotenv praw

On first run, NLTK may fetch corpora at runtime. You can pre-download them to avoid the startup cost:

import nltk
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")
nltk.download("averaged_perceptron_tagger")
nltk.download("maxent_ne_chunker")
nltk.download("words")
nltk.download("omw-1.4")

Environment Variables (.env)

Create a .env file in the project root:

NEWS_API_KEY=your_newsapi_key
REDDIT_CLIENT_ID=your_reddit_app_client_id
REDDIT_CLIENT_SECRET=your_reddit_app_client_secret

Notes:

  • NewsAPI: create an API key at newsapi.org and respect their rate limits/terms.
  • Reddit: create a Reddit application of type "script" to obtain client credentials for PRAW. When Source=Reddit, the app uses the text typed into the query box as the subreddit name.
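The script reads these variables via python-dotenv. A minimal loading sketch (variable names match the .env file above; the graceful fallback when python-dotenv is missing is an assumption, not part of the reference script):

```python
import os

try:
    # python-dotenv is listed in the requirements; fall back gracefully if absent
    from dotenv import load_dotenv
    load_dotenv()  # reads key=value pairs from .env into the process environment
except ImportError:
    pass  # variables may already be set in the shell environment

NEWS_API_KEY = os.getenv("NEWS_API_KEY", "")
REDDIT_CLIENT_ID = os.getenv("REDDIT_CLIENT_ID", "")
REDDIT_CLIENT_SECRET = os.getenv("REDDIT_CLIENT_SECRET", "")
```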

Run

On Windows (filename has spaces):

python "Real-Time Market Sentiment Dashboard - M_Analyse_Sources_P.py"

Or rename to app.py and run:

python app.py

The server binds to port 8051 by default:

  • Open http://127.0.0.1:8051/ in your browser.

How It Works

  1. Fetch content
    • News: everything?q=<query>&from=<date>&to=<date>&language=<lang>&apiKey=$NEWS_API_KEY.
    • Reddit: fetch top/hot posts from the given subreddit; use title + selftext as content.
  2. Preprocess
    • Lowercase, strip punctuation, keep alphabetic tokens, remove stopwords, lemmatize.
  3. Sentiment
    • Transformers pipeline sentiment-analysis (DistilBERT SST‑2).
    • TextBlob polarity in parallel.
  4. Visualization
    • Two histograms (Transformers labels, TextBlob polarity).
  5. Optional generation
    • GPT‑2 text-generation pipeline processes the concatenated article text to produce a short market write‑up.
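Step 2 (preprocessing) can be sketched as below. This is a minimal stand-in with a tiny hard-coded stopword set; the actual script uses NLTK's full English stopword corpus plus WordNet lemmatization:

```python
import re

# Tiny stand-in for NLTK's English stopword list
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of", "in"}

def preprocess(text: str) -> list[str]:
    # Lowercase, keep alphabetic tokens only, then drop stopwords
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

# preprocess("The markets are UP 5% today!") -> ["markets", "up", "today"]
```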

GPU (Optional)

To use a GPU with pipelines, construct pipelines with a device index:

from transformers import pipeline

sentiment_model = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=0  # CUDA:0; use -1 for CPU
)

text_generator = pipeline(
    "text-generation",
    model="gpt2",
    device=0  # or -1 for CPU
)

Avoid calling .to(...) on a pipeline object. If you need manual control, move the underlying model to CUDA and pass it into a new pipeline.


Known Issues (from the reference script)

  1. Overwrites Reddit results
    After selecting source='reddit', the script calls get_news_articles(...) again unconditionally, replacing Reddit data. Remove the second call so the branch result is preserved.

    if source == "news":
        articles = get_news_articles(...)
    elif source == "reddit":
        posts = get_reddit_posts(query, limit=int(page_size))
        articles = [{"content": p["text"], "title": p["title"]} for p in posts]
    # DO NOT call get_news_articles again here
  2. Status code check inverted
    get_news_articles prints an error when status_code == 200. It should return articles directly on 200 and only print on errors.

    resp = requests.get(url, params=payload)
    if resp.status_code == 200:
        return resp.json().get("articles", [])
    else:
        print(f"Couldn't retrieve articles. HTTP {resp.status_code}")
        return [{"content": "", "title": "Error"}]
  3. Generated text variable
    The code builds generated_texts (list) but then calls process_generated_text(generated_text) with an undefined variable. Either process each chunk and join, or generate once.

    generated = text_generator(input_text, max_length=512, num_return_sequences=1)
    processed_text = process_generated_text(generated)
  4. Null content fields
    Some NewsAPI articles have content=None. Guard concatenation:

    article_texts = [a.get("content") or a.get("description") or "" for a in articles]
    concatenated_text = " ".join(article_texts)
  5. API key leakage
    The script does print(API_KEY). Remove any logging of secrets.

  6. NLTK downloads at startup
    Repeated nltk.download(...) calls slow startup. Consider moving downloads to setup or guard them.
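One way to guard the downloads (a sketch; the resource paths shown are the standard NLTK data locations) is to probe the local cache first and only fetch what is missing:

```python
def ensure_nltk_data(resources):
    """Download each (path, name) NLTK resource only if it is not already cached."""
    try:
        import nltk
    except ImportError:
        return  # NLTK not installed; pip install nltk first
    for path, name in resources:
        try:
            nltk.data.find(path)  # raises LookupError if the corpus is absent
        except LookupError:
            nltk.download(name, quiet=True)

# Example (run once at setup time rather than on every startup):
# ensure_nltk_data([("tokenizers/punkt", "punkt"),
#                   ("corpora/stopwords", "stopwords"),
#                   ("corpora/wordnet", "wordnet")])
```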


Minimal Patch Example

def get_news_articles(q, from_param, to, language="en", page_size=10):
    url = "https://newsapi.org/v2/everything"
    payload = {"q": q, "from": from_param, "to": to, "language": language, "apiKey": API_KEY, "pageSize": page_size}
    r = requests.get(url, params=payload)
    if r.status_code == 200:
        return r.json().get("articles", [])
    print(f"NewsAPI error: HTTP {r.status_code}: {r.text[:200]}")
    return [{"content": "", "title": "Error"}]

Acknowledgements

  • Hugging Face Transformers and TextBlob for NLP
  • Dash and Plotly for the web UI
  • NewsAPI and Reddit (PRAW) for data access