parthtiwari-dev/README.md
██████╗  █████╗ ██████╗ ████████╗██╗  ██╗    ████████╗██╗██╗    ██╗ █████╗ ██████╗ ██╗
██╔══██╗██╔══██╗██╔══██╗╚══██╔══╝██║  ██║    ╚══██╔══╝██║██║    ██║██╔══██╗██╔══██╗██║
██████╔╝███████║██████╔╝   ██║   ███████║       ██║   ██║██║ █╗ ██║███████║██████╔╝██║
██╔═══╝ ██╔══██║██╔══██╗   ██║   ██╔══██║       ██║   ██║██║███╗██║██╔══██║██╔══██╗██║
██║     ██║  ██║██║  ██║   ██║   ██║  ██║       ██║   ██║╚███╔███╔╝██║  ██║██║  ██║██║
╚═╝     ╚═╝  ╚═╝╚═╝  ╚═╝   ╚═╝   ╚═╝  ╚═╝       ╚═╝   ╚═╝ ╚══╝╚══╝ ╚═╝  ╚═╝╚═╝  ╚═╝╚═╝



Portfolio LinkedIn Email GitHub Views



◈   SYSTEM BOOT

$ initializing parth_tiwari.profile ...

[✓] identity          →  AI Systems Engineer
[✓] location          →  Bengaluru, India
[✓] status            →  open to the right problem
[✓] philosophy        →  evidence before claims
[✓] vibe-coding       →  NOT DETECTED
[✓] evaluation        →  ACTIVE
[✓] evidence systems  →  9 mapped in EVIDENCEBOUND
[✓] current build     →  SecondSelf
[✓] work node         →  Stick and Dot  (AI/ML Intern)

[READY] parth_tiwari.profile loaded successfully.



◈   WHO I AM   (told through what broke)

Most profiles show you the wins. Here's what actually happened.


Building a fraud engine. Backtesting revealed this:

train ROC-AUC      →  0.895   ← model looked great
production ROC-AUC →  0.60    ← system was lying to itself the whole time

cause:  temporal features bled future signal into past training windows
fix:    leakage validation, point-in-time enforcement, rebuilt from scratch
result: precision stayed useful under a real alert budget
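
A minimal sketch of what a point-in-time leakage test can look like. This is not the engine's actual test suite: `prior_txn_count`, `leaky_count`, and `leaks_future` are hypothetical names, and the feature is a simple transaction count. The idea is to inject an event after the scoring time and check that the feature value does not move.

```python
from datetime import datetime, timedelta


def prior_txn_count(history, as_of, window):
    """Count transactions strictly before `as_of`, inside the lookback window."""
    start = as_of - window
    return sum(1 for ts in history if start <= ts < as_of)


def leaky_count(history, as_of, window):
    """Deliberately wrong: no upper bound, so events after `as_of` are counted."""
    start = as_of - window
    return sum(1 for ts in history if ts >= start)


def leaks_future(feature_fn, history, as_of, window):
    """Return True if an event after `as_of` changes the feature value."""
    baseline = feature_fn(history, as_of, window)
    poisoned = history + [as_of + timedelta(seconds=1)]
    return feature_fn(poisoned, as_of, window) != baseline


t = datetime(2024, 1, 10, 12, 0)
history = [t - timedelta(hours=h) for h in (1, 5, 30)]
day = timedelta(days=1)

print(prior_txn_count(history, t, day))                # 2
print(leaks_future(prior_txn_count, history, t, day))  # False
print(leaks_future(leaky_count, history, t, day))      # True
```

The same probe can be run against every feature function, which is one way a temporal bleed like the one above would surface before training.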

Shipped a Text-to-SQL agent. Hallucination detector reported 100% hallucination:

hallucination_rate  →  100%   ← every query hallucinating?
actual rate         →  0%     ← the metric was wrong, not the system

cause:  schema_tables_used returned ["schema_dict", "tables"] — dict keys, not table names
fix:    one-line patch
lesson: I found this because I wrote a hallucination detector in the first place
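
A small reconstruction of how this kind of metric bug can happen. The payload shape and the function names are assumptions for illustration, not QueryPilot's real code: iterating a dict yields its keys, so the detector compares `"schema_dict"` and `"tables"` against the known tables and flags both.

```python
def tables_used_buggy(payload):
    # Iterating a dict yields its keys, not the table names nested inside it.
    return list(payload)


def tables_used_fixed(payload):
    # Read the table names from the (assumed) "tables" entry instead.
    return list(payload["tables"])


def hallucination_rate(used, known):
    """Fraction of referenced tables that do not exist in the schema."""
    if not used:
        return 0.0
    unknown = [name for name in used if name not in known]
    return len(unknown) / len(used)


# Hypothetical payload shape, chosen to reproduce the reported output.
payload = {"schema_dict": {"orders": ["id", "amount"]}, "tables": ["orders"]}
known = {"orders", "customers"}

print(tables_used_buggy(payload))                             # ['schema_dict', 'tables']
print(hallucination_rate(tables_used_buggy(payload), known))  # 1.0
print(hallucination_rate(tables_used_fixed(payload), known))  # 0.0
```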

Deployed to Render. LLM mixed up two different databases:

question  →  "what is the total revenue?"      (ecommerce schema)
sql       →  SELECT SUM(amount) FROM fines      (library schema — wrong database entirely)

cause:  both schemas lived in the same Chroma collection, embeddings leaked cross-schema
fix:    prompt isolation + schema-scoped retrieval + re-evaluated full 82-query benchmark
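
A toy sketch of schema-scoped retrieval, assuming each indexed chunk carries a `schema` tag. Word overlap stands in for embedding similarity here, and `DOCS`, `score`, and `retrieve` are hypothetical names. The point is only that filtering by schema before ranking keeps a near-match from the wrong database out of the prompt.

```python
import re

# Hypothetical index: one chunk per table, tagged with the schema it belongs to.
DOCS = [
    {"schema": "ecommerce", "table": "orders", "text": "orders table amount column per order"},
    {"schema": "library", "table": "fines", "text": "fines table total amount due"},
]


def score(question, text):
    """Word-overlap score, a stand-in for embedding similarity."""
    q = set(re.findall(r"[a-z]+", question.lower()))
    d = set(re.findall(r"[a-z]+", text.lower()))
    return len(q & d)


def retrieve(question, schema=None, k=1):
    """Rank chunks for the question, restricted to one schema when given."""
    pool = [d for d in DOCS if schema is None or d["schema"] == schema]
    ranked = sorted(pool, key=lambda d: score(question, d["text"]), reverse=True)
    return [d["table"] for d in ranked[:k]]


question = "what is the total revenue?"
print(retrieve(question))                      # ['fines']  (cross-schema leak)
print(retrieve(question, schema="ecommerce"))  # ['orders']
```

In a vector store the same effect would usually come from a metadata filter on the query or from one collection per schema.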

The pattern: I find these things because I build evaluation harnesses before I trust results.

- "it works on my machine" → ship it
+ measure → break it intentionally → fix it → measure again → then ship it



◈   MODEL CARD

model_id         : parth-tiwari-v2
type             : early-career AI systems engineer
architecture     : first-principles → build → evaluate → break → fix → deploy
training_data    : production constraints, real failure modes, measurable outcomes

benchmarks:
  text_to_sql_execution_success  : 95.7%     # 82-query ecommerce benchmark
  cross_schema_generalization    : 100%      # zero-shot on unseen library schema
  syntactic_hallucination_rate   : 0.0%      # schema-grounded generation
  fraud_precision_in_budget      : 92.06%    # 0.5% daily alert constraint
  fraud_p95_latency              : ~386ms    # API scoring path
  medrag_answered_faithfulness   : ~0.99     # cited medical retrieval answers
  medrag_refusal_accuracy        : 100%      # insufficient evidence => refusal
  vivid_beta_users               : 10+       # creative AI work under Stick and Dot

serving:
  portfolio          : EVIDENCEBOUND — 9 evidence systems, same-world overlays
  deployment         : Docker · Render · Streamlit · HuggingFace · Vercel
  current_focus      : SecondSelf · evidence-bound career/application OS

known_limitations    : early-career · still learning · high ownership · ships with boundaries



◈   DEPLOYED SYSTEMS

Featured below: 3 public systems. Full map: EVIDENCEBOUND — 9 nodes across personal projects, work evidence, current builds, and tooling.



⚡   QUERYPILOT  ·  Self-Correcting Text-to-SQL Agent

Live API Source

  Natural Language
        │
        ▼
  Schema-Aware RAG  ──►  SQL Generator
                               │
                         Static Validator
                               │
               ┌───────────────┼───────────────┐
          Regex Repair       LLM Fix        Executor
               └───────────────┴───────────────┘
                        Self-Correction Loop
                           (max 3 attempts)

| Metric | Result | Context |
|---|---|---|
| First-attempt success | 90.0% | No correction, cold generation |
| After self-correction | 95.7% | 3-stage loop on 82-query benchmark |
| Hallucination rate | 0.0% | Zero invented tables or columns |
| Cross-schema generalization | 100% | Library schema, zero domain tuning |
| Cold-start reduction | ~400ms | Per-schema agent caching |
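
The loop in the diagram can be sketched roughly as follows. This is a simplified illustration, not the LangGraph pipeline itself: `KNOWN_TABLES`, `validate`, `regex_repair`, and `run_with_correction` are hypothetical, and `llm_fix` and `execute` are injected callables standing in for the model call and the database.

```python
import re

KNOWN_TABLES = {"orders", "customers"}  # assumed schema for the example


def validate(sql):
    """Static check: return an error message, or None if the SQL passes."""
    if not sql.strip().lower().startswith("select"):
        return "not a SELECT statement"
    tables = re.findall(r"\b(?:from|join)\s+([a-z_]+)", sql, flags=re.IGNORECASE)
    unknown = [t for t in tables if t.lower() not in KNOWN_TABLES]
    if unknown:
        return f"unknown table: {unknown[0]}"
    return None


def regex_repair(sql):
    """Cheap deterministic fix: trim whitespace and trailing semicolons."""
    return sql.strip().rstrip(";").strip()


def run_with_correction(sql, llm_fix, execute, max_attempts=3):
    """Repair, validate, execute; on failure ask the LLM for a fix and retry."""
    error = None
    for attempt in range(1, max_attempts + 1):
        sql = regex_repair(sql)
        error = validate(sql)
        if error is None:
            try:
                return {"ok": True, "rows": execute(sql), "attempts": attempt, "sql": sql}
            except Exception as exc:  # execution errors feed the next fix
                error = str(exc)
        if attempt < max_attempts:
            sql = llm_fix(sql, error)
    return {"ok": False, "error": error, "attempts": max_attempts, "sql": sql}


# Stubs standing in for the real model and database.
fake_llm_fix = lambda sql, error: sql.replace("order_items", "orders")
fake_execute = lambda sql: [(42,)]

result = run_with_correction("SELECT SUM(amount) FROM order_items;", fake_llm_fix, fake_execute)
print(result["ok"], result["attempts"], result["sql"])
```

Capping the loop at a fixed number of attempts keeps latency and cost bounded when the model cannot recover.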

Python LangGraph FastAPI ChromaDB PostgreSQL Docker GitHub Actions



🛡   UPI FRAUD ENGINE  ·  Real-Time Fraud Decision System

Live API Live UI Source

  HARD CONSTRAINTS (non-negotiable):
  ├── score transaction at T using only pre-T features   (no future leakage)
  ├── ≤ 0.5% daily alert budget                         (precision is everything)
  └── simulate delayed fraud labels                     (real-world label lag)

  transactions → point-in-time features → leakage tests → alert-budget model
  train/serve drift surfaced → rebuilt → re-tested under real decision constraints

| Metric | Result | Context |
|---|---|---|
| Precision in alert budget | 92.06% | Only flags what matters |
| P95 latency | ~386ms | API scoring path |
| Leakage tests | 55+ | Temporal integrity checks |
| Backtest mode | day-by-day | Production-like replay |
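
One way to read "precision in alert budget": flag only as many transactions per day as the budget allows, then measure precision on that flagged set. The sketch below assumes a budget of 50 basis points (0.5%) and uses synthetic scores and labels. `precision_in_budget` is a hypothetical helper, not the engine's code.

```python
def precision_in_budget(scores, labels, budget_bps=50):
    """Precision among the top-scored transactions that fit the alert budget.

    budget_bps is the budget in basis points (50 bps = 0.5% of the day's volume).
    Returns (precision, n_alerts); precision is None when no alert fits.
    """
    n_alerts = len(scores) * budget_bps // 10_000
    if n_alerts == 0:
        return None, 0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_alerts])
    return hits / n_alerts, n_alerts


# Synthetic day: 1000 transactions, so the budget allows 5 alerts.
scores = [i / 1000 for i in range(1000)]
fraud_ids = {999, 998, 997, 996, 500}
labels = [1 if i in fraud_ids else 0 for i in range(1000)]

precision, n_alerts = precision_in_budget(scores, labels)
print(n_alerts, precision)  # 5 0.8
```

Here four of the five flagged transactions are fraud, and the fraud case scored at 0.5 is missed because it never makes the cut.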

Python XGBoost FastAPI DuckDB Great Expectations Docker



🧬   EVIDENCE-BOUND DRUG RAG  ·  Medical Knowledge Retrieval

Live App HuggingFace Source

  HARD CONSTRAINT: medical domain — hallucination is patient harm
  ├── every claim needs source evidence
  ├── insufficient evidence must trigger refusal, not a guess
  └── faithfulness is measured, not assumed

  FDA + NICE PDFs → semantic chunks → retrieval → citation → refusal policy

| Metric | Result | Context |
|---|---|---|
| Answered faithfulness | ~0.99 | Claims grounded in source |
| Refusal accuracy | 100% | Unsupported requests refused |
| Eval cost | $0.168 | Cost-aware evaluation |
| Boundary | non-diagnostic | Not medical advice |
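
A rough sketch of a refusal gate, assuming retrieval returns chunks with a similarity `score` and a `source`. The 0.6 threshold, the refusal text, and the `generate` callable are placeholders, not the project's actual policy.

```python
REFUSAL = "Insufficient evidence in the indexed sources to answer this."


def answer_or_refuse(question, chunks, generate, min_score=0.6):
    """Answer only from chunks that clear the evidence threshold, else refuse."""
    supported = [c for c in chunks if c["score"] >= min_score]
    if not supported:
        return {"refused": True, "answer": REFUSAL, "citations": []}
    answer = generate(question, supported)
    citations = sorted({c["source"] for c in supported})
    return {"refused": False, "answer": answer, "citations": citations}


# Stub generator: a real system would call an LLM with the supported chunks.
fake_generate = lambda question, chunks: f"Based on {len(chunks)} source(s)."

strong = [{"score": 0.82, "source": "label.pdf"}, {"score": 0.41, "source": "other.pdf"}]
weak = [{"score": 0.30, "source": "other.pdf"}]

print(answer_or_refuse("What is the dose?", strong, fake_generate))
print(answer_or_refuse("What is the dose?", weak, fake_generate))
```

Because only chunks that clear the threshold reach the generator, every citation in an answer points at evidence that passed the gate.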

Python FastAPI ChromaDB SentenceTransformers LangChain RAGAS Streamlit




◈   HOW I ACTUALLY BUILD

step 1  →  define what "working" means before writing a single line
step 2  →  build the evaluation harness
step 3  →  write the system
step 4  →  break it intentionally  (adversarial inputs, edge cases, drift simulation)
step 5  →  fix what breaks
step 6  →  measure again
step 7  →  deploy with monitoring hooks
step 8  →  repeat when production proves you wrong
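
Steps 1, 2, and 6 can be sketched as a tiny harness that exists before the system does. Everything here is illustrative: `evaluate`, the case format, and the 0.9 gate are assumptions, and the "system" is a toy normalizer.

```python
def evaluate(system, cases, min_pass_rate=0.9):
    """Run every case, collect failures, and decide whether the gate passes."""
    failures = []
    for case in cases:
        try:
            got = system(case["input"])
        except Exception as exc:  # a crash counts as a failed case
            got = f"error: {exc}"
        if got != case["expected"]:
            failures.append({"input": case["input"], "expected": case["expected"], "got": got})
    pass_rate = (len(cases) - len(failures)) / len(cases)
    return {"pass_rate": pass_rate, "ship": pass_rate >= min_pass_rate, "failures": failures}


# Cases are written first; the last one is an intentional edge case.
cases = [
    {"input": "  Hello ", "expected": "hello"},
    {"input": "WORLD", "expected": "world"},
    {"input": "", "expected": ""},
    {"input": "Straße", "expected": "strasse"},
]

normalize = lambda text: text.strip().lower()  # toy system under test

report = evaluate(normalize, cases)
print(report["pass_rate"], report["ship"])  # 0.75 False
print(report["failures"])
```

The edge case fails because `lower()` leaves `ß` unchanged, so the gate blocks the release until the system is fixed and measured again.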

This is how suspicious metrics become trustworthy. This is how a metric bug gets caught before it becomes a product lie. This is how a smaller system with gates beats a bigger prompt with vibes.




◈   STACK

Python SQL XGBoost LangGraph LangChain FastAPI Docker ChromaDB DuckDB PostgreSQL Streamlit Vue Three.js GitHub Actions




◈   STATS

| Signal | Current State |
|---|---|
| Evidence systems | 9 mapped in EVIDENCEBOUND |
| Featured public systems | QueryPilot · UPI Fraud Engine · MedRAG |
| Main stack | Python · FastAPI · RAG · XGBoost · Vue · Three.js |
| Current build | SecondSelf - evidence-bound career OS |

GitHub contribution streak
GitHub activity graph





$ ./parth --shutdown

[saving state]   ✓  9 evidence systems mapped
[saving state]   ✓  3 featured systems public
[saving state]   ✓  all evaluation harnesses active
[saving state]   ✓  open to the right problem

[goodbye]  see you on the other side of the next PR.

Pinned

  1. parth-tiwari (Public)

    My portfolio

    Vue

  2. querypilot (Public)

    QueryPilot - Production-ready multi‑agent Text-to-SQL API for Postgres. Schema‑aware LangGraph pipeline with ChromaDB + sentence‑transformers, Neon-backed DB, and full evaluation on real ecommerce &…

    Python 3

  3. upi-fraud-engine (Public)

    Real-time UPI fraud detection system (0.8953 ROC-AUC) with <500ms FastAPI scoring, 480+ temporal features, and budget-aware alerts under fintech constraints

    HTML 3

  4. Evidence-Bound-Drug-RAG (Public)

    Evidence-grounded medical RAG system that retrieves FDA and NICE drug guidelines, generates cited answers, and safely refuses unsupported queries to minimize hallucinations.

    Python 1

  5. oncoverse (Public)

    OncoVerse is an open-source cancer education atlas that makes cancer biology visible through immersive 3D anatomy, plain-English explanations, and stage-by-stage exploration for patients, families,…

    TypeScript

  6. stick-and-dot-app (Public)

    TypeScript