ShreyaVijaykumar/Diff-Insight

DiffInsight 🔍

AI-powered Git diff analysis tool that transforms any git diff into structured code review reports, team-aware change breakdowns, and churn heatmaps. Built for developers, team leads, and anyone who reviews code.

🌐 Live Demo: Diff-Insight


What is DiffInsight?

DiffInsight takes a raw git diff and turns it into something actually useful: a structured review report, a breakdown of which parts of your codebase changed, merge conflict warnings, and a timeline showing which files keep getting touched across multiple versions.

DiffInsight is used entirely from the browser. Its FastAPI backend sends the sanitised diff to Groq's free LLM API for analysis, and it works on diffs from any language or framework.


✨ Features

🧠 AI-Powered Diff Analyzer

Upload a .diff, .patch, or .txt file β€” or paste a diff directly β€” and get a structured LLM-generated report in seconds.

  • Senior Reviewer mode β€” concise, critical analysis focused on risks and actionable suggestions
  • Junior Mentor mode β€” plain-English explanations, learning points, and encouragement for newer developers
  • Supports standard git diff, diff -ruN, and most unified diff variants
  • Files containing secrets (.env, private keys, vault configs) are automatically stripped before reaching the LLM

🗺️ Change Intelligence Panel

A team-aware breakdown of every file in the diff. It works on any language, with no AST parsing required.

  • Layers Touched: instantly see which architectural layers were affected: Backend, Frontend, LLM/AI, Security, Tests, Config/Infra, Database, and more
  • Change Type Classification: every file labelled as NEW, MODIFIED, EXPANDED, REFACTORED, or DELETED
  • Merge Conflict Candidates: files flagged High/Medium/Low risk based on deletion ratio and churn volume
  • Churn Bar: proportional visualisation of how much each file changed relative to the total
  • File Type Breakdown: quick read on whether it was a backend-only, full-stack, or config change
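To make the layer and risk labels concrete, here is a minimal sketch of path-based layer detection and a High/Medium/Low conflict rating. The substring rules, the thresholds, and the function names classify_layer and conflict_risk are all assumptions for illustration; the rules DiffInsight actually uses are not documented here.

```python
# Hypothetical (substring, layer) rules, checked in order. Substring matching is
# crude (for example "latest.py" contains "test"), so treat this as a sketch only.
LAYER_RULES = [
    ("test", "Tests"),
    ("llm", "LLM/AI"),
    ("security", "Security"),
    ("frontend", "Frontend"),
    ("backend", "Backend"),
]
CONFIG_SUFFIXES = (".toml", ".yml", ".yaml", ".json", ".ini")


def classify_layer(path):
    """Map a file path to an architectural layer using simple path heuristics."""
    lowered = path.lower()
    if lowered.endswith(".sql"):
        return "Database"
    for needle, layer in LAYER_RULES:
        if needle in lowered:
            return layer
    if lowered.endswith(CONFIG_SUFFIXES):
        return "Config/Infra"
    return "Other"


def conflict_risk(added, removed):
    """Rate merge-conflict risk from churn volume and deletion ratio.

    The cut-offs (100 and 50 changed lines, 50% deletions) are assumed values.
    """
    churn = added + removed
    deletion_ratio = removed / churn if churn else 0.0
    if churn >= 100 and deletion_ratio >= 0.5:
        return "High"
    if churn >= 50 or deletion_ratio >= 0.5:
        return "Medium"
    return "Low"


print(classify_layer("backend/llm/analyzer.py"), conflict_risk(10, 120))
```

Because the rules only look at paths and line counts, the same approach works for any language, which matches the "no AST parsing" claim above.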

📊 Diff Timeline & Churn Heatmap

Compare multiple versions of the same project to see which files keep getting touched.

  • Add up to 20 diffs with custom labels (e.g. v1→v2, sprint-3, hotfix-auth)
  • Hotspot Leaderboard: top 5 files ranked by total churn across all diffs with medal rankings
  • Heatmap Grid: files × diffs matrix with colour-intensity cells (purple = low churn → orange = high churn)
  • Hover tooltips showing exact churn count and change type per cell
  • Churn bar and touches column showing cross-diff coverage at a glance
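The heatmap is essentially a files × diffs matrix of churn counts. The sketch below shows one way to build it, assuming each diff has already been reduced to a mapping of file path to churn (added plus removed lines). The names build_heatmap and hotspots are hypothetical and do not come from churn_heatmap.py.

```python
def build_heatmap(diffs):
    """Build heatmap rows from {label: {path: churn}}.

    Returns (labels, rows); each row holds one cell per diff, the total churn,
    and the number of diffs that touched the file.
    """
    labels = list(diffs)
    files = {path for per_file in diffs.values() for path in per_file}
    rows = []
    for path in files:
        cells = [diffs[label].get(path, 0) for label in labels]
        rows.append({
            "file": path,
            "cells": cells,
            "total": sum(cells),
            "touches": sum(1 for cell in cells if cell > 0),
        })
    # Highest total churn first; ties are broken alphabetically for stable output.
    rows.sort(key=lambda row: (-row["total"], row["file"]))
    return labels, rows


def hotspots(rows, top_n=5):
    """Return the top_n (file, total churn) pairs for the leaderboard."""
    return [(row["file"], row["total"]) for row in rows[:top_n]]


labels, rows = build_heatmap({
    "sprint-3": {"backend/main.py": 40, "frontend/static/script.js": 10},
    "hotfix-auth": {"backend/main.py": 25, "railway.toml": 3},
})
print(hotspots(rows))
```

In this example backend/main.py leads the leaderboard because it is the only file touched by both diffs.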

💬 Tech Assistant

Ask any technical question and get a structured answer from the LLM.

  • Auto-detects topic from your question (40+ keywords: FastAPI, Docker, PostgreSQL, Redis, Terraform, AWS, PyTorch, RAG, JWT, and more)
  • Structured answers covering: explanation, real example, industry use case, and common mistake
  • Powered by Groq's llama-3.3-70b-versatile model
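Keyword-based topic detection can be as simple as matching whole words against a lookup table. The sketch below uses a small subset of the topics listed above; the table contents and the detect_topics name are assumptions, and the real detector may work differently.

```python
import re

# A small, hypothetical subset of the 40+ keywords mentioned above.
TOPIC_KEYWORDS = {
    "fastapi": "FastAPI",
    "docker": "Docker",
    "postgresql": "PostgreSQL",
    "redis": "Redis",
    "terraform": "Terraform",
    "aws": "AWS",
    "pytorch": "PyTorch",
    "rag": "RAG",
    "jwt": "JWT",
}


def detect_topics(question):
    """Return the topics whose keyword appears as a whole word in the question."""
    # Whole-word matching keeps "rag" from firing on words like "storage".
    words = set(re.findall(r"[a-z0-9]+", question.lower()))
    return [topic for keyword, topic in TOPIC_KEYWORDS.items() if keyword in words]


print(detect_topics("How do I cache JWT sessions in Redis with FastAPI?"))
```

The detected topics could then be passed to the LLM prompt so that the answer stays focused on the right technology.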

🔭 GitHub Explorer

Search GitHub repositories without leaving the app.

  • Search by topic and filter by language
  • Sort by ⭐ Stars, 🍴 Forks, 🕒 Recently Updated, 🐛 Most Issues, 👁️ Most Watchers
  • Fetches up to 1000 results and sorts the full set client-side, so every sort option operates over all results, not just the first page
  • Results show all 5 metrics plus last updated date and a direct link
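Sorting the full result set is a plain in-memory sort once every page has been fetched. The sketch below assumes each result is a dictionary using the field names found on repository objects in GitHub's REST search responses. The sort_repos function and the SORT_KEYS mapping are illustrative, not the app's actual code.

```python
# Maps each sort option to the repository field it is assumed to use.
SORT_KEYS = {
    "stars": "stargazers_count",
    "forks": "forks_count",
    "updated": "updated_at",       # ISO 8601 UTC strings sort correctly as text
    "issues": "open_issues_count",
    "watchers": "watchers_count",
}


def sort_repos(repos, sort_by="stars"):
    """Return all fetched repositories ordered by the chosen metric, highest first."""
    field = SORT_KEYS[sort_by]
    return sorted(repos, key=lambda repo: repo[field], reverse=True)


repos = [
    {"full_name": "a/x", "stargazers_count": 5, "forks_count": 9,
     "updated_at": "2024-01-02T00:00:00Z", "open_issues_count": 1, "watchers_count": 5},
    {"full_name": "b/y", "stargazers_count": 50, "forks_count": 2,
     "updated_at": "2023-06-01T00:00:00Z", "open_issues_count": 7, "watchers_count": 50},
]
print([repo["full_name"] for repo in sort_repos(repos, "forks")])
```

Sorting after all pages are collected is what lets each option rank the whole set rather than one page at a time.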

🔒 Security & Privacy

  • Two-layer diff sanitiser: sensitive file paths (.env, .pem, secret_manager, id_rsa) are completely blocked before the LLM sees them; individual lines with credential-shaped values are redacted in place
  • GitHub token stored in HashiCorp Vault locally, or in a plain environment variable on cloud deployments
  • Rate limiting β€” 10 requests per 60 seconds per IP
  • Sanitisation warning banner in the UI if any content was stripped
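The two sanitiser layers can be sketched as follows. This is a minimal illustration, not the contents of diff_sanitiser.py: the credential pattern, the function names, and the choice to match path fragments against the "diff" header line are assumptions. The [REDACTED] marker is the one named under Security Notes.

```python
import re

# Path fragments named in the list above; a file whose header mentions one is dropped.
SENSITIVE_PATH_PARTS = (".env", ".pem", "secret_manager", "id_rsa")

# Hypothetical pattern for "credential-shaped" lines such as API_TOKEN = "abc123".
CREDENTIAL_LINE = re.compile(
    r"(?i)((?:api[_-]?key|token|secret|password)\w*\s*[:=]\s*)\S.*"
)


def split_sections(diff_text):
    """Group diff lines into per-file sections, each starting at a 'diff ' header."""
    sections, current = [], []
    for line in diff_text.splitlines():
        if line.startswith("diff ") and current:
            sections.append(current)
            current = []
        current.append(line)
    if current:
        sections.append(current)
    return sections


def sanitise_diff(diff_text):
    """Return (clean_diff, blocked_headers, redacted_line_count)."""
    kept, blocked, redacted = [], [], 0
    for section in split_sections(diff_text):
        header = section[0]
        # Layer 1: drop the whole file section if its path looks sensitive.
        if any(part in header for part in SENSITIVE_PATH_PARTS):
            blocked.append(header)
            continue
        # Layer 2: redact credential-shaped values on added or removed lines.
        for line in section:
            is_change = line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
            if is_change:
                line, hits = CREDENTIAL_LINE.subn(r"\1[REDACTED]", line)
                redacted += 1 if hits else 0
            kept.append(line)
    return "\n".join(kept), blocked, redacted
```

A non-empty blocked list or a non-zero redacted count is the kind of signal the UI could use to decide when to show the sanitisation warning banner. The pattern is deliberately broad, so it will also redact harmless assignments to names such as token.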

🚀 Live Deployment

Live Demo: Diff-Insight

Deployed on Railway with:

  • Groq API (llama-3.3-70b-versatile) for LLM analysis (free tier, no GPU needed)
  • FastAPI backend with uvicorn
  • GitHub token via Railway environment variables

📄 How to Generate a Git Diff File

A git diff is a text file showing exactly what changed between two versions of your code: which lines were added, removed, or modified. DiffInsight reads this file and analyses it.

Understanding the diff format

diff --git a/backend/main.py b/backend/main.py
--- a/backend/main.py        ← original file
+++ b/backend/main.py        ← updated file
@@ -10,6 +10,8 @@           ← hunk header: line numbers affected
 def home():                  ← unchanged context line (space prefix)
-    return "hello"           ← line that was REMOVED (minus prefix)
+    return "hello world"     ← line that was ADDED (plus prefix)
+    # updated greeting       ← another added line

| Symbol  | Meaning                                        |
|---------|------------------------------------------------|
| ---     | Original (old) version of the file             |
| +++     | Updated (new) version of the file              |
| @@      | Hunk header showing which line numbers changed |
| -       | Line that was removed                          |
| +       | Line that was added                            |
| (space) | Unchanged context line                         |
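To show how these markers are read in practice, here is a small sketch that counts added and removed lines per file. It assumes every file section starts with a "diff " header line, as in git diff and diff -ru output. The count_changes function is an illustrative helper and is not how DiffInsight parses diffs (the Tech Stack lists unidiff for that).

```python
from collections import defaultdict


def _clean(path):
    """Strip the timestamp that plain diff -u appends and git's a/ or b/ prefix."""
    path = path.split("\t")[0].strip()
    return path[2:] if path[:2] in ("a/", "b/") else path


def count_changes(diff_text):
    """Return {path: {"added": n, "removed": m}} for a unified diff."""
    stats = defaultdict(lambda: {"added": 0, "removed": 0})
    old_path = current = None
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff "):           # start of a new file section
            in_hunk, current = False, None
        elif not in_hunk and line.startswith("--- "):
            old_path = _clean(line[4:])
        elif not in_hunk and line.startswith("+++ "):
            new_path = _clean(line[4:])
            # Deleted files have /dev/null on the +++ side, so use the old path.
            current = old_path if new_path == "/dev/null" else new_path
        elif line.startswith("@@"):            # hunk header
            in_hunk = True
        elif in_hunk and current is not None:
            if line.startswith("+"):
                stats[current]["added"] += 1
            elif line.startswith("-"):
                stats[current]["removed"] += 1
    return dict(stats)
```

Run on the example above (without the arrow annotations), this reports two added lines and one removed line for backend/main.py.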

Method 1: Compare two commits

Find your commit hashes with git log, then diff them.

# See your recent commits
git log --oneline

# Output example:
# a1b2c3d Add login feature
# e4f5g6h Fix bug in auth
# i7j8k9l Initial commit

# Compare any two commits (older → newer)
git diff i7j8k9l a1b2c3d > my_diff.txt

Method 2: Compare two branches

Use this when reviewing a feature branch before merging into main.

# Compare feature branch against main
git diff main feature-branch > branch_diff.txt

# Compare your current branch against main
git diff main > current_vs_main.txt

Method 3: See uncommitted changes

Use this to review what you have changed before committing.

# All unstaged changes (files edited but not staged yet)
git diff > unstaged.txt

# All staged changes (files you ran git add on)
git diff --staged > staged.txt

# Everything changed since last commit (staged + unstaged)
git diff HEAD > all_changes.txt

Method 4: Compare two tags or releases

Use this to see everything that changed between two versions of your project.

# Compare release tags
git diff v1.0 v2.0 > release_diff.txt

# Compare a tag against current state
git diff v1.0 HEAD > since_v1.txt

Method 5: Compare a specific file only

Use this when you only care about one file's history.

# See how one file changed between two commits
git diff e4f5g6h a1b2c3d -- backend/main.py > main_py_diff.txt

# See all changes to one file since last commit
git diff HEAD -- backend/main.py > main_changes.txt

Method 6: Compare two separate folders or zip files

Use this when you have two versions of a project as separate folders with no shared git history.

# Unzip your versions
unzip project-v1.zip -d v1
unzip project-v2.zip -d v2

# Generate the diff recursively across all files
diff -ru v1/ v2/ > v1_to_v2.txt

# If the zip extracts into a subfolder, go one level deeper
diff -ru v1/project-main/ v2/project-main/ > v1_to_v2.txt

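The -r flag means recursive (it goes through all subfolders) and -u produces the unified format that DiffInsight can read. Add -N (diff -ruN) if files that exist in only one folder should appear as full additions or deletions instead of "Only in" lines.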
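If the diff command is not available (for example on Windows without Git Bash), Python's standard difflib module can produce a similar unified diff. The sketch below is an approximation of diff -ruN, not a replacement for it. The diff_folders name is hypothetical, files are read as UTF-8 text so binary files will not diff meaningfully, and "no newline at end of file" markers are not emitted.

```python
import difflib
from pathlib import Path


def _read_lines(path):
    """Return the file's lines, or an empty list if it is missing on this side."""
    if not path.is_file():
        return []
    return path.read_text(encoding="utf-8", errors="replace").splitlines()


def diff_folders(old_root, new_root):
    """Build a unified diff of every file that differs between two folders."""
    old, new = Path(old_root), Path(new_root)
    relative = sorted(
        {p.relative_to(old) for p in old.rglob("*") if p.is_file()}
        | {p.relative_to(new) for p in new.rglob("*") if p.is_file()}
    )
    out = []
    for rel in relative:
        before, after = _read_lines(old / rel), _read_lines(new / rel)
        if before == after:
            continue
        from_name, to_name = (old / rel).as_posix(), (new / rel).as_posix()
        # Mimic the per-file header line that GNU diff prints in recursive mode.
        out.append(f"diff -ruN {from_name} {to_name}")
        out.extend(difflib.unified_diff(before, after, from_name, to_name, lineterm=""))
    return "\n".join(out) + ("\n" if out else "")


if __name__ == "__main__":
    Path("v1_to_v2.txt").write_text(diff_folders("v1", "v2"), encoding="utf-8")
```

Running the script from the directory that contains v1 and v2 writes v1_to_v2.txt, which can then be uploaded or pasted in the same way as the output of the diff command.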


Method 7: Using the Churn Heatmap with multiple diffs

If you have three versions of a project (v1, v2, v3), generate all combinations and load them into the Churn Heatmap to see which files changed the most across the entire history.

# Generate all three diffs
diff -ru v1/ v2/ > v1_to_v2.txt
diff -ru v2/ v3/ > v2_to_v3.txt
diff -ru v1/ v3/ > v1_to_v3.txt

Then in DiffInsight → Churn Heatmap:

  1. Paste v1_to_v2.txt → label v1→v2 → click Add Diff
  2. Paste v2_to_v3.txt → label v2→v3 → click Add Diff
  3. Paste v1_to_v3.txt → label v1→v3 → click Add Diff
  4. Click Generate Heatmap

You will see exactly which files were touched in every version and which ones are hotspots.


Tips

  • Always redirect with > to write the output to a file. Without it, the diff only prints to the terminal and is not saved
  • Use the .txt extension if unsure; DiffInsight accepts .diff, .patch, and .txt
  • If your diff file is empty, the two versions are probably identical, or the archives extracted into a nested subfolder. Try going one level deeper
  • Large diffs (thousands of lines) work fine: DiffInsight handles up to 5MB
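The checks in these tips can be run locally before uploading. This is a small sketch under stated assumptions: "5MB" is taken to mean 5 × 1024 × 1024 bytes, and check_diff_file is a hypothetical helper, not part of DiffInsight.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".diff", ".patch", ".txt"}
MAX_BYTES = 5 * 1024 * 1024  # assumes the 5MB limit means 5 MiB


def check_diff_file(filename, content):
    """Return a list of problems found; an empty list means the file looks fine."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported extension")
    if len(content) > MAX_BYTES:
        problems.append("file is larger than 5MB")
    if not content.strip():
        problems.append("file is empty; the two versions may be identical")
    return problems


print(check_diff_file("my_diff.txt", b"diff --git a/x b/x\n"))
```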

🛠️ Run Locally

Prerequisites

  • Python 3.11 (the version listed under Tech Stack)
  • A Groq API key
  • A GitHub token

1. Clone the repository

git clone https://github.com/ShreyaVijaykumar/Diff-Insight.git
cd Diff-Insight/diffinsight

2. Install dependencies

pip install -r requirements.txt

3. Set environment variables

Create a .env file. It is already in .gitignore, so it will never be committed:

GROQ_API_KEY=gsk_your_key_here
GITHUB_TOKEN=ghp_your_token_here

Get your free Groq API key at console.groq.com

Get your GitHub token at github.com/settings/tokens and tick the public_repo scope.

4. Start the server

uvicorn backend.main:app --reload

5. Open in browser

http://127.0.0.1:8000

🗂️ Project Structure

diffinsight/
├── backend/
│   ├── main.py                      # FastAPI app, all endpoints, rate limiting
│   ├── llm/
│   │   ├── analyzer.py              # Groq-powered diff analysis
│   │   └── tech_assistant.py        # Groq-powered Q&A with topic detection
│   ├── security/
│   │   └── secret_manager.py        # Vault + env var fallback for tokens
│   ├── services/
│   │   └── github_service.py        # GitHub search with full result set sorting
│   └── utils/
│       ├── change_intelligence.py   # Team-aware diff breakdown (any language)
│       ├── churn_heatmap.py         # Multi-diff heatmap matrix builder
│       ├── diff_sanitiser.py        # Two-layer secret stripping before LLM
│       └── risk.py                  # Risk level computation
├── frontend/
│   ├── templates/
│   │   └── index.html               # Single-page app
│   └── static/
│       ├── script.js                # All frontend logic
│       └── style.css                # Dark glass UI with purple/orange gradients
├── railway.toml                     # Railway deployment config
├── requirements.txt
└── .env.example                     # Template for environment variables

🖥️ Tech Stack

| Layer        | Technology                             |
|--------------|----------------------------------------|
| Backend      | FastAPI, Python 3.11                   |
| LLM          | Groq API (llama-3.3-70b-versatile)     |
| Diff Parsing | unidiff                                |
| Frontend     | Vanilla JS, CSS (glass morphism)       |
| Fonts        | Syne + JetBrains Mono                  |
| Secrets      | HashiCorp Vault (local) / env vars (cloud) |
| Deployment   | Railway                                |

🔒 Security Notes

  • Sensitive files (.env, private keys, vault configs, credential files) are completely removed from the diff before the LLM sees anything
  • Hardcoded secrets in non-sensitive files (tokens, passwords, connection strings) are redacted in place: the diff structure is preserved but values are replaced with [REDACTED]
  • The UI shows a warning banner if any content was stripped, so you always know what the LLM saw
  • GitHub tokens never appear in code; they are retrieved from Vault or environment variables only
  • Rate limiting prevents abuse: 10 requests per 60 seconds per IP address
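A limit of 10 requests per 60 seconds per IP can be enforced with a sliding window of timestamps. The sketch below is one common way to do it. Whether DiffInsight uses a sliding or a fixed window is not stated, and the SlidingWindowLimiter class is an assumption for illustration only.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests=10, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock                  # injectable so tests can fake time
        self._hits = defaultdict(deque)      # client IP -> timestamps of recent requests

    def allow(self, client_ip):
        """Record the request and return True, or return False if over the limit."""
        now = self._clock()
        hits = self._hits[client_ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
print(limiter.allow("203.0.113.7"))
```

In a FastAPI app, a check like this would typically run before each endpoint and return HTTP 429 when allow returns False. This in-memory version only works for a single server process.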

📈 Roadmap

  • PR description auto-writer from diff
  • Reviewer assignment suggester based on layers touched
  • Commit message quality scorer
  • Diff history and session comparison
  • Export report as markdown / PDF
  • JavaScript/TypeScript dependency graph support

👩‍💻 Author

Shreya Vijaykumar github.com/ShreyaVijaykumar

