All comparisons and observations reflect personal experience and opinion. PROJECT-R2 is an independent student project developed for learning purposes and is not affiliated with, endorsed by, or associated with any third-party AI company or product. If you have any concerns regarding attribution, credit, or content removal, please contact: pranjal.tiwari2486@gmail.com.
A raw language model does one thing by default:
prompt → predict next token → append → predict next token → repeat
One forward pass. Token by token. No internal review. No second thoughts. Whatever comes out first is what you get.
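That default loop can be sketched in a few lines. Here `predict_next_token` is a toy stand-in for a real model's forward pass (a real model scores vocabulary tokens; this one just returns a filler word), but the shape of the loop is the point: predict, append, repeat, no review.

```python
def predict_next_token(tokens):
    # Toy stand-in for a real forward pass: emits filler words,
    # then a stop token once the sequence is long enough.
    return "!" if len(tokens) >= 6 else "word"

def generate(prompt, max_tokens=8, stop="!"):
    tokens = prompt.split()
    for _ in range(max_tokens):
        nxt = predict_next_token(tokens)   # one forward pass per token
        tokens.append(nxt)                 # append, then predict again
        if nxt == stop:
            break
    return " ".join(tokens)                # first draft = final answer
```

Nothing in this loop ever looks back at what was already emitted, which is exactly the gap PROJECT-R2 targets.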
I noticed something while using these tools daily. When an answer wasn't good enough, I'd type "revise this" or "improve it further" — and it would. Every single time. A noticeably better answer. It would even explain exactly where the gaps in its first response were.
The model isn't broken. The model is capable of significantly better than its first response. The problem is that every mainstream AI interface stops at one pass and hands you the draft.
PROJECT-R2 automates the revision loop you were already doing manually.
Beyond the single-pass issue, there's a second wall: these tools have no memory of your work.
Mid-session, deep into a problem — the AI forgets what was decided three messages ago. Gives a generic answer that ignores the stack you're working with. You re-explain the entire context every time you open a new chat. The longer the session, the more context bleeds out.
Two separate problems. Both hitting at once. One about quality. One about memory.
| Raw ChatGPT / Claude | PROJECT-R2 |
|---|---|
| First draft is the final answer | Internally revised before you ever see it |
| Context bleeds out in long sessions | Smart compression — nothing critical ever lost |
| User doesn't control context compression | Context compresses when user approves |
| You manage the refining prompts manually | The system manages the refining loop |
The problem isn't the model. The problem is the interface built around it — one that never asks the model to think twice.
PROJECT-R2 sits between you and the model. Before any answer reaches you, it runs the revision loop internally — the same loop you were running manually, now automated.
This is what AI research calls reflection, debate, and iterative refinement — techniques used in serious AI systems. Most of that logic lives outside the model, orchestrated externally. That's exactly what this is.
The refinement loop only works well if the model knows your project. Generic context produces generic answers regardless of how many passes run.
Instead of sending the full conversation history on every call — expensive, slow, hits token limits fast — PROJECT-R2 maintains a living structured memory that evolves with your session:
```json
{
  "project": { "name": "", "stack": "", "entry_point": "" },
  "files": { "filename": "one line description" },
  "decisions": ["what was decided + why"],
  "failed_attempts": ["what failed + why"],
  "working_patterns": ["what works + how"],
  "current_task": { "goal": "", "status": "", "blockers": "" }
}
```
A Summarizer Agent reads the conversation every 10 messages and extracts only the facts that affect future answers. Narrative is dropped. Specifics — exact file names, decisions made, what failed and why — are kept.
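The Summarizer Agent's trigger-and-merge logic might look something like this. `summarize_fn` is a placeholder for the agent's LLM call, and the prompt text is an assumption, not the project's actual prompt; the key behaviors from above are the 10-message boundary and the merge of extracted facts into the living context.

```python
import json

SUMMARIZER_PROMPT = (
    "From the conversation below, extract only facts that affect future "
    "answers: exact file names, decisions and why, failures and why. "
    "Return JSON with keys: files, decisions, failed_attempts, working_patterns."
)

def maybe_summarize(messages, context, summarize_fn, every=10):
    # Only fire on each 10-message boundary; otherwise context is untouched.
    if len(messages) % every != 0:
        return context
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in messages[-every:])
    extracted = json.loads(summarize_fn(f"{SUMMARIZER_PROMPT}\n\n{transcript}"))
    # Merge facts into the living context; narrative never enters it.
    for key in ("decisions", "failed_attempts", "working_patterns"):
        context.setdefault(key, []).extend(extracted.get(key, []))
    context.setdefault("files", {}).update(extracted.get("files", {}))
    return context
```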
When tokens approach the limit, a Smart Cleanup Agent categorizes every item in context by how safely it can be removed or compressed.
The agent decides what's safe. You confirm. The session continues without losing anything that matters.
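A minimal sketch of that flow, under assumptions: an 80% warning threshold (the project's actual threshold isn't stated), and `classify_fn` standing in for the cleanup agent's LLM call. The important invariant is the last function: nothing is dropped without user confirmation.

```python
def check_token_budget(used: int, limit: int, warn_at: float = 0.8) -> bool:
    # True once usage crosses the warning threshold (80% by default).
    return used >= limit * warn_at

def propose_cleanup(context_items, classify_fn):
    # The agent labels each item; only "safe" items are proposed for removal.
    return [item for item in context_items if classify_fn(item) == "safe"]

def apply_cleanup(context_items, proposals, user_confirmed: bool):
    # Nothing leaves context until the user says yes.
    if not user_confirmed:
        return context_items
    return [item for item in context_items if item not in proposals]
```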
PROJECT-R2 is not a coding-only assistant. Different tasks need different models. When you start a task, you select the mode — the system routes accordingly:
| Mode | Best For | Model via OpenRouter |
|---|---|---|
| 🖥️ Coding | Architecture, debugging, code review | DeepSeek Coder / Claude |
| 🧠 Reasoning | Planning, decisions, analysis | Claude / GPT-4o |
| ✍️ Writing | Docs, content, communication | Claude / Mistral |
| ⚡ Quick | Fast lookups, simple tasks | Mistral 7B / Gemma |
All models accessed via OpenRouter. Routing handled internally. During development: free tier only. When commercial: model swap requires zero architecture change.
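Because OpenRouter speaks the OpenAI chat-completions format, routing can reduce to a lookup table plus a payload builder, which is why a model swap costs zero architecture change. The model IDs below are illustrative guesses at OpenRouter slugs, not the project's confirmed choices.

```python
# Model IDs are illustrative, not the project's confirmed routing table.
MODE_ROUTES = {
    "coding":    ["deepseek/deepseek-coder", "anthropic/claude-3.5-sonnet"],
    "reasoning": ["anthropic/claude-3.5-sonnet", "openai/gpt-4o"],
    "writing":   ["anthropic/claude-3.5-sonnet", "mistralai/mistral-large"],
    "quick":     ["mistralai/mistral-7b-instruct", "google/gemma-7b-it"],
}

def pick_model(mode, unavailable=None):
    # First available model for the mode; later entries act as fallbacks.
    unavailable = unavailable or set()
    for model in MODE_ROUTES[mode]:
        if model not in unavailable:
            return model
    raise RuntimeError(f"no model available for mode '{mode}'")

def build_request(mode, prompt):
    # OpenRouter accepts OpenAI-style chat payloads, so swapping models
    # is a one-string change in this dict.
    return {
        "model": pick_model(mode),
        "messages": [{"role": "user", "content": prompt}],
    }
```

Swapping free-tier models for commercial ones then means editing `MODE_ROUTES` and nothing else.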
| Stage | Milestone | Status |
|---|---|---|
| Stage 0 | Environment setup — OpenRouter free tier, Python | ✅ Done |
| Stage 1 | Refinement loop — solver, critic, judge, CLI | ✅ Done |
| Stage 2 | Context foundation — solver loop reads context.json | ✅ Done |
| Stage 3 | Memory Manager — extracts facts from conversation | ✅ Done |
| Stage 4 | Token Manager — token counting, usage logging, warnings, usage prediction | 🔄 In Progress |
| Stage 5 | Smart cleanup agent — categorize, user confirms, execute | 🔄 In Progress |
| Stage 6 | Architecture Polishing | ⬜ Not Started |
| Stage 7 | Unified CLI — clean interface, commands, session save/load | ⬜ Not Started |
| Stage 8 | Model routing — task-aware model selection per mode | ⬜ Not Started |
| (Optional) | User Statistics Extraction, Temporary Loop Memory, Agentic Skills | ⬜ Not Started |
| Stage | Milestone | Status |
|---|---|---|
| Stage 1 | Wrap refinement loop in FastAPI endpoints | ⬜ Not Started |
| Stage 2 | Session management — create, load, isolate per user | ⬜ Not Started |
| Stage 3 | Database — Supabase, store sessions, messages, context snapshots | ⬜ Not Started |
| Stage 4 | API key management — encrypted per user, never exposed (open question: auto-provision each user's API key via their Google account?) | ⬜ Not Started |
| Stage | Milestone | Status |
|---|---|---|
| Stage 1 | User registration and login | ⬜ Not Started |
| Stage 2 | JWT session tokens | ⬜ Not Started |
| Stage 3 | Full user isolation — context, history, token usage per user | ⬜ Not Started |
| Stage 4 | Basic rate limiting per user | ⬜ Not Started |
| Stage | Milestone | Status |
|---|---|---|
| Stage 1 | Chat interface | ⬜ Not Started |
| Stage 2 | Live context sidebar | ⬜ Not Started |
| Stage 3 | Token usage bar | ⬜ Not Started |
| Stage 4 | Session history + task mode selector | ⬜ Not Started |
| Stage 5 | Inline cleanup prompts | ⬜ Not Started |
| Stage | Milestone | Status |
|---|---|---|
| Stage 1 | Backend → Render | ⬜ Not Started |
| Stage 2 | Database → Supabase | ⬜ Not Started |
| Stage 3 | Frontend → Vercel | ⬜ Not Started |
| Stage 4 | Scale model strategy if needed | ⬜ Not Started |
| Stage | Milestone | Status |
|---|---|---|
| Stage 1 | Error handling and fallbacks | ⬜ Not Started |
| Stage 2 | Logging system | ⬜ Not Started |
| Stage 3 | Token usage dashboard per user | ⬜ Not Started |
| Stage 4 | Model fallback routing | ⬜ Not Started |
| Stage 5 | Abuse protection | ⬜ Not Started |
Raw LLMs don't think in multiple passes by default. They predict the next token, append it, and repeat — one forward pass, start to finish. But when forced to reflect, critique, and revise, they reliably produce better answers. The research agrees.
Reflection, debate, iterative refinement — these are among the most effective techniques in modern AI systems. And most of that logic lives outside the model, orchestrated externally. That's exactly what PROJECT-R2 is.
Not a smarter model. A smarter process wrapped around the model you already use.