Open Development · Self-Refining AI Assistant

PROJECT-R2

Because one pass was never enough.

All comparisons and observations reflect personal experience and opinion. PROJECT-R2 is an independent student project developed for learning purposes and is not affiliated with, endorsed by, or associated with any third-party AI company or product. If you have any concerns regarding attribution, credit, or content removal, please contact: pranjal.tiwari2486@gmail.com.

The Observation

A raw language model does one thing by default:

prompt → predict next token → append → predict next token → repeat

One forward pass. Token by token. No internal review. No second thoughts. Whatever comes out first is what you get.
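The single-pass loop above can be sketched in a few lines. `predict_next_token` below is a toy stand-in for a real model's forward pass, not an actual model call; only the control flow matters here.

```python
# Sketch of the default single-pass generation loop.
# `predict_next_token` is a toy stand-in for a real forward pass.

def predict_next_token(tokens: list[str]) -> str:
    # Toy "model": emits a canned continuation, then an end-of-sequence marker.
    canned = ["Hello", ",", " world", "<eos>"]
    return canned[len(tokens)] if len(tokens) < len(canned) else "<eos>"

def generate(prompt_tokens: list[str], max_tokens: int = 16) -> list[str]:
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        nxt = predict_next_token(tokens)   # one forward pass
        if nxt == "<eos>":                 # model decides it is done
            break
        tokens.append(nxt)                 # append, then repeat
    return tokens

print("".join(generate([])))  # whatever comes out first is what you get
```

Nothing in this loop ever looks back at what was already emitted. That review step has to live outside the model.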

I noticed something while using these tools daily. When an answer wasn't good enough, I'd type "revise this" or "improve it further" — and it would. Every single time. A noticeably better answer. It would even explain exactly where the gaps in its first response were.

The Question That Built This Project
If the model can improve its answer when asked to — and can already identify what was wrong with the first one — why didn't it just give the better answer from the start?
Because nothing asked it to. The interface delivered the first draft and called it done.

The model isn't broken. The model is capable of significantly better than its first response. The problem is that every mainstream AI interface stops at one pass and hands you the draft.

PROJECT-R2 automates the revision loop you were already doing manually.

The Problem

Beyond the single-pass issue, there's a second wall: these tools have no memory of your work.

Mid-session, deep into a problem — the AI forgets what was decided three messages ago. Gives a generic answer that ignores the stack you're working with. You re-explain the entire context every time you open a new chat. The longer the session, the more context bleeds out.

Two separate problems. Both hitting at once. One about quality. One about memory.

| Raw ChatGPT / Claude | PROJECT-R2 |
|---|---|
| First draft is the final answer | Internally revised before you ever see it |
| Context bleeds out in long sessions | Smart compression — nothing critical ever lost |
| User doesn't control context compression | Context compresses only when you approve |
| You manage the refining prompts manually | The system manages the refining loop |

The problem isn't the model. The problem is the interface built around it — one that never asks the model to think twice.

How It Works

PROJECT-R2 sits between you and the model. Before any answer reaches you, it runs the revision loop internally — the same loop you were running manually, now automated.

Your Task
plain language input + full project context injected
Solver Agent
generates first answer
Critic Agent
finds flaws, gaps, edge cases — the job you were doing manually
Judge Agent
scores quality, decides if another pass is needed
Solver Agent
revised answer using critique and judgment
↺  loop runs up to N times until quality threshold is met
Final Answer
the only thing you see — already through multiple passes

This is what AI research calls reflection, debate, and iterative refinement — techniques used in serious AI systems. Most of that logic lives outside the model, orchestrated externally. That's exactly what this is.

Context Memory System

The refinement loop only works well if the model knows your project. Generic context produces generic answers regardless of how many passes run.

Instead of sending the full conversation history on every call — expensive, slow, hits token limits fast — PROJECT-R2 maintains a living structured memory that evolves with your session:

{
  "project":          { "name": "", "stack": "", "entry_point": "" },
  "files":            { "filename": "one line description" },
  "decisions":        ["what was decided + why"],
  "failed_attempts":  ["what failed + why"],
  "working_patterns": ["what works + how"],
  "current_task":     { "goal": "", "status": "", "blockers": "" }
}

A Summarizer Agent reads the conversation every 10 messages and extracts only the facts that affect future answers. Narrative is dropped. Specifics — exact file names, decisions made, what failed and why — are kept.
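The every-10-messages trigger can be sketched as follows. `summarize` below is a stand-in for the LLM-backed Summarizer Agent, and the `Session` class and 10-message interval are illustrative assumptions about the mechanism, not PROJECT-R2's actual code.

```python
# Sketch of the summarizer trigger: every N messages, fold new facts
# into the structured context. `summarize` stands in for an LLM agent.

SUMMARIZE_EVERY = 10

def summarize(messages: list[str]) -> dict:
    # Stand-in: a real agent extracts file names, decisions, failures.
    # Here, "facts" are just messages tagged "decided:"; narrative is dropped.
    return {"decisions": [m for m in messages if m.startswith("decided:")]}

class Session:
    def __init__(self) -> None:
        self.messages: list[str] = []
        self.context: dict = {"decisions": []}

    def add_message(self, msg: str) -> None:
        self.messages.append(msg)
        if len(self.messages) % SUMMARIZE_EVERY == 0:
            facts = summarize(self.messages[-SUMMARIZE_EVERY:])
            self.context["decisions"].extend(facts["decisions"])

s = Session()
for i in range(10):
    s.add_message(f"decided: choice {i}" if i % 2 == 0 else f"chatter {i}")
print(len(s.context["decisions"]))  # only the facts survive
```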

When tokens approach the limit, a Smart Cleanup Agent categorizes every item in context:

Safe to remove  — old, resolved, superseded decisions
Safe to compress  — keep the fact, drop the explanation
Never touch  — active files, live decisions, current task

The agent decides what's safe. You confirm. The session continues without losing anything that matters.
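The three buckets and the user-confirmation gate can be sketched like this. The real Smart Cleanup Agent is LLM-backed; this toy version categorizes by simple flags, and every field and function name here is illustrative.

```python
# Sketch of Smart Cleanup: categorize every context item into one of
# three buckets, then act only after the user confirms.

def categorize(item: dict) -> str:
    if item.get("active"):
        return "never_touch"       # active files, live decisions, current task
    if item.get("superseded"):
        return "safe_to_remove"    # old, resolved, superseded
    return "safe_to_compress"      # keep the fact, drop the explanation

def cleanup(context: list[dict], user_confirms: bool) -> list[dict]:
    if not user_confirms:          # the agent proposes; the user decides
        return context
    kept = []
    for item in context:
        bucket = categorize(item)
        if bucket == "safe_to_remove":
            continue
        if bucket == "safe_to_compress":
            item = {"fact": item["fact"]}   # fact kept, explanation dropped
        kept.append(item)
    return kept

ctx = [
    {"fact": "uses FastAPI", "why": "async support", "active": True},
    {"fact": "old DB choice", "superseded": True},
    {"fact": "retry pattern works", "why": "a long explanation"},
]
print(len(cleanup(ctx, user_confirms=True)))
```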

Task-Aware Model Selection

PROJECT-R2 is not a coding-only assistant. Different tasks need different models. When you start a task, you select the mode — the system routes accordingly:

| Mode | Best For | Model via OpenRouter |
|---|---|---|
| 🖥️ Coding | Architecture, debugging, code review | DeepSeek Coder / Claude |
| 🧠 Reasoning | Planning, decisions, analysis | Claude / GPT-4o |
| ✍️ Writing | Docs, content, communication | Claude / Mistral |
| ⚡ Quick | Fast lookups, simple tasks | Mistral 7B / Gemma |

All models accessed via OpenRouter. Routing handled internally. During development: free tier only. When commercial: model swap requires zero architecture change.
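Internally, mode-based routing can be as simple as a ranked lookup table, which is also why a later model swap costs zero architecture change: only the table changes. The model ID strings below are illustrative placeholders, not guaranteed to match current OpenRouter identifiers.

```python
# Sketch of task-aware model routing: each mode maps to a ranked list
# of model IDs; the router falls back down the list if one is unavailable.

ROUTES: dict[str, list[str]] = {
    "coding":    ["deepseek/deepseek-coder", "anthropic/claude"],
    "reasoning": ["anthropic/claude", "openai/gpt-4o"],
    "writing":   ["anthropic/claude", "mistralai/mistral"],
    "quick":     ["mistralai/mistral-7b", "google/gemma"],
}

def pick_model(mode: str, unavailable: frozenset[str] = frozenset()) -> str:
    # Unknown modes fall back to the cheap "quick" route.
    for model in ROUTES.get(mode, ROUTES["quick"]):
        if model not in unavailable:
            return model
    raise RuntimeError(f"no model available for mode {mode!r}")

print(pick_model("coding"))                               # first choice
print(pick_model("coding", frozenset({"deepseek/deepseek-coder"})))  # fallback
```

Swapping free-tier models for paid ones later means editing `ROUTES`, nothing else.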

Architecture Overview

Refinement Module
solver → critic → judge → loop · the core of the system
Memory Module
context.json · summarizer agent · living structured memory
Token Safety Module
counter · monitor · smart cleanup agent
Model Router
task-aware model selection · routes per mode at runtime
Cloud — OpenRouter
all agents routed via OpenRouter · model selected per task mode
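The Token Safety Module's counter can be sketched with the rough four-characters-per-token heuristic. This is an approximation (a real counter would use the model's tokenizer), and the limit, warning ratio, and function names below are illustrative assumptions, not the project's actual API.

```python
# Sketch of token budget monitoring: estimate usage, warn before the limit
# so Smart Cleanup can run while there is still headroom.

TOKEN_LIMIT = 8000   # illustrative context budget
WARN_AT = 0.8        # warn once 80% of the budget is used

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def check_budget(history: list[str]) -> tuple[int, bool]:
    used = sum(estimate_tokens(m) for m in history)
    return used, used >= TOKEN_LIMIT * WARN_AT   # (usage, cleanup needed?)

used, warn = check_budget(["x" * 400] * 70)
print(used, warn)
```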

Current Development Status

| Phase | Scope | Status |
|---|---|---|
| Phase 1 | Local System | 🔄 In Progress |
| Phase 2 | Backend API | ⬜ Not Started |
| Phase 3 | Personal + Friends Testing | ⬜ Not Started |
| Phase 4 | Frontend | ⬜ Not Started |
| Phase 5 | Deployment | ⬜ Not Started |
| Phase 6 | Production Hardening | ⬜ Not Started |
Currently on:  Phase 1 → Stages 4–5 — Token Manager and Smart Cleanup Agent
Building the core engine. Everything else depends on this working first.
📌  No local models during development. All agents run via the OpenRouter and Groq API free tiers until the system is proven. Model strategy revisited in Phase 6 if cost optimization is needed at scale.

Building Plan

Phase 1 · Current
Local System
Goal: a working self-refining assistant on your own machine. All agents use OpenRouter free tier during this phase.
Stage 0 · Environment setup — OpenRouter free tier, Python · ✅ Done
Stage 1 · Refinement loop — solver, critic, judge, CLI · ✅ Done
Stage 2 · Context foundation for solver — the loop reads context.json · ✅ Done
Stage 3 · Memory Manager — extracts facts from conversation · ✅ Done
Stage 4 · Token Manager — counter function, usage logging, warnings, usage prediction · 🔄 In Progress
Stage 5 · Smart Cleanup Agent — categorize, user confirms, execute · 🔄 In Progress
Stage 6 · Architecture polishing · ⬜ Not Started
Stage 7 · Unified CLI — clean interface, commands, session save/load · ⬜ Not Started
Stage 8 · Model routing — task-aware model selection per mode · ⬜ Not Started
(Optional) · User statistics extraction, temporary loop memory, agentic skills · ⬜ Not Started
Phase 2
Backend API
Goal: turn local scripts into a deployable API.
Stage 1 · Wrap refinement loop in FastAPI endpoints · ⬜ Not Started
Stage 2 · Session management — create, load, isolate per user · ⬜ Not Started
Stage 3 · Database — Supabase; store sessions, messages, context snapshots · ⬜ Not Started
Stage 4 · API key management — encrypted per user, never exposed (open question: auto-create per-user API keys via Google accounts?) · ⬜ Not Started
Phase 3
Personal + Friends Testing
Goal: me and a few known people using it simultaneously without touching each other's data. No public launch. Closed circle only.
Stage 1 · User registration and login · ⬜ Not Started
Stage 2 · JWT session tokens · ⬜ Not Started
Stage 3 · Full user isolation — context, history, token usage per user · ⬜ Not Started
Stage 4 · Basic rate limiting per user · ⬜ Not Started
Phase 4
Frontend
Goal: someone who doesn't know Python can use PROJECT-R2 comfortably. Designed by me, built with Lovable.
Stage 1 · Chat interface · ⬜ Not Started
Stage 2 · Live context sidebar · ⬜ Not Started
Stage 3 · Token usage bar · ⬜ Not Started
Stage 4 · Session history + task mode selector · ⬜ Not Started
Stage 5 · Inline cleanup prompts · ⬜ Not Started
Phase 5
Deployment
Goal: publicly accessible via URL.
Stage 1 · Backend → Render · ⬜ Not Started
Stage 2 · Database → Supabase · ⬜ Not Started
Stage 3 · Frontend → Vercel · ⬜ Not Started
Stage 4 · Scale model strategy if needed · ⬜ Not Started
Phase 6
Production Hardening
Goal: reliable, monitorable, abuse-resistant.
Stage 1 · Error handling and fallbacks · ⬜ Not Started
Stage 2 · Logging system · ⬜ Not Started
Stage 3 · Token usage dashboard per user · ⬜ Not Started
Stage 4 · Model fallback routing · ⬜ Not Started
Stage 5 · Abuse protection · ⬜ Not Started

Stack

Models
OpenRouter, Groq
Dev API
OpenRouter and Groq API free tiers
Backend
FastAPI
Database
Supabase
Frontend
Lovable AI · TypeScript / React
Designed by me, built with Lovable
Backend Deploy
Render
Frontend Deploy
Vercel
Language
Python · TypeScript

Philosophy

The model is not the product.
The thinking process around the model is the product.

Raw LLMs don't think in multiple passes by default. They predict the next token, append it, and repeat — one forward pass, start to finish. But when forced to reflect, critique, and revise, they reliably produce better answers. The research agrees.

Reflection, debate, iterative refinement — these are among the most effective techniques in modern AI systems. And most of that logic lives outside the model, orchestrated externally. That's exactly what PROJECT-R2 is.

Not a smarter model. A smarter process wrapped around the model you already use.