Open Development · Self-Refining AI Assistant

PROJECT-R2

Because one pass was never enough.

All comparisons and observations reflect personal experience and opinion. PROJECT-R2 is an independent student project developed for learning purposes and is not affiliated with, endorsed by, or associated with any third-party AI company or product. If you have any concerns regarding attribution, credit, or content removal, please contact: pranjal.tiwari2486@gmail.com.

The Observation

A raw language model does one thing by default:

prompt → predict next token → append → predict next token → repeat

One forward pass. Token by token. No internal review. No second thoughts. Whatever comes out first is what you get.
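The single-pass loop above can be sketched in a few lines. `predict_next_token` below is a toy stand-in for a real model's forward pass, not an actual model call; only the control flow matters here.

```python
# Sketch of the default single-pass generation loop.
# `predict_next_token` is a toy stand-in for a real forward pass.

def predict_next_token(tokens: list[str]) -> str:
    # Toy "model": emits a canned continuation, then an end-of-sequence marker.
    canned = ["Hello", ",", " world", "<eos>"]
    return canned[len(tokens)] if len(tokens) < len(canned) else "<eos>"

def generate(prompt_tokens: list[str], max_tokens: int = 16) -> list[str]:
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        nxt = predict_next_token(tokens)   # one forward pass
        if nxt == "<eos>":                 # model decides it is done
            break
        tokens.append(nxt)                 # append, then repeat
    return tokens

print("".join(generate([])))  # whatever comes out first is what you get
```

Nothing in this loop ever looks back at what was already emitted. That review step has to live outside the model.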

I noticed something while using these tools daily. When an answer wasn't good enough, I'd type "revise this" or "improve it further" — and it would. Every single time. A noticeably better answer. It would even explain exactly where the gaps in its first response were.

The Question That Built This Project
If the model can improve its answer when asked to — and can already identify what was wrong with the first one — why didn't it just give the better answer from the start?
Because nothing asked it to. The interface delivered the first draft and called it done.

The model isn't broken. The model is capable of significantly better than its first response. The problem is that every mainstream AI interface stops at one pass and hands you the draft.

PROJECT-R2 automates the revision loop you were already doing manually.

The Problem

Beyond the single-pass issue, there's a second wall: these tools have no memory of your work.

Mid-session, deep into a problem — the AI forgets what was decided three messages ago. Gives a generic answer that ignores the stack you're working with. You re-explain the entire context every time you open a new chat. The longer the session, the more context bleeds out.

Two separate problems. Both hitting at once. One about quality. One about memory.

| Raw ChatGPT / Claude | PROJECT-R2 |
|---|---|
| First draft is the final answer | Internally revised before you ever see it |
| Context bleeds out in long sessions | Smart compression — nothing critical ever lost |
| User doesn't control context compression | Context compresses only when you approve |
| You manage the refining prompts manually | The system manages the refining loop |

The problem isn't the model. The problem is the interface built around it — one that never asks the model to think twice.

How It Works

PROJECT-R2 sits between you and the model. Before any answer reaches you, it runs the revision loop internally — the same loop you were running manually, now automated.

Your Task
plain language input + full project context injected
Solver Agent
generates first answer
Critic Agent
finds flaws, gaps, edge cases — the job you were doing manually
Judge Agent
scores quality, decides if another pass is needed
Solver Agent
revised answer using critique and judgment
↺  loop runs up to N times until quality threshold is met
Final Answer
the only thing you see — already through multiple passes

This is what AI research calls reflection, debate, and iterative refinement — techniques used in serious AI systems. Most of that logic lives outside the model, orchestrated externally. That's exactly what this is.

Context Memory System

The refinement loop only works well if the model knows your project. Generic context produces generic answers regardless of how many passes run.

Instead of sending the full conversation history on every call — expensive, slow, hits token limits fast — PROJECT-R2 maintains a living structured memory that evolves with your session:

{
  "project":          { "name": "", "stack": "", "entry_point": "" },
  "files":            { "filename": "one line description" },
  "decisions":        ["what was decided + why"],
  "failed_attempts":  ["what failed + why"],
  "working_patterns": ["what works + how"],
  "current_task":     { "goal": "", "status": "", "blockers": "" }
}

A Summarizer Agent reads the conversation every 10 messages and extracts only the facts that affect future answers. Narrative is dropped. Specifics — exact file names, decisions made, what failed and why — are kept.
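The every-10-messages trigger can be sketched as follows. `summarize` below is a stand-in for the LLM-backed Summarizer Agent, and the `Session` class and 10-message interval are illustrative assumptions about the mechanism, not PROJECT-R2's actual code.

```python
# Sketch of the summarizer trigger: every N messages, fold new facts
# into the structured context. `summarize` stands in for an LLM agent.

SUMMARIZE_EVERY = 10

def summarize(messages: list[str]) -> dict:
    # Stand-in: a real agent extracts file names, decisions, failures.
    # Here, "facts" are just messages tagged "decided:"; narrative is dropped.
    return {"decisions": [m for m in messages if m.startswith("decided:")]}

class Session:
    def __init__(self) -> None:
        self.messages: list[str] = []
        self.context: dict = {"decisions": []}

    def add_message(self, msg: str) -> None:
        self.messages.append(msg)
        if len(self.messages) % SUMMARIZE_EVERY == 0:
            facts = summarize(self.messages[-SUMMARIZE_EVERY:])
            self.context["decisions"].extend(facts["decisions"])

s = Session()
for i in range(10):
    s.add_message(f"decided: choice {i}" if i % 2 == 0 else f"chatter {i}")
print(len(s.context["decisions"]))  # only the facts survive
```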

When tokens approach the limit, a Smart Cleanup Agent categorizes every item in context:

Safe to remove  — old, resolved, superseded decisions
Safe to compress  — keep the fact, drop the explanation
Never touch  — active files, live decisions, current task

The agent decides what's safe. You confirm. The session continues without losing anything that matters.
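The three buckets and the user-confirmation gate can be sketched like this. The real Smart Cleanup Agent is LLM-backed; this toy version categorizes by simple flags, and every field and function name here is illustrative.

```python
# Sketch of Smart Cleanup: categorize every context item into one of
# three buckets, then act only after the user confirms.

def categorize(item: dict) -> str:
    if item.get("active"):
        return "never_touch"       # active files, live decisions, current task
    if item.get("superseded"):
        return "safe_to_remove"    # old, resolved, superseded
    return "safe_to_compress"      # keep the fact, drop the explanation

def cleanup(context: list[dict], user_confirms: bool) -> list[dict]:
    if not user_confirms:          # the agent proposes; the user decides
        return context
    kept = []
    for item in context:
        bucket = categorize(item)
        if bucket == "safe_to_remove":
            continue
        if bucket == "safe_to_compress":
            item = {"fact": item["fact"]}   # fact kept, explanation dropped
        kept.append(item)
    return kept

ctx = [
    {"fact": "uses FastAPI", "why": "async support", "active": True},
    {"fact": "old DB choice", "superseded": True},
    {"fact": "retry pattern works", "why": "a long explanation"},
]
print(len(cleanup(ctx, user_confirms=True)))
```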

Task-Aware Model Selection

PROJECT-R2 is not a coding-only assistant. Different tasks need different models. When you start a task, you select the mode — the system routes accordingly:

| Mode | Best For | Model via OpenRouter |
|---|---|---|
| 🖥️ Coding | Architecture, debugging, code review | DeepSeek Coder / Claude |
| 🧠 Reasoning | Planning, decisions, analysis | Claude / GPT-4o |
| ✍️ Writing | Docs, content, communication | Claude / Mistral |
| ⚡ Quick | Fast lookups, simple tasks | Mistral 7B / Gemma |

All models accessed via OpenRouter. Routing handled internally. During development: free tier only. When commercial: model swap requires zero architecture change.
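Internally, mode-based routing can be as simple as a ranked lookup table, which is also why a later model swap costs zero architecture change: only the table changes. The model ID strings below are illustrative placeholders, not guaranteed to match current OpenRouter identifiers.

```python
# Sketch of task-aware model routing: each mode maps to a ranked list
# of model IDs; the router falls back down the list if one is unavailable.

ROUTES: dict[str, list[str]] = {
    "coding":    ["deepseek/deepseek-coder", "anthropic/claude"],
    "reasoning": ["anthropic/claude", "openai/gpt-4o"],
    "writing":   ["anthropic/claude", "mistralai/mistral"],
    "quick":     ["mistralai/mistral-7b", "google/gemma"],
}

def pick_model(mode: str, unavailable: frozenset[str] = frozenset()) -> str:
    # Unknown modes fall back to the cheap "quick" route.
    for model in ROUTES.get(mode, ROUTES["quick"]):
        if model not in unavailable:
            return model
    raise RuntimeError(f"no model available for mode {mode!r}")

print(pick_model("coding"))                               # first choice
print(pick_model("coding", frozenset({"deepseek/deepseek-coder"})))  # fallback
```

Swapping free-tier models for paid ones later means editing `ROUTES`, nothing else.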

Architecture Overview

Refinement Module
solver → critic → judge → loop · the core of the system
Memory Module
context.json · summarizer agent · living structured memory
Token Safety Module
counter · monitor · smart cleanup agent
Model Router
task-aware model selection · routes per mode at runtime
Cloud — OpenRouter
all agents routed via OpenRouter · model selected per task mode
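The Token Safety Module's counter can be sketched with the rough four-characters-per-token heuristic. This is an approximation (a real counter would use the model's tokenizer), and the limit, warning ratio, and function names below are illustrative assumptions, not the project's actual API.

```python
# Sketch of token budget monitoring: estimate usage, warn before the limit
# so Smart Cleanup can run while there is still headroom.

TOKEN_LIMIT = 8000   # illustrative context budget
WARN_AT = 0.8        # warn once 80% of the budget is used

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def check_budget(history: list[str]) -> tuple[int, bool]:
    used = sum(estimate_tokens(m) for m in history)
    return used, used >= TOKEN_LIMIT * WARN_AT   # (usage, cleanup needed?)

used, warn = check_budget(["x" * 400] * 70)
print(used, warn)
```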

Current Development Status

| Phase | Scope | Status |
|---|---|---|
| Phase 1 | Local System | 🔄 In Progress |
| Phase 2 | Backend API | ⬜ Not Started |
| Phase 3 | Personal + Friends Testing | ⬜ Not Started |
| Phase 4 | Frontend | ⬜ Not Started |
| Phase 5 | Deployment | ⬜ Not Started |
| Phase 6 | Production Hardening | ⬜ Not Started |
Currently on:  Phase 1 → Stages 4–5 — Token Manager and Smart Cleanup Agent
Building the core engine. Everything else depends on this working first.
📌  No local models during development. All agents run via the OpenRouter and Groq API free tiers until the system is proven. Model strategy revisited in Phase 6 if cost optimization is needed at scale.

Building Plan

Phase 1 · Current
Local System
Goal: a working self-refining assistant on your own machine. All agents use OpenRouter free tier during this phase.
Stage 0 · Environment setup — OpenRouter free tier, Python · ✅ Done
Stage 1 · Refinement loop — solver, critic, judge, CLI · ✅ Done
Stage 2 · Context foundation for solver — the loop reads context.json · ✅ Done
Stage 3 · Memory Manager — extracts facts from conversation · ✅ Done
Stage 4 · Token Manager — counter function, usage logging, warnings, usage prediction · 🔄 In Progress
Stage 5 · Smart Cleanup Agent — categorize, user confirms, execute · 🔄 In Progress
Stage 6 · Architecture polishing · ⬜ Not Started
Stage 7 · Unified CLI — clean interface, commands, session save/load · ⬜ Not Started
Stage 8 · Model routing — task-aware model selection per mode · ⬜ Not Started
(Optional) · User statistics extraction, temporary loop memory, agentic skills · ⬜ Not Started
Phase 2
Backend API
Goal: turn local scripts into a deployable API.
Stage 1 · Wrap refinement loop in FastAPI endpoints · ⬜ Not Started
Stage 2 · Session management — create, load, isolate per user · ⬜ Not Started
Stage 3 · Database — Supabase; store sessions, messages, context snapshots · ⬜ Not Started
Stage 4 · API key management — encrypted per user, never exposed (open question: auto-create per-user API keys via Google accounts?) · ⬜ Not Started
Phase 3
Personal + Friends Testing
Goal: me and a few known people using it simultaneously without touching each other's data. No public launch. Closed circle only.
Stage 1 · User registration and login · ⬜ Not Started
Stage 2 · JWT session tokens · ⬜ Not Started
Stage 3 · Full user isolation — context, history, token usage per user · ⬜ Not Started
Stage 4 · Basic rate limiting per user · ⬜ Not Started
Phase 4
Frontend
Goal: someone who doesn't know Python can use PROJECT-R2 comfortably. Designed by me, built with Lovable.
Stage 1 · Chat interface · ⬜ Not Started
Stage 2 · Live context sidebar · ⬜ Not Started
Stage 3 · Token usage bar · ⬜ Not Started
Stage 4 · Session history + task mode selector · ⬜ Not Started
Stage 5 · Inline cleanup prompts · ⬜ Not Started
Phase 5
Deployment
Goal: publicly accessible via URL.
Stage 1 · Backend → Render · ⬜ Not Started
Stage 2 · Database → Supabase · ⬜ Not Started
Stage 3 · Frontend → Vercel · ⬜ Not Started
Stage 4 · Scale model strategy if needed · ⬜ Not Started
Phase 6
Production Hardening
Goal: reliable, monitorable, abuse-resistant.
Stage 1 · Error handling and fallbacks · ⬜ Not Started
Stage 2 · Logging system · ⬜ Not Started
Stage 3 · Token usage dashboard per user · ⬜ Not Started
Stage 4 · Model fallback routing · ⬜ Not Started
Stage 5 · Abuse protection · ⬜ Not Started

Stack

Models
OpenRouter, Groq
Dev API
OpenRouter and Groq API free tiers
Backend
FastAPI
Database
Supabase
Frontend
Lovable AI · TypeScript / React
Designed by me, built with Lovable
Backend Deploy
Render
Frontend Deploy
Vercel
Language
Python · TypeScript

Philosophy

The model is not the product.
The thinking process around the model is the product.

Raw LLMs don't think in multiple passes by default. They predict the next token, append it, and repeat — one forward pass, start to finish. But when forced to reflect, critique, and revise, they reliably produce better answers. The research agrees.

Reflection, debate, iterative refinement — these are among the most effective techniques in modern AI systems. And most of that logic lives outside the model, orchestrated externally. That's exactly what PROJECT-R2 is.

Not a smarter model. A smarter process wrapped around the model you already use.