Research

Building intelligence for organizational memory

We develop specialized AI models trained to understand how organizations create, lose, and recover knowledge. Our research focuses on memory extraction, relationship detection, and temporal reasoning across unstructured organizational data.

Memory is sensitive. It contains decisions, relationships, and institutional context that should never leave an organization's control. That belief shapes everything we build. Rabbit's architecture, training pipeline, and inference stack are developed entirely in-house. We do not depend on external model providers for any part of the intelligence layer. When an organization deploys Rabbit, they deploy technology we built, on infrastructure they control.

All training data is ethically sourced. We use synthetic data generation to create diverse, representative training sets without exposing any real organizational data.

Training Log

Model evolution

Every training run documented. We believe in transparent research.

April 3, 2026

Initial release v1.0

First multi-signal model. Trained on 55,750 filtered examples across 8 specialized signals: intent classification, entity extraction, triage, query expansion, answer generation, summarization, sentiment analysis, and importance scoring.
55,750 examples · 8 signals · 3 epochs · ~6 hrs training
intent · extract · triage · expand · answer · summarize · sentiment · importance
April 5, 2026

Conversational quality v1.1

Major quality improvement in answer generation. Added conversational formatting with citations, multi-turn conversation support, and graceful uncertainty handling. Introduced reasoning phrases that make answers feel like a knowledgeable colleague.
53,901 examples · 10 signals · ~2 hrs training
multi-turn · don't know · conversational answers · reasoning phrases
April 6, 2026

Relationship intelligence v1.2 (current)

Full 12-signal model with memory relationship detection (7 link types) and contradiction/forgotten commitment detection. Deployed to production on Google Cloud infrastructure.
61,178 examples · 12 signals · ~8 hrs training · 21,795 steps
link detection · ambient intelligence · contradiction detection · 7 relationship types
In Progress

Knowledge compilation v1.3

Introducing three new signals: compile (auto-update knowledge pages), lint (detect stale information and gaps), and compile_answer (convert good answers into persistent knowledge). Targeting ~77,000 training examples with formatting and faithfulness improvements.
~77,000 target · 15 signals · 3 new capabilities
compile · lint · compile_answer · faithful extraction
Benchmarks

Evaluation on real organizational data

All benchmarks evaluated against meeting transcripts, email threads, Slack conversations, and project documents.

Signal                  Description                           Accuracy   Latency
Intent Classification   Route queries to the right strategy   97%        270ms
Memory Triage           Auto-classify and summarize           94%        2.1s
Entity Extraction       People, orgs, decisions, actions      92%        1.8s
Sentiment Analysis      Emotional tone detection              91%        350ms
Relationship Linking    Cross-memory connections              89%        1.5s
Query Expansion         Enrich vague queries                  87%        700ms
Answer Generation       Cited conversational answers          85%        4.2s
Data Ethics

How we source and handle training data

We take data ethics seriously. No real user data is used for training.

No real user data

Trained entirely on synthetic data and ethically sourced public datasets. No customer data, no scraped content.

Synthetic generation

Seed-and-expand methodology: 100 hand-crafted examples per signal expanded to 10,000+ using controlled generation, then quality-filtered.
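The seed-and-expand loop can be sketched in a few lines. Everything below is illustrative, not Rabbit's actual pipeline: `generate` and `keep` are hypothetical stand-ins for the real controlled-generation model and quality filter.

```python
import random

def expand_seeds(seeds, n_target, generate, keep):
    """Seed-and-expand (sketch): grow a small hand-written set into a
    larger pool via controlled generation, keeping only candidates
    that pass a quality filter."""
    pool = list(seeds)
    while len(pool) < n_target:
        candidate = generate(random.choice(seeds))
        if keep(candidate):  # quality filter: drop bad generations
            pool.append(candidate)
    return pool

# Toy stand-ins for the real paraphrase model and quality filter.
seeds = ["Who decided to ship v2?", "Who approved the budget?"]
seen = set(seeds)

def generate(seed):
    # Hypothetical "controlled generation": vary the question opener.
    return seed.replace("Who", random.choice(["Who", "Which person", "Who exactly"]))

def keep(candidate):
    # Reject exact duplicates as a stand-in for quality filtering.
    if candidate in seen:
        return False
    seen.add(candidate)
    return True

expanded = expand_seeds(seeds, 5, generate, keep)
```

In the real methodology the generator would be a language model prompted with the seed, and the filter would score fluency, label correctness, and diversity before an example enters the training set.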

Continuous evaluation

Every version evaluated against held-out test sets with human review. All benchmarks and training parameters published transparently.

Infrastructure

What we use

The tools and services powering Rabbit's training, evaluation, and deployment.

Google Cloud Platform

Production inference and model serving. Supported by Google Cloud credits for startups.

RunPod

GPU compute for training. A100 instances with Unsloth for memory-efficient LoRA training.

Hugging Face

Model hosting and distribution. All versions published for reproducibility.

FastEmbed

Local embedding model (BGE-Base-EN-v1.5) for vector search. On-device, zero external calls.
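To illustrate what local vector search involves (not Rabbit's actual implementation), here is a minimal cosine-similarity ranking over toy 3-dimensional vectors; BGE-Base-EN-v1.5 actually produces 768-dimensional embeddings, and the memory ids below are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, memories, k=3):
    """Return ids of the k stored memories most similar to the query."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, memories[m]),
                    reverse=True)
    return ranked[:k]

# Toy embedding store: memory id -> vector.
memories = {
    "budget-decision": [0.9, 0.1, 0.0],
    "offsite-notes":   [0.0, 1.0, 0.1],
    "ship-date":       [0.8, 0.2, 0.1],
}
```

Because the embedding model runs on-device, both the vectors and the queries above never leave the deployment, which is the point of the zero-external-calls design.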

Open Evaluation

What we measure and why

Every capability is evaluated against specific quality criteria across model versions.

01

Faithfulness

Does the model only state facts present in the source? v1.2 scores 70%. v1.3 targets 95%+.
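A crude way to approximate a faithfulness score is to check what fraction of generated claims can be found in the source. The substring check below is purely illustrative; a production scorer would use an entailment model or human review.

```python
def faithfulness_score(claims, source):
    """Fraction of generated claims supported by the source text.
    Substring matching is a crude proxy for real entailment checking."""
    if not claims:
        return 1.0
    supported = sum(1 for c in claims if c.lower() in source.lower())
    return supported / len(claims)

source = "The team agreed on May 12 to delay the launch to Q3."
claims = ["delay the launch to Q3", "the budget was doubled"]
score = faithfulness_score(claims, source)  # only the first claim is supported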

02

Citation accuracy

Does every claim point to the correct source? We measure source attribution precision across answer tasks.
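Source attribution precision can be computed as the fraction of claims whose cited source matches the true source. A minimal sketch, with hypothetical claim and memory ids:

```python
def attribution_precision(cited, truth):
    """cited and truth map each claim id to a source id; precision is
    the fraction of claims whose cited source matches the true one."""
    if not cited:
        return 0.0
    hits = sum(1 for claim, src in cited.items() if truth.get(claim) == src)
    return hits / len(cited)

# Hypothetical claim/source ids for illustration.
cited = {"claim-1": "mem-42", "claim-2": "mem-7"}
truth = {"claim-1": "mem-42", "claim-2": "mem-13"}
score = attribution_precision(cited, truth)  # one of two citations correct
```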

03

Format compliance

Does the model produce clean, structured JSON? v1.2 occasionally adds trailing text. v1.3 targets 95%+ clean output.
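Trailing text after a JSON object can be detected with Python's standard-library `json.JSONDecoder.raw_decode`, which returns the index where the parsed value ends. A minimal checker in that spirit:

```python
import json

def parse_model_json(text):
    """Parse model output as JSON and report whether it was 'clean',
    i.e. contained nothing after the object (the trailing-text
    failure mode noted for v1.2)."""
    stripped = text.strip()
    obj, end = json.JSONDecoder().raw_decode(stripped)
    return obj, stripped[end:].strip() == ""

obj, clean = parse_model_json('{"intent": "search"} Hope this helps!')
```

A compliance benchmark then just counts the fraction of outputs for which the second value comes back `True`.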

04

Graceful uncertainty

When the answer is not in context, does the model say so? Trained with explicit "don't know" signal data.

05

Human preference

Side-by-side evaluation against baselines. v2.0 will introduce DPO (Direct Preference Optimization) trained on production preference pairs.

Follow our research

Get updates on new model releases and benchmark results.