Building intelligence for organizational memory
We develop specialized AI models trained to understand how organizations create, lose, and recover knowledge. Our research focuses on memory extraction, relationship detection, and temporal reasoning across unstructured organizational data.
Memory is sensitive. It contains decisions, relationships, and institutional context that should never leave an organization's control. That belief shapes everything we build. Rabbit's architecture, training pipeline, and inference stack are developed entirely in-house. We do not depend on external model providers for any part of the intelligence layer. When an organization deploys Rabbit, they deploy technology we built, on infrastructure they control.
All training data is ethically sourced. We use synthetic data generation to create diverse, representative training sets without exposing any real organizational data.
Model evolution
Every training run is documented. We believe in transparent research.
Initial release v1.0
Conversational quality v1.1
Relationship intelligence v1.2 (current)
Knowledge compilation v1.3
Evaluation on real organizational data
All benchmarks evaluated against meeting transcripts, email threads, Slack conversations, and project documents.
| Signal | Description | Accuracy | Latency |
|---|---|---|---|
| Intent Classification | Route queries to the right strategy | 97% | 270ms |
| Memory Triage | Auto-classify and summarize | 94% | 2.1s |
| Entity Extraction | People, orgs, decisions, actions | 92% | 1.8s |
| Sentiment Analysis | Emotional tone detection | 91% | 350ms |
| Relationship Linking | Cross-memory connections | 89% | 1.5s |
| Query Expansion | Enrich vague queries | 87% | 700ms |
| Answer Generation | Cited conversational answers | 85% | 4.2s |
How we source and handle training data
We take data ethics seriously. No real user data is used for training.
No real user data
Trained entirely on synthetic data and ethically sourced public datasets. No customer data, no scraped content.
Synthetic generation
Seed-and-expand methodology: 100 hand-crafted examples per signal expanded to 10,000+ using controlled generation, then quality-filtered.
Continuous evaluation
Every version evaluated against held-out test sets with human review. All benchmarks and training parameters published transparently.
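The seed-and-expand methodology above can be sketched roughly as follows. The `expand` and `passes_quality_filter` functions here are hypothetical stand-ins for the controlled generation and quality-filtering steps; this is an illustration of the pipeline shape, not Rabbit's actual implementation.

```python
import random

def expand(seed: str, n: int) -> list[str]:
    # Hypothetical stand-in for controlled LLM generation; a real pipeline
    # would prompt a generator model to produce diverse variants of the seed.
    return [f"{seed} (variant {i})" for i in range(n)]

def passes_quality_filter(example: str) -> bool:
    # Hypothetical filter: drop empty or runaway-length candidates.
    return 0 < len(example) < 500

def seed_and_expand(seeds: list[str], per_seed: int = 100) -> list[str]:
    """Expand hand-crafted seed examples into a larger, quality-filtered set."""
    dataset = []
    for seed in seeds:
        candidates = expand(seed, per_seed)
        dataset.extend(c for c in candidates if passes_quality_filter(c))
    random.shuffle(dataset)  # avoid ordering bias in the training set
    return dataset

seeds = ["Classify the intent of: 'who decided to ship early?'"]
data = seed_and_expand(seeds, per_seed=5)
```

Scaling the same loop from 100 seeds per signal to 100+ variants per seed gives the 10,000+ examples the text describes, with the filter deciding what survives.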
What we use
The tools and services powering Rabbit's training, evaluation, and deployment.
Google Cloud Platform
Production inference and model serving. Supported by Google Cloud credits for startups.
RunPod
GPU compute for training. A100 instances with Unsloth for memory-efficient LoRA training.
Hugging Face
Model hosting and distribution. All versions published for reproducibility.
FastEmbed
Local embedding model (BGE-Base-EN-v1.5) for vector search. On-device, zero external calls.
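To illustrate what on-device vector search involves at its core (this is a generic sketch, not FastEmbed's API), retrieval reduces to cosine similarity between a query embedding and stored memory embeddings. The toy 3-dimensional vectors below stand in for real BGE-Base-EN-v1.5 outputs, which are 768-dimensional.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec: list[float], memory_vecs: list[list[float]], top_k: int = 3):
    """Rank stored memory vectors by similarity to the query vector."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(memory_vecs)]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]

# Toy stand-ins for embedded memories; no network call is needed at query time.
memories = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
results = search([1.0, 0.1, 0.0], memories, top_k=2)
```

Because both embedding and similarity ranking run locally, no memory content ever leaves the machine, which is the point of the zero-external-calls design.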
What we measure and why
Every capability is evaluated against specific quality criteria across model versions.
Faithfulness
Does the model only state facts present in the source? v1.2 scores 70%. v1.3 targets 95%+.
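One simple way to approximate a faithfulness check is lexical support: verify that the content words of each answer sentence appear in the source. This is purely illustrative of the measurement idea, not how Rabbit computes its score.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "was", "to", "of", "in", "and", "on"}

def content_words(text: str) -> set[str]:
    """Lowercased words minus stopwords."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def sentence_faithfulness(answer: str, source: str) -> float:
    """Fraction of answer sentences whose content words all appear in the source."""
    src_words = content_words(source)
    sentences = [s for s in re.split(r"[.!?]+", answer) if s.strip()]
    if not sentences:
        return 1.0
    supported = sum(1 for s in sentences if content_words(s) <= src_words)
    return supported / len(sentences)

source = "The team decided to delay the launch until March."
answer = "The team decided to delay the launch. The budget doubled."
score = sentence_faithfulness(answer, source)  # second sentence is unsupported
```

Real faithfulness evaluation typically uses an entailment model or LLM judge rather than word overlap, but the unit of measurement, supported sentences over total sentences, is the same.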
Citation accuracy
Does every claim point to the correct source? We measure source attribution precision across answer tasks.
Format compliance
Does the model produce clean, structured JSON? v1.2 occasionally adds trailing text. v1.3 targets 95%+ clean output.
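Format compliance can be checked mechanically. A sketch using Python's standard json module (my illustration, not Rabbit's evaluation harness) that flags exactly the failure mode described, trailing text after a valid JSON value:

```python
import json

def is_clean_json(output: str) -> bool:
    """True only if the output is exactly one JSON value with no trailing text."""
    stripped = output.strip()
    decoder = json.JSONDecoder()
    try:
        _, end = decoder.raw_decode(stripped)
    except json.JSONDecodeError:
        return False
    # Anything left after the parsed value means the model appended extra text.
    return end == len(stripped)

clean = is_clean_json('{"intent": "search", "confidence": 0.97}')
dirty = is_clean_json('{"intent": "search"} Hope this helps!')
```

`raw_decode` parses the first JSON value and reports where it ended, so trailing commentary is detected without a second parser pass.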
Graceful uncertainty
When the answer is not in context, does the model say so? Trained with explicit "don't know" signal data.
Human preference
Side-by-side evaluation against baselines. v2.0 will introduce Direct Preference Optimization (DPO) from production preference pairs.