Deep-Tech AI Research

Open standards. Novel architectures. Edge deployment.

We build foundational AI technology: open standards for evidence integrity, grammar-first tokenizers for agglutinative languages, and absence-detection architectures for safety-critical systems. Published on arXiv. Built from first principles.

EIF
Evidence Integrity Framework
Open standard for evaluating photographic evidence. 83 metrics across 4 fraud categories. Apache 2.0.
VerChol
Grammar-First Tokenization
Tokenizer for agglutinative languages. 3.1% fertility improvement over BPE. Published on arXiv.
SenseAi
Absence-Detection Safety
Four-state architecture that detects what SHOULD be present but ISN'T. Primary application: an autism wearable.
Open Standard

Can you trust a photograph anymore?

AI generates convincing fake photos in seconds. EIF is an open standard that evaluates whether a photograph constitutes reliable evidence for a specific claim — not just "is this real?" but "is this trustworthy?"

EIF Analysis Pipeline (9 probe categories active)
photograph.jpg
JPEG Quantization: 0.92
Noise Consistency: 0.87
Lighting Physics: 0.78
Metadata Integrity: 0.95
Sensor Signature: 0.84
GAN Fingerprint: 0.96
Edge Frequency: 0.81
Color Distribution: 0.89
Compression Chain: 0.91
83 forensic metrics · 9 probe categories · 4 target domains · Apache 2.0 license

C2PA tracks provenance: who created an image, on what device, and with what edits. Deepfake detectors classify images as real or fake. Neither evaluates whether a photo is reliable evidence for a specific claim.

EIF analyses the photograph itself: compression artifacts, noise patterns, lighting physics, and statistical signatures that separate real sensors from neural networks.
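The page does not specify how EIF combines its probe outputs into a judgement. As a rough illustration only, the Python sketch below makes three assumptions. Each probe yields a score in [0, 1]. The scores are combined with a weighted mean, where the weights are a hypothetical stand-in for claim- or domain-specific emphasis. Any probe below a hypothetical threshold is flagged. The probe names and scores are the sample values from the pipeline readout above. `summarize`, `weights`, and `flag_below` are illustrative names, not part of EIF.

```python
from typing import Dict, List, Optional, Tuple

# Sample probe scores from the pipeline readout.
# ASSUMPTION: each probe reports a score in [0, 1], higher = more consistent
# with a genuine camera capture. EIF's real output format may differ.
SAMPLE_SCORES: Dict[str, float] = {
    "JPEG Quantization": 0.92,
    "Noise Consistency": 0.87,
    "Lighting Physics": 0.78,
    "Metadata Integrity": 0.95,
    "Sensor Signature": 0.84,
    "GAN Fingerprint": 0.96,
    "Edge Frequency": 0.81,
    "Color Distribution": 0.89,
    "Compression Chain": 0.91,
}


def summarize(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
    flag_below: float = 0.8,  # hypothetical review threshold
) -> Tuple[float, List[str]]:
    """Return a weighted mean of probe scores and the probes below flag_below.

    Probes missing from `weights` get weight 1.0, so passing no weights
    yields a plain (unweighted) mean.
    """
    if not scores:
        raise ValueError("no probe scores supplied")
    weights = weights or {}
    total_weight = sum(weights.get(name, 1.0) for name in scores)
    weighted_sum = sum(s * weights.get(name, 1.0) for name, s in scores.items())
    flagged = sorted(name for name, s in scores.items() if s < flag_below)
    return weighted_sum / total_weight, flagged


overall, flagged = summarize(SAMPLE_SCORES)
print(f"overall={overall:.3f}, flagged={flagged}")
```

A plain mean can hide a single failing probe. For that reason the sketch also returns the flagged list, so one weak signal such as the lighting score stays visible.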

A standard for truth cannot itself be opaque. EIF is open.

Target domains: Identity Documents · Vehicle Damage · Property Inspection · Product Authenticity

Patent pending — EIF multi-dimensional metric framework for domain-specific evidence integrity evaluation.

arXiv Published

Tokenizers weren't built for these languages.

BPE splits agglutinative words into statistically frequent fragments that ignore grammatical boundaries. VerChol's grammar-first approach decomposes words at morpheme boundaries, preserving grammatical meaning for 500M+ speakers globally.

VerChol: Grammar-First Tokenizer (morpheme-level decomposition)
Tamil · பொறுப்பாளர்களுக்கு
"for the responsible persons" — one word, clause-level meaning
BPE (Standard)
பொறுப்பாள
ர்களுக்கு
6 tokens · no meaning preserved
VerChol
பொறுப்பு · ஆள
அர்கள் · உக்கு
4 tokens · each morpheme meaningful
Kannada · ಅಭಿವೃದ್ಧಿಹೊಂದುತ್ತಿರುವವರಿಗೆ
"for those who are developing" — single agglutinated word
BPE (Standard)
ಅಭಿವೃದ್ಧ
ಿಹೊಂದುತ್ತ
ಿರುವವರಿಗೆ
9 tokens · grammar destroyed
VerChol
ಅಭಿವೃದ್ಧಿ
ಹೊಂದುತ್ತಿರು
ವವರಇಗೆ
5 tokens · grammar preserved
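To make "decomposing at morpheme boundaries" concrete, here is a deliberately simple Python sketch of dictionary-based greedy longest-match segmentation. It is not VerChol's algorithm, which this page does not describe. It uses Turkish, one of the languages listed on this page, because Turkish suffixes attach with little change at the boundary. Tamil and Kannada also alter sounds where morphemes join (sandhi), and a real segmenter must undo those changes. The tiny lexicon and the function name `segment` are hypothetical.

```python
from typing import List, Set

# Hypothetical toy lexicon: one Turkish stem plus three suffixes.
# ev = house, ler = plural, imiz = "our", den = "from" (ablative case)
LEXICON: Set[str] = {"ev", "ler", "imiz", "den"}


def segment(word: str, lexicon: Set[str]) -> List[str]:
    """Split `word` by repeatedly taking the longest lexicon entry that
    matches at the current position. Characters with no match become
    single-character pieces, so the loop always terminates."""
    pieces: List[str] = []
    longest = max(map(len, lexicon), default=1)
    i = 0
    while i < len(word):
        # Try the longest possible candidate first, shrinking one char at a time.
        for length in range(min(longest, len(word) - i), 0, -1):
            candidate = word[i:i + length]
            if candidate in lexicon:
                pieces.append(candidate)
                i += length
                break
        else:
            pieces.append(word[i])  # fallback: no lexicon entry starts here
            i += 1
    return pieces


print(segment("evlerimizden", LEXICON))  # "from our houses"
```

Greedy longest-match goes wrong whenever a shorter morpheme is the correct split. Practical morphological segmenters therefore add rules or scoring on top of a lexicon. The sketch only shows the target output shape: one token per morpheme.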

Widely used subword tokenizers such as BPE, WordPiece, and SentencePiece were developed primarily for English and other languages with little inflection. They systematically fragment words in languages where a single word carries clause-level meaning through grammatical suffixes.

VerChol's grammar-first approach achieves a 3.1% fertility improvement over BPE on agglutinative-language benchmarks, where fertility is the average number of tokens per word (lower is better). It gets there by decomposing words into grammatically meaningful morphemes instead of statistically frequent byte pairs. Published on arXiv.
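The arithmetic behind a fertility comparison can be shown directly. The Python sketch below assumes that "3.1% fertility improvement" means a 3.1% relative reduction in tokens per word compared with BPE. The BPE baseline of 2.0 tokens per word is a hypothetical number chosen for illustration, not a figure from the VerChol paper.

```python
from typing import Sequence


def fertility(tokens_per_word: Sequence[int]) -> float:
    """Average number of tokens per word (lower means less fragmentation)."""
    if not tokens_per_word:
        raise ValueError("need at least one word")
    return sum(tokens_per_word) / len(tokens_per_word)


def relative_improvement(baseline: float, candidate: float) -> float:
    """Fractional reduction in fertility relative to the baseline tokenizer."""
    return (baseline - candidate) / baseline


# Token counts for the two showcase words (Tamil, then Kannada).
bpe_showcase = fertility([6, 9])      # 7.5
verchol_showcase = fertility([4, 5])  # 4.5

# Hypothetical corpus-level baseline: BPE averaging 2.0 tokens per word.
# A 3.1% improvement would bring the candidate down to 2.0 * (1 - 0.031).
hypothetical_bpe = 2.0
hypothetical_candidate = hypothetical_bpe * (1 - 0.031)
print(round(relative_improvement(hypothetical_bpe, hypothetical_candidate), 3))
```

Benchmark fertility is averaged over a whole corpus, including many short or common words that both tokenizers handle well. That is why a corpus-level figure is far smaller than the gap on two long, hand-picked words.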

BharatMini — Low-Cost Domain Training

Alongside the tokenizer, we demonstrated narrow-domain model training for ₹2,700, showing that domain-specific AI for manufacturing and robotics doesn't require massive compute.

Languages: Tamil · Kannada · Turkish · Finnish · Korean · Hungarian
Novel Architecture

What's missing is the signal.

Most AI safety frameworks detect what IS present. SenseAi inverts this: it detects what SHOULD be present but ISN'T, which is a fundamentally different computational problem.

The architecture uses a four-state processing model that classifies signal streams by the absence of expected patterns. In autism, the absence of expected physiological variability predicts a crisis. In manufacturing, a missing sensor reading signals failure.

Applications

Autism Meltdown Prediction
An on-device wearable that detects the absence of expected physiological patterns. Built on the nRF52840 with 5 sensors. ₹2,999 with a BPL (Below Poverty Line) subsidy.
Safety-Critical Monitoring
Industrial systems where missing signals indicate failure — sensor arrays, pipeline monitoring, structural health.
Surveillance Gap Detection
Identifying what camera networks are NOT covering: blind zones, degraded sensors, and unmonitored time windows.
Behavioural Analysis
Detecting omitted disclosures and missing responses in financial compliance, workplace safety, and child protection.
SenseAi — Four-State Signal Model
Normal
Expected variability present — all sensors reporting
Reduced
Variability decreasing — pattern flattening detected
Absent
Expected signal missing — predictive window open
Crisis
Absence confirmed — intervention recommended
Patent in preparation — absence-detection signal processing architecture
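One way to picture a four-state absence detector is as a small state machine. It is driven by how much variability each window of a signal shows, measured relative to a baseline. The Python sketch below is a hypothetical illustration, not the SenseAi architecture, which is described here only at the level of its four states. Several choices in it are assumptions:

- population standard deviation as the variability measure;
- the thresholds `REDUCED_RATIO` and `ABSENT_RATIO`;
- the rule that absence must persist for `CONFIRM_WINDOWS` consecutive windows before escalating to Crisis.

A window with fewer than two samples is treated as a missing signal, mirroring "all sensors reporting" in the Normal state.

```python
from enum import Enum
from statistics import pstdev
from typing import List, Sequence, Tuple


class State(Enum):
    NORMAL = "Normal"
    REDUCED = "Reduced"
    ABSENT = "Absent"
    CRISIS = "Crisis"


# Hypothetical thresholds, as fractions of the baseline variability.
REDUCED_RATIO = 0.6
ABSENT_RATIO = 0.2
CONFIRM_WINDOWS = 3  # consecutive absent windows before escalating to Crisis


def step(window: Sequence[float], baseline_sd: float, absent_streak: int) -> Tuple[State, int]:
    """Classify one window; return its state and the updated absence streak."""
    # Fewer than two samples: no variability can be measured, so treat the
    # window as a missing signal (sensor not reporting).
    sd = pstdev(window) if len(window) >= 2 else 0.0
    ratio = sd / baseline_sd if baseline_sd > 0 else 0.0
    if ratio < ABSENT_RATIO:
        absent_streak += 1
        if absent_streak >= CONFIRM_WINDOWS:
            return State.CRISIS, absent_streak
        return State.ABSENT, absent_streak
    # Variability is back above the absence threshold, so the streak resets.
    if ratio < REDUCED_RATIO:
        return State.REDUCED, 0
    return State.NORMAL, 0


def run(windows: Sequence[Sequence[float]], baseline_sd: float) -> List[State]:
    """Classify a sequence of windows, carrying the absence streak forward."""
    states: List[State] = []
    streak = 0
    for window in windows:
        state, streak = step(window, baseline_sd, streak)
        states.append(state)
    return states


# Variability flattens, then the signal drops out entirely.
windows = [[1, 3, 1, 3], [1.5, 2.5, 1.5, 2.5], [2, 2.1, 2, 2.1], [], [2, 2, 2, 2]]
print([s.value for s in run(windows, baseline_sd=1.0)])
```

Requiring several consecutive absent windows before reporting Crisis trades a little latency for fewer false alarms from a single noisy or dropped window.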
AI-Native Operating System

Intelligence as a file operation.

Any program, in any language, that can read and write files can now use AI. No SDKs. No API keys. No cloud dependency.

tharai — /dev/ai
# Listen and transcribe
$ cat audio.wav > /dev/ai/hear
→ "Turn off the lights in the living room"
 
# Understand an image
$ cat photo.jpg > /dev/ai/see
→ "A chest X-ray showing mild pleural effusion"
 
# Speak
$ echo "Report complete" > /dev/ai/speak
→ [audio plays through system speaker]
/hear
Voice & Audio

Speech-to-text, audio classification, sound event detection. Write audio data to a file — get intelligence back.

/dev/ai/hear
/see
Vision

Image classification, object detection, document understanding. Any camera, any image, any format.

/dev/ai/see
/think
Language & Reasoning

Summarization, Q&A, translation, analysis. Local models by default. Cloud when you choose.

/dev/ai/think
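Because the interface is a set of files, an ordinary program can use it with standard file I/O and no SDK. The page does not say how results come back. The Python sketch below assumes that a client writes its input to a node such as /dev/ai/think and then reads the result back from the same node. The real protocol (blocking behaviour, framing, concurrent clients) may differ. The `ask` function and its `root` parameter are hypothetical; `root` exists only so the sketch can be exercised against an ordinary directory.

```python
from pathlib import Path

AI_ROOT = Path("/dev/ai")  # device directory used in the shell examples


def ask(primitive: str, payload: bytes, root: Path = AI_ROOT) -> bytes:
    """Write `payload` to root/primitive, then read the response back.

    ASSUMPTION: the node returns its result on the next read after a write.
    Against a plain directory this simply echoes the payload.
    """
    node = root / primitive
    with open(node, "wb") as f:
        f.write(payload)  # send the request, like `cat file > /dev/ai/...`
    with open(node, "rb") as f:
        return f.read()  # collect whatever the node returns
```

On a machine running TharAI, and under the same read-back assumption, `ask("see", Path("photo.jpg").read_bytes())` would be the Python counterpart of the `cat photo.jpg > /dev/ai/see` example.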
Founder
Prabhu Raja
Founder & Solo Technical Architect

School dropout. 20+ years in technology. 5+ years in textiles and manufacturing. Has worked across the finance, trading, insurance, aviation, and e-commerce domains. Solo technical founder who designed and built all of the architecture personally. Published researcher on arXiv. 1 patent filed, 3 in preparation. Based in Bengaluru, India.

Research & Publications

Published. Open. Cited.

We build from first principles and publish our research. Open standards, open code, open papers.

Standard
Evidence Integrity Framework
Open standard for forensic image verification. 83 metrics, 9 probe categories, Apache 2.0. Patent pending.
eif-format.org →
Paper
VerChol — Grammar-First Tokenization
3.1% fertility improvement over BPE for agglutinative languages. Grammar-first morpheme decomposition. Published on arXiv.
arXiv →
Arch
SenseAi — Absence-Detection Architecture
Four-state signal processing for safety-critical systems. Patent in preparation.
Patent in preparation
OS
TharAI — AI-Native Operating System
POSIX-style /dev/ai file primitives for intelligence. Local-first. Model-agnostic. Open-core.
tharai.dev →