Pushing the frontier of autonomous reasoning.
Our research team publishes open papers, releases datasets, and ships the techniques back into the AgenticOcean platform within weeks — not years.

Recent Papers
Reflexive Tool-Use: Sub-quadratic planning for long-horizon agents
Feb 2026 · arXiv preprint
Sentinel: a policy-as-code firewall for LLM tool calls
Jan 2026 · USENIX Security (under review)
Cognition-Bench: a 412-task evaluation for enterprise agents
Dec 2025 · NeurIPS 2025 Workshop
Ocean-RAG: retrieval that beats fine-tuning on domain QA
Oct 2025 · EMNLP 2025
Benchmarks & Evals
Every release of AgenticOcean is scored against a public eval suite. We publish the numbers, both wins and regressions.
Safety & Alignment
We red-team every model and every agent template before it ships. Findings, mitigations and residual risks are published in our quarterly Safety Report.
- Prompt-injection corpus (4.1M adversarial samples, open-source)
- Tool-call sandboxing with capability tokens
- Human-in-the-loop primitives in every SDK
- Independent third-party SOC2 Type II audits
Quarterly Safety Report — Q1 2026
119 jailbreaks evaluated · 4 critical mitigations shipped · 0 unresolved high-severity issues.
Download PDF →
Open Datasets
Cognition-Bench-Enterprise
Multi-step enterprise workflows across finance, HR and IT.
Ocean-RAG-1M
Domain-grounded retrieval QA across 14 industries.
InjectBench
Adversarial prompt-injection corpus for safety research.
Collaborate with us
We sponsor PhD residencies, fund external research grants and partner with university labs on long-horizon agentic systems.
research@agenticocean.app