May 6, 2025
We’re rolling out Fuse, our new architecture for orchestrating multiple specialized agents in a single workflow.
With Fuse, you can now:
- Chain agents with task-specific prompts
- Enable parallel agent execution with shared memory
- Call external tools from within agent trees
The beta includes support for function calling and shared context across agents.
Fuse is available for Pro and Enterprise plans.
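To make that concrete, here is a minimal sketch of what a chained two-agent call could look like over HTTP. The `/v1/agents/run` path comes from the API Changes section below, but the payload fields (`agents`, `mode`, `shared_memory`, `tools`) and the base URL are illustrative assumptions, not the documented schema:

```python
import os
import requests

# Hypothetical request shape: /v1/agents/run is real (see API Changes below),
# but every payload field here is an illustrative assumption.
payload = {
    "agents": [
        {"name": "researcher", "prompt": "Collect sources on the topic."},
        {"name": "writer", "prompt": "Draft a summary from the research."},
    ],
    "mode": "chain",          # assumed: "chain" or "parallel"
    "shared_memory": True,    # assumed flag for shared context across agents
    "tools": ["web_search"],  # assumed: external tools callable from the tree
}

resp = requests.post(
    "https://api.example.com/v1/agents/run",  # base URL is a placeholder
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```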
📦 New: GPU Auto-Scaling (for Inference APIs)
Our API now supports automatic GPU scaling based on real-time traffic.
This helps reduce cold starts and ensures low-latency inference even during usage spikes.
- Support added for NVIDIA A100 and H100 GPUs
- Billing adjusts dynamically based on load
- Requires no configuration: just deploy your model
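Since scaling needs no configuration, deployment is the whole story. The sketch below shows the shape of a deploy call; the endpoint path, payload fields, and base URL are all assumptions for illustration (only the A100/H100 support and the zero-config behavior come from this changelog):

```python
import os
import requests

# Hypothetical deployment call: endpoint path and payload are assumptions.
# No autoscaling settings are passed; per the changelog, GPU scaling kicks
# in automatically once the model is deployed.
resp = requests.post(
    "https://api.example.com/v1/models/deploy",  # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json={"model": "my-finetuned-model", "gpu": "A100"},  # A100/H100 supported
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```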
🧠 Improved: Model Updates
- Upgraded our default CodeGen-7B endpoint to v2.1, with better accuracy and fewer hallucinations
- DocQA model now supports 150k-token contexts
- Improved multi-language support in the Chat endpoint (added Korean, Dutch, and Polish)
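As a quick way to exercise the new language support, here is a hedged sketch of a Korean chat request; the endpoint path and message schema are assumptions, not the documented API:

```python
import os
import requests

# Hypothetical chat call trying the newly added Korean support; the
# endpoint path and payload shape are illustrative assumptions.
resp = requests.post(
    "https://api.example.com/v1/chat",  # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json={"messages": [{"role": "user", "content": "안녕하세요, 자기소개 해주세요."}]},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```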
🔐 API Changes
- New `/v1/agents/run` endpoint for orchestrated multi-agent flows
- Deprecated `/v1/tasks/create`; use `/v1/agents/launch` instead
- API keys can now be scoped per model, feature, or environment (dev/staging/prod)
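For the deprecation, migration is a one-line path change. The endpoint paths below come from this changelog; the request body is an assumed example, since the payload schema isn't specified here:

```python
import os
import requests

API = "https://api.example.com"  # base URL is a placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['API_KEY']}"}
# A key scoped to a single model, feature, or environment works the same way.

# Before (deprecated):
# requests.post(f"{API}/v1/tasks/create", headers=HEADERS, json=payload)

# After: the replacement endpoint; the payload shape is an assumption.
resp = requests.post(
    f"{API}/v1/agents/launch",
    headers=HEADERS,
    json={"agent": "researcher", "input": "Summarize the Q1 report."},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```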
🧪 Labs
- Internal tests running for a speech-to-code pipeline (using Whisper + CodeT5; see the sketch after this list)
- Early access to a fine-tuned vision transformer (ViT-x3) for document parsing
- Testing memory-aware agents with local context retention beyond sessions
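For context on the speech-to-code experiment, here is one way such a pipeline can be wired from the public Whisper and CodeT5 checkpoints. This is a sketch of the general idea, not our internal pipeline; the base CodeT5 checkpoint is not tuned for free-form text-to-code, so output quality depends on fine-tuning:

```python
# pip install openai-whisper transformers torch
import whisper
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Step 1: transcribe spoken input to text with Whisper.
asr = whisper.load_model("base")
spoken = asr.transcribe("dictation.wav")["text"]

# Step 2: feed the transcript to CodeT5 for generation.
tok = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")
ids = tok(spoken, return_tensors="pt").input_ids
out = model.generate(ids, max_length=128)
print(tok.decode(out[0], skip_special_tokens=True))
```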
🛠 Fixes
- Fixed a memory leak in the real-time embeddings endpoint
- Resolved an auth issue causing 401 errors on `PUT /models/train`
- Reduced average latency in the European region (Frankfurt) by 35ms per call