Media
forge-media defines provider-agnostic media capabilities for agents that need
to create or process image, audio, or video content.
Capabilities
| Capability | API shape |
|---|---|
| Image generation | generate_image(provider, prompt, options) |
| Audio transcription | transcribe(provider, audio, options) |
| Speech synthesis | speak(provider, text, options) |
| Video generation | generate_video(provider, prompt, options) |
Each capability is defined by a provider trait. Agent code should depend on that trait and its top-level function, not on a concrete vendor implementation.
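The sketch below illustrates the trait-plus-function pattern for image generation. The document specifies only the call shape generate_image(provider, prompt, options). The ImageOptions, Image, and MediaError types, the synchronous signature, and the FakeImageProvider backend are hypothetical stand-ins, not forge-media's actual API.

```rust
// Sketch of the trait-plus-function pattern. Every type and signature here
// is hypothetical and may not match forge-media's real API.

/// Hypothetical image-generation options.
#[derive(Debug, Clone, Default)]
pub struct ImageOptions {
    pub width: u32,
    pub height: u32,
}

/// Hypothetical generated image: a MIME type plus raw bytes.
#[derive(Debug, Clone, PartialEq)]
pub struct Image {
    pub mime_type: String,
    pub bytes: Vec<u8>,
}

/// Hypothetical error type for media operations.
#[derive(Debug, Clone, PartialEq)]
pub struct MediaError(pub String);

/// Text-to-image backend contract (hypothetical signature).
pub trait ImageProvider {
    fn generate_image(&self, prompt: &str, options: &ImageOptions) -> Result<Image, MediaError>;
}

/// Top-level entry point with the documented shape
/// generate_image(provider, prompt, options); here it simply delegates.
pub fn generate_image<P: ImageProvider + ?Sized>(
    provider: &P,
    prompt: &str,
    options: &ImageOptions,
) -> Result<Image, MediaError> {
    provider.generate_image(prompt, options)
}

/// Hypothetical test backend; a real binding would call a vendor service.
struct FakeImageProvider;

impl ImageProvider for FakeImageProvider {
    fn generate_image(&self, prompt: &str, options: &ImageOptions) -> Result<Image, MediaError> {
        // Encode the request into the "image" bytes so callers can inspect it.
        let payload = format!("{}x{}:{}", options.width, options.height, prompt);
        Ok(Image {
            mime_type: "image/png".to_string(),
            bytes: payload.into_bytes(),
        })
    }
}

/// Agent code depends only on the trait, so any backend can be bound.
fn illustrate_cover<P: ImageProvider>(provider: &P, title: &str) -> Result<Image, MediaError> {
    let options = ImageOptions { width: 512, height: 512 };
    generate_image(provider, &format!("cover art for {title}"), &options)
}

fn main() {
    let image = illustrate_cover(&FakeImageProvider, "Forge").expect("fake provider does not fail");
    println!("{} ({} bytes)", image.mime_type, image.bytes.len());
}
```

Because illustrate_cover is generic over P: ImageProvider, switching vendors means binding a different provider, with no change to the agent function itself.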
Provider traits
| Trait | Purpose |
|---|---|
| ImageProvider | Text-to-image backends. |
| TranscriptionProvider | Speech-to-text backends. |
| SpeechProvider | Text-to-speech backends. |
| VideoProvider | Text-to-video backends. |
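A similar sketch for the remaining three traits follows. The parameter, return, and error types are assumptions chosen for illustration, and LoopbackProvider is a hypothetical test double that stores text as audio bytes instead of performing real speech processing. It shows that one backend may implement several traits, and that agent code can require only the capabilities it uses through trait bounds.

```rust
// Hypothetical shapes for the transcription, speech, and video traits.
// forge-media's real parameter, return, and error types may differ.

#[derive(Debug, Clone, PartialEq)]
pub struct Audio {
    pub mime_type: String,
    pub samples: Vec<u8>,
}

#[derive(Debug, Clone, Default)]
pub struct TranscriptionOptions {
    pub language: Option<String>,
}

#[derive(Debug, Clone, Default)]
pub struct SpeechOptions {
    pub voice: Option<String>,
}

#[derive(Debug, Clone, Default)]
pub struct VideoOptions {
    pub duration_secs: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Video {
    pub mime_type: String,
    pub bytes: Vec<u8>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct MediaError(pub String);

pub trait TranscriptionProvider {
    fn transcribe(&self, audio: &Audio, options: &TranscriptionOptions) -> Result<String, MediaError>;
}

pub trait SpeechProvider {
    fn speak(&self, text: &str, options: &SpeechOptions) -> Result<Audio, MediaError>;
}

/// Declared for completeness; not exercised in this sketch.
pub trait VideoProvider {
    fn generate_video(&self, prompt: &str, options: &VideoOptions) -> Result<Video, MediaError>;
}

/// Hypothetical backend implementing two traits: "speaking" stores UTF-8
/// text as audio bytes, and "transcribing" decodes those bytes back.
struct LoopbackProvider;

impl SpeechProvider for LoopbackProvider {
    fn speak(&self, text: &str, _options: &SpeechOptions) -> Result<Audio, MediaError> {
        Ok(Audio {
            mime_type: "audio/x-loopback".into(),
            samples: text.as_bytes().to_vec(),
        })
    }
}

impl TranscriptionProvider for LoopbackProvider {
    fn transcribe(&self, audio: &Audio, _options: &TranscriptionOptions) -> Result<String, MediaError> {
        String::from_utf8(audio.samples.clone()).map_err(|e| MediaError(e.to_string()))
    }
}

/// Requires exactly the two capabilities it uses, via trait bounds.
fn round_trip<P: SpeechProvider + TranscriptionProvider>(provider: &P, text: &str) -> Result<String, MediaError> {
    let audio = provider.speak(text, &SpeechOptions::default())?;
    provider.transcribe(&audio, &TranscriptionOptions::default())
}

fn main() {
    let text = round_trip(&LoopbackProvider, "hello agents").expect("loopback does not fail");
    println!("{text}");
}
```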
The media layer follows the same provider-agnostic pattern as LLM generation. When used inside an agent run, it should also operate within the same identity, authorization, and telemetry boundaries.
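One way to keep media calls inside a run's identity, authorization, and telemetry boundaries is a decorator that wraps any provider. In this sketch, RunContext, the scope string media.image.generate, the event format, and the simplified single-argument ImageProvider trait are all hypothetical. A real runtime would send events to its telemetry pipeline rather than an in-memory list.

```rust
use std::cell::RefCell;
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
pub struct MediaError(pub String);

/// Simplified hypothetical trait, repeated so this sketch compiles alone.
pub trait ImageProvider {
    fn generate_image(&self, prompt: &str) -> Result<Vec<u8>, MediaError>;
}

/// Hypothetical per-run context: who is acting and which scopes they hold.
pub struct RunContext {
    pub principal: String,
    pub granted: HashSet<&'static str>,
}

/// Decorator that checks authorization and records telemetry around any
/// ImageProvider, so individual vendor bindings need not repeat this logic.
pub struct Governed<'a, P> {
    inner: P,
    ctx: &'a RunContext,
    pub events: RefCell<Vec<String>>,
}

impl<'a, P: ImageProvider> Governed<'a, P> {
    pub fn new(inner: P, ctx: &'a RunContext) -> Self {
        Self { inner, ctx, events: RefCell::new(Vec::new()) }
    }
}

impl<P: ImageProvider> ImageProvider for Governed<'_, P> {
    fn generate_image(&self, prompt: &str) -> Result<Vec<u8>, MediaError> {
        // Authorization: refuse before any vendor call is made.
        if !self.ctx.granted.contains("media.image.generate") {
            self.events.borrow_mut().push(format!("denied:{}", self.ctx.principal));
            return Err(MediaError("not authorized for image generation".into()));
        }
        let result = self.inner.generate_image(prompt);
        // Telemetry: record who called and whether the backend succeeded.
        let outcome = if result.is_ok() { "ok" } else { "error" };
        self.events
            .borrow_mut()
            .push(format!("image.generate:{}:{}", self.ctx.principal, outcome));
        result
    }
}

/// Hypothetical backend that returns the prompt bytes as the "image".
struct EchoProvider;

impl ImageProvider for EchoProvider {
    fn generate_image(&self, prompt: &str) -> Result<Vec<u8>, MediaError> {
        Ok(prompt.as_bytes().to_vec())
    }
}

/// Runs one request with the given scopes; returns the result and events.
fn run_once(scopes: &[&'static str]) -> (Result<Vec<u8>, MediaError>, Vec<String>) {
    let ctx = RunContext {
        principal: "agent-7".into(),
        granted: scopes.iter().copied().collect(),
    };
    let provider = Governed::new(EchoProvider, &ctx);
    let result = provider.generate_image("a lighthouse");
    (result, provider.events.into_inner())
}

fn main() {
    let (result, events) = run_once(&["media.image.generate"]);
    println!("allowed: {}, events: {:?}", result.is_ok(), events);
    let (result, events) = run_once(&[]);
    println!("denied: {}, events: {:?}", result.is_err(), events);
}
```

Keeping these checks in a wrapper means they apply uniformly to whichever provider is bound at runtime.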
Status
The media API surface exists as a Forge runtime module. Production behavior depends on the provider implementation bound at runtime.
Before presenting a media operation as shipped behavior, record in the provider capability matrix which providers support it.
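The capability matrix can also be kept as data that documentation or release checks consult. This sketch assumes a simple in-memory table; its method names are hypothetical, and the provider names vendor-a and vendor-b are placeholders that do not describe any real provider's support.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// The four media operations named in the capability table.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum MediaOperation {
    ImageGeneration,
    AudioTranscription,
    SpeechSynthesis,
    VideoGeneration,
}

/// Hypothetical matrix: provider name -> operations it is documented to
/// support. BTree collections keep iteration order deterministic.
#[derive(Debug, Default)]
pub struct CapabilityMatrix {
    rows: BTreeMap<String, BTreeSet<MediaOperation>>,
}

impl CapabilityMatrix {
    /// Records that `provider` supports each operation in `ops`.
    pub fn declare(&mut self, provider: &str, ops: &[MediaOperation]) {
        self.rows
            .entry(provider.to_string())
            .or_default()
            .extend(ops.iter().copied());
    }

    pub fn supports(&self, provider: &str, op: MediaOperation) -> bool {
        self.rows.get(provider).map_or(false, |ops| ops.contains(&op))
    }

    /// Providers documented as supporting `op`, in name order.
    pub fn providers_for(&self, op: MediaOperation) -> Vec<&str> {
        self.rows
            .iter()
            .filter(|(_, ops)| ops.contains(&op))
            .map(|(name, _)| name.as_str())
            .collect()
    }

    /// Gate: only present an operation as shipped once at least one
    /// provider is documented as supporting it.
    pub fn is_documented(&self, op: MediaOperation) -> bool {
        !self.providers_for(op).is_empty()
    }
}

/// Placeholder matrix with hypothetical providers.
fn example_matrix() -> CapabilityMatrix {
    let mut m = CapabilityMatrix::default();
    m.declare("vendor-a", &[MediaOperation::ImageGeneration, MediaOperation::SpeechSynthesis]);
    m.declare("vendor-b", &[MediaOperation::ImageGeneration, MediaOperation::AudioTranscription]);
    m
}

fn main() {
    let matrix = example_matrix();
    let all = [
        MediaOperation::ImageGeneration,
        MediaOperation::AudioTranscription,
        MediaOperation::SpeechSynthesis,
        MediaOperation::VideoGeneration,
    ];
    for op in all {
        println!(
            "{:?}: {:?} (documented: {})",
            op,
            matrix.providers_for(op),
            matrix.is_documented(op)
        );
    }
    println!("vendor-a speaks: {}", matrix.supports("vendor-a", MediaOperation::SpeechSynthesis));
}
```

In this placeholder data, VideoGeneration has no documented provider, so the gate reports it as not ready to present as shipped.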