Media

forge-media defines provider-agnostic media capabilities for agents that need to create or process image, audio, or video content.

Capabilities

Capability           API shape
Image generation     generate_image(provider, prompt, options)
Audio transcription  transcribe(provider, audio, options)
Speech synthesis     speak(provider, text, options)
Video generation     generate_video(provider, prompt, options)

Each capability is defined by a provider trait. Agent code should depend on the trait and top-level function, not on a concrete vendor implementation.
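The split between a provider trait and a top-level function can be sketched as follows for image generation. This is a minimal illustration, not the actual forge-media definitions: the synchronous signatures, the ImageOptions, Image, and MediaError types, the empty-prompt check, and the StubImageProvider backend are all assumptions made for the example.

```rust
use std::fmt;

/// Hypothetical option bag; the real option fields are not specified here.
#[derive(Debug, Clone, Default)]
pub struct ImageOptions {
    pub width: u32,
    pub height: u32,
}

/// Hypothetical result type carrying encoded image bytes.
#[derive(Debug, Clone, PartialEq)]
pub struct Image {
    pub bytes: Vec<u8>,
}

/// Hypothetical error type shared by the sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct MediaError(pub String);

impl fmt::Display for MediaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "media error: {}", self.0)
    }
}

impl std::error::Error for MediaError {}

/// Assumed shape of the provider trait: one method per capability.
pub trait ImageProvider {
    fn generate_image(&self, prompt: &str, options: &ImageOptions) -> Result<Image, MediaError>;
}

/// Top-level function that agent code calls. It only sees the trait,
/// so any backend that implements `ImageProvider` can be passed in.
pub fn generate_image(
    provider: &dyn ImageProvider,
    prompt: &str,
    options: &ImageOptions,
) -> Result<Image, MediaError> {
    // Assumed validation step: reject blank prompts before calling the backend.
    if prompt.trim().is_empty() {
        return Err(MediaError("prompt must not be empty".to_string()));
    }
    provider.generate_image(prompt, options)
}

/// Stand-in backend for tests; it echoes the request instead of calling a vendor.
pub struct StubImageProvider;

impl ImageProvider for StubImageProvider {
    fn generate_image(&self, prompt: &str, options: &ImageOptions) -> Result<Image, MediaError> {
        let echoed = format!("{}x{}:{}", options.width, options.height, prompt);
        Ok(Image { bytes: echoed.into_bytes() })
    }
}
```

Because the caller passes `&dyn ImageProvider`, swapping the stub for a real vendor backend would not require changes to agent code under this design.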

Provider traits

Trait                  Purpose
ImageProvider          Text-to-image backends.
TranscriptionProvider  Speech-to-text backends.
SpeechProvider         Text-to-speech backends.
VideoProvider          Text-to-video backends.

The media layer follows the same provider-agnostic pattern as LLM generation. When a media call runs inside an agent run, it should be subject to the same identity, authorization, and telemetry boundaries as LLM generation.
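One way to picture these boundaries is a wrapper that checks authorization and records a telemetry event around each provider call. The sketch below is purely illustrative: RunContext, its fields, the event strings, transcribe_in_run, and FixedTranscript are hypothetical names, and the trait signature is an assumption rather than the Forge definition.

```rust
use std::cell::RefCell;

/// Assumed trait shape for speech-to-text backends.
pub trait TranscriptionProvider {
    fn transcribe(&self, audio: &[u8]) -> Result<String, String>;
}

/// Hypothetical per-run context: caller identity, permitted operations,
/// and an in-memory event log standing in for a telemetry sink.
pub struct RunContext {
    pub agent_id: String,
    pub allowed_operations: Vec<String>,
    pub events: RefCell<Vec<String>>,
}

impl RunContext {
    pub fn new(agent_id: &str, allowed_operations: &[&str]) -> Self {
        RunContext {
            agent_id: agent_id.to_string(),
            allowed_operations: allowed_operations.iter().map(|op| op.to_string()).collect(),
            events: RefCell::new(Vec::new()),
        }
    }

    fn record(&self, event: String) {
        self.events.borrow_mut().push(event);
    }
}

/// Runs a transcription inside the run's boundaries:
/// authorize first, then call the provider, then log the outcome.
pub fn transcribe_in_run(
    ctx: &RunContext,
    provider: &dyn TranscriptionProvider,
    audio: &[u8],
) -> Result<String, String> {
    if !ctx.allowed_operations.iter().any(|op| op == "transcribe") {
        ctx.record(format!("denied agent={} op=transcribe", ctx.agent_id));
        return Err(format!("agent {} is not authorized to transcribe", ctx.agent_id));
    }
    let result = provider.transcribe(audio);
    let outcome = if result.is_ok() { "ok" } else { "error" };
    ctx.record(format!(
        "call agent={} op=transcribe bytes={} outcome={}",
        ctx.agent_id,
        audio.len(),
        outcome
    ));
    result
}

/// Stand-in backend that always returns the same transcript.
pub struct FixedTranscript(pub &'static str);

impl TranscriptionProvider for FixedTranscript {
    fn transcribe(&self, _audio: &[u8]) -> Result<String, String> {
        Ok(self.0.to_string())
    }
}
```

Keeping the checks in the wrapper rather than in each provider means every backend is subject to the same policy, which is the property the paragraph above asks for.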

Status

The media API surface exists as a Forge runtime module. Production behavior depends on the provider implementation bound at runtime.

Before presenting a media operation as shipped behavior, record in the provider capability matrix which providers support it.
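A capability matrix can be kept as data so that support claims are checkable rather than implied. The following is a minimal sketch under stated assumptions: MediaOperation and CapabilityMatrix are hypothetical types, and the provider names used with them are placeholders, not real integrations.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// The four media operations listed in the Capabilities table.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum MediaOperation {
    ImageGeneration,
    AudioTranscription,
    SpeechSynthesis,
    VideoGeneration,
}

/// Hypothetical matrix: provider name mapped to the operations it supports.
#[derive(Debug, Default)]
pub struct CapabilityMatrix {
    support: BTreeMap<String, BTreeSet<MediaOperation>>,
}

impl CapabilityMatrix {
    /// Declares that `provider` supports `op`.
    pub fn declare(&mut self, provider: &str, op: MediaOperation) {
        self.support.entry(provider.to_string()).or_default().insert(op);
    }

    /// Returns true only if support was explicitly declared.
    pub fn supports(&self, provider: &str, op: MediaOperation) -> bool {
        self.support.get(provider).map_or(false, |ops| ops.contains(&op))
    }

    /// Lists the providers that declared support for `op`, sorted by name.
    pub fn providers_for(&self, op: MediaOperation) -> Vec<&str> {
        self.support
            .iter()
            .filter(|(_, ops)| ops.contains(&op))
            .map(|(name, _)| name.as_str())
            .collect()
    }
}
```

An operation with an empty `providers_for` result has no declared backend, which is a signal that it should not yet be described as shipped.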