Media

`forge-media` defines provider-agnostic media capabilities for agents that need to create or process image, audio, or video content.

Capabilities

Capability	API shape
Image generation	`generate_image(provider, prompt, options)`
Audio transcription	`transcribe(provider, audio, options)`
Speech synthesis	`speak(provider, text, options)`
Video generation	`generate_video(provider, prompt, options)`

Each capability is defined by a provider trait. Agent code should depend on that trait and its top-level function, not on a concrete vendor implementation.
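Here is a minimal Rust sketch of that dependency rule. It assumes synchronous calls and uses hypothetical `ImageOptions`, `Image`, and `MediaError` types. The real `forge-media` signatures are not shown in this document and may differ; for example, they may be async. Agent code is generic over `ImageProvider`, and the only entry point it calls is the free function with the `generate_image(provider, prompt, options)` shape. `FakeImageProvider` and `illustrate` are illustrative names.

```rust
use std::fmt;

/// Hypothetical options type; the real `options` parameter may carry different fields.
#[derive(Debug, Clone, Default)]
pub struct ImageOptions {
    pub width: u32,
    pub height: u32,
}

/// Hypothetical result type: encoded image bytes plus their MIME type.
#[derive(Debug, Clone, PartialEq)]
pub struct Image {
    pub mime_type: String,
    pub bytes: Vec<u8>,
}

/// Hypothetical error type shared by media operations.
#[derive(Debug, Clone, PartialEq)]
pub enum MediaError {
    InvalidInput(String),
    Provider(String),
}

impl fmt::Display for MediaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MediaError::InvalidInput(msg) => write!(f, "invalid input: {msg}"),
            MediaError::Provider(msg) => write!(f, "provider error: {msg}"),
        }
    }
}

impl std::error::Error for MediaError {}

/// Text-to-image backends implement this trait; agent code never names a vendor.
pub trait ImageProvider {
    fn generate_image(&self, prompt: &str, options: &ImageOptions) -> Result<Image, MediaError>;
}

/// Top-level entry point with the `generate_image(provider, prompt, options)` shape.
/// Checks placed here apply to every provider.
pub fn generate_image<P: ImageProvider + ?Sized>(
    provider: &P,
    prompt: &str,
    options: &ImageOptions,
) -> Result<Image, MediaError> {
    if prompt.trim().is_empty() {
        return Err(MediaError::InvalidInput("prompt is empty".to_string()));
    }
    provider.generate_image(prompt, options)
}

/// Illustrative test double: echoes the prompt bytes back as a fake PNG payload.
pub struct FakeImageProvider;

impl ImageProvider for FakeImageProvider {
    fn generate_image(&self, prompt: &str, _options: &ImageOptions) -> Result<Image, MediaError> {
        Ok(Image {
            mime_type: "image/png".to_string(),
            bytes: prompt.as_bytes().to_vec(),
        })
    }
}

/// Agent code: generic over the trait, so any backend can be passed in.
pub fn illustrate<P: ImageProvider + ?Sized>(provider: &P, topic: &str) -> Result<Image, MediaError> {
    let options = ImageOptions { width: 512, height: 512 };
    generate_image(provider, &format!("An illustration of {topic}"), &options)
}
```

`illustrate` names only the trait. Switching vendors therefore means passing a different provider value, and the agent code stays the same. Shared checks such as empty-prompt validation live in the top-level function, so they behave the same way for every provider.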

Provider traits

Trait	Purpose
`ImageProvider`	Text-to-image backends.
`TranscriptionProvider`	Speech-to-text backends.
`SpeechProvider`	Text-to-speech backends.
`VideoProvider`	Text-to-video backends.
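The other three traits and their top-level functions can be sketched the same way. Treat this as an illustration under assumptions: the option structs, `Audio`, `Video`, and the plain `String` transcript are hypothetical. `LoopbackVendor` is a toy backend that shows a single vendor type implementing more than one trait.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum MediaError {
    Provider(String),
}

/// Hypothetical encoded audio clip.
#[derive(Debug, Clone, PartialEq)]
pub struct Audio {
    pub mime_type: String,
    pub bytes: Vec<u8>,
}

/// Hypothetical encoded video clip.
#[derive(Debug, Clone, PartialEq)]
pub struct Video {
    pub mime_type: String,
    pub bytes: Vec<u8>,
}

// Hypothetical per-operation option types.
#[derive(Debug, Clone, Default)]
pub struct TranscribeOptions {
    pub language: Option<String>,
}
#[derive(Debug, Clone, Default)]
pub struct SpeechOptions {
    pub voice: Option<String>,
}
#[derive(Debug, Clone, Default)]
pub struct VideoOptions {
    pub duration_secs: u32,
}

/// Speech-to-text backends.
pub trait TranscriptionProvider {
    fn transcribe(&self, audio: &Audio, options: &TranscribeOptions) -> Result<String, MediaError>;
}

/// Text-to-speech backends.
pub trait SpeechProvider {
    fn speak(&self, text: &str, options: &SpeechOptions) -> Result<Audio, MediaError>;
}

/// Text-to-video backends.
pub trait VideoProvider {
    fn generate_video(&self, prompt: &str, options: &VideoOptions) -> Result<Video, MediaError>;
}

// Top-level functions mirroring the API shapes in the capabilities table.
pub fn transcribe<P: TranscriptionProvider + ?Sized>(
    provider: &P,
    audio: &Audio,
    options: &TranscribeOptions,
) -> Result<String, MediaError> {
    provider.transcribe(audio, options)
}

pub fn speak<P: SpeechProvider + ?Sized>(provider: &P, text: &str, options: &SpeechOptions) -> Result<Audio, MediaError> {
    provider.speak(text, options)
}

pub fn generate_video<P: VideoProvider + ?Sized>(provider: &P, prompt: &str, options: &VideoOptions) -> Result<Video, MediaError> {
    provider.generate_video(prompt, options)
}

/// Toy vendor implementing two traits: "speech" is the UTF-8 text itself,
/// and transcription decodes those bytes back into text.
pub struct LoopbackVendor;

impl SpeechProvider for LoopbackVendor {
    fn speak(&self, text: &str, _options: &SpeechOptions) -> Result<Audio, MediaError> {
        Ok(Audio { mime_type: "audio/x-loopback".to_string(), bytes: text.as_bytes().to_vec() })
    }
}

impl TranscriptionProvider for LoopbackVendor {
    fn transcribe(&self, audio: &Audio, _options: &TranscribeOptions) -> Result<String, MediaError> {
        String::from_utf8(audio.bytes.clone()).map_err(|e| MediaError::Provider(e.to_string()))
    }
}
```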

The media layer follows the same provider-agnostic pattern as LLM generation. When used inside an agent run, it should also participate in the same identity, authorization, and telemetry boundaries as LLM calls.
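One way to keep media calls inside the run's identity, authorization, and telemetry boundaries is a decorator that wraps any provider. The sketch below is hypothetical and does not describe how the Forge runtime actually does this. `RunContext` holds a principal and its granted capability strings, and an in-memory `MediaEvent` log stands in for a real telemetry sink. Options are omitted for brevity.

```rust
use std::cell::RefCell;
use std::collections::HashSet;

/// Capability string checked by the wrapper (hypothetical naming scheme).
pub const IMAGE_GENERATE: &str = "image.generate";

#[derive(Debug, Clone, PartialEq)]
pub enum MediaError {
    Unauthorized(String),
    Provider(String),
}

/// Simplified trait for this sketch: options are omitted.
pub trait ImageProvider {
    fn generate_image(&self, prompt: &str) -> Result<Vec<u8>, MediaError>;
}

/// Hypothetical per-run identity: who is acting and which capabilities were granted.
pub struct RunContext {
    pub principal: String,
    pub granted: HashSet<&'static str>,
}

/// Hypothetical telemetry record; a real runtime would export it rather than keep it in memory.
#[derive(Debug, Clone, PartialEq)]
pub struct MediaEvent {
    pub principal: String,
    pub capability: &'static str,
    pub outcome: &'static str,
}

/// Decorator that applies the run's authorization and telemetry to any provider.
pub struct Governed<'a, P> {
    inner: P,
    ctx: &'a RunContext,
    pub events: RefCell<Vec<MediaEvent>>,
}

impl<'a, P> Governed<'a, P> {
    pub fn new(inner: P, ctx: &'a RunContext) -> Self {
        Governed { inner, ctx, events: RefCell::new(Vec::new()) }
    }

    fn record(&self, outcome: &'static str) {
        self.events.borrow_mut().push(MediaEvent {
            principal: self.ctx.principal.clone(),
            capability: IMAGE_GENERATE,
            outcome,
        });
    }
}

impl<'a, P: ImageProvider> ImageProvider for Governed<'a, P> {
    fn generate_image(&self, prompt: &str) -> Result<Vec<u8>, MediaError> {
        // Authorization: refuse before any vendor call is made.
        if !self.ctx.granted.contains(IMAGE_GENERATE) {
            self.record("denied");
            return Err(MediaError::Unauthorized(self.ctx.principal.clone()));
        }
        let result = self.inner.generate_image(prompt);
        // Telemetry: one event per call, tagged with the acting principal.
        self.record(if result.is_ok() { "ok" } else { "error" });
        result
    }
}

/// Illustrative backend that echoes the prompt bytes.
pub struct EchoImages;

impl ImageProvider for EchoImages {
    fn generate_image(&self, prompt: &str) -> Result<Vec<u8>, MediaError> {
        Ok(prompt.as_bytes().to_vec())
    }
}
```

`Governed` implements `ImageProvider` itself. Agent code that depends only on the trait therefore gets the checks without any change to its own source.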

Status

The media API surface exists as a Forge runtime module. Production behavior depends on the provider implementation bound at runtime.
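The following sketches binding a provider at runtime. It assumes a hypothetical `Registry` that maps provider names to factory functions; in a deployment, the name would come from configuration. Agent code receives a `Box<dyn ImageProvider>` and does not learn which concrete type sits behind it. `StubProvider`, `EchoProvider`, and the provider names are placeholders.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum MediaError {
    UnknownProvider(String),
}

/// Simplified trait for this sketch: options are omitted and a name is exposed for logging.
pub trait ImageProvider {
    fn name(&self) -> &str;
    fn generate_image(&self, prompt: &str) -> Result<Vec<u8>, MediaError>;
}

/// Placeholder backends standing in for real vendor implementations.
pub struct StubProvider;
pub struct EchoProvider;

impl ImageProvider for StubProvider {
    fn name(&self) -> &str {
        "stub"
    }
    fn generate_image(&self, _prompt: &str) -> Result<Vec<u8>, MediaError> {
        Ok(Vec::new())
    }
}

impl ImageProvider for EchoProvider {
    fn name(&self) -> &str {
        "echo"
    }
    fn generate_image(&self, prompt: &str) -> Result<Vec<u8>, MediaError> {
        Ok(prompt.as_bytes().to_vec())
    }
}

/// Factory producing a boxed provider; boxing lets the concrete type be chosen at runtime.
type Factory = fn() -> Box<dyn ImageProvider>;

fn make_stub() -> Box<dyn ImageProvider> {
    Box::new(StubProvider)
}

fn make_echo() -> Box<dyn ImageProvider> {
    Box::new(EchoProvider)
}

/// Hypothetical registry mapping configured provider names to factories.
pub struct Registry {
    factories: HashMap<&'static str, Factory>,
}

impl Registry {
    pub fn with_defaults() -> Self {
        let mut factories: HashMap<&'static str, Factory> = HashMap::new();
        factories.insert("stub", make_stub);
        factories.insert("echo", make_echo);
        Registry { factories }
    }

    /// Resolve a provider by name, e.g. a value read from deployment configuration.
    pub fn bind(&self, name: &str) -> Result<Box<dyn ImageProvider>, MediaError> {
        self.factories
            .get(name)
            .map(|make| make())
            .ok_or_else(|| MediaError::UnknownProvider(name.to_string()))
    }
}

/// Top-level function; `?Sized` lets it accept `&dyn ImageProvider` as well as concrete types.
pub fn generate_image<P: ImageProvider + ?Sized>(provider: &P, prompt: &str) -> Result<Vec<u8>, MediaError> {
    provider.generate_image(prompt)
}
```

An unknown name fails at bind time rather than at the first media call. This keeps a misconfigured deployment from looking healthy until an agent tries to generate something.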

Use the provider capability matrix to document which providers support each media operation before presenting that operation as shipped behavior.
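One option is to keep the capability matrix as data, so tooling can check it before documentation claims an operation has shipped. The sketch assumes a hypothetical `CapabilityMatrix` keyed by provider name. The `Operation` variants mirror the four capabilities listed above, and the provider names in the usage are placeholders.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// The four media operations from the capabilities table.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Operation {
    ImageGeneration,
    AudioTranscription,
    SpeechSynthesis,
    VideoGeneration,
}

/// Hypothetical in-code form of the provider capability matrix:
/// provider name -> set of supported operations.
#[derive(Debug, Default)]
pub struct CapabilityMatrix {
    rows: BTreeMap<String, BTreeSet<Operation>>,
}

impl CapabilityMatrix {
    /// Record that `provider` supports each operation in `ops`.
    pub fn declare(&mut self, provider: &str, ops: &[Operation]) {
        self.rows
            .entry(provider.to_string())
            .or_default()
            .extend(ops.iter().copied());
    }

    pub fn supports(&self, provider: &str, op: Operation) -> bool {
        self.rows.get(provider).map_or(false, |ops| ops.contains(&op))
    }

    /// Providers listed as supporting `op`, in name order (BTreeMap iteration is sorted).
    pub fn providers_for(&self, op: Operation) -> Vec<&str> {
        self.rows
            .iter()
            .filter(|(_, ops)| ops.contains(&op))
            .map(|(name, _)| name.as_str())
            .collect()
    }

    /// Documentation gate: describe `op` as shipped only if some provider row lists it.
    pub fn is_shipped(&self, op: Operation) -> bool {
        !self.providers_for(op).is_empty()
    }
}
```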