Generation

stream_text, stream_text_chunks, stream_object, generate_object — text and structured generation.


forge-generate provides higher-level helpers built on top of LanguageModel::stream_chunks. Use these when you need text or structured output without the full agent tool loop.

Plain text — buffered

use forge::generate::stream_text;

let result = stream_text(&model, "Write a haiku about the forge.", &options).await?;
println!("{}", result.text);
println!("[{} tokens]", result.usage.completion_tokens);

stream_text consumes the full chunk stream and returns a TextStreamResult with the assembled text and usage totals. Internally it drives stream_chunks; the buffering happens in the helper, not in the model.

Plain text — streamed

use forge::generate::stream_text_chunks;
use futures_util::StreamExt;

let mut stream = stream_text_chunks(&model, "Write a haiku.", &options).await?;
while let Some(delta) = stream.next().await {
    print!("{}", delta?);
}

Yields String deltas as they arrive — the lightest path to per-token text output without writing your own match on StreamChunk.
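For comparison, this is roughly the loop stream_text_chunks saves you from writing. It is a sketch: the StreamChunk variants are taken from the tool-aware example further down this page, and the empty slice for tool definitions is an assumption.

```rust
use futures_util::StreamExt;

// Drive stream_chunks directly and keep only the text deltas —
// roughly what stream_text_chunks does internally.
let mut stream = model.stream_chunks(&messages, &[], &options).await?;
let mut text = String::new();
while let Some(item) = stream.next().await {
    match item? {
        StreamChunk::TextDelta { text: delta } => text.push_str(&delta),
        StreamChunk::Done { .. } => break,
        _ => {} // ignore tool-call and metadata chunks
    }
}
```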

Structured output

For typed outputs, use generate_object:

use forge::generate::generate_object;
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct WeatherSummary {
    city: String,
    temperature_f: f64,
    conditions: String,
    confidence: f64,
}

let summary: WeatherSummary = generate_object(
    &model,
    "Summarise the weather in Mountain View.",
    schemars::schema_for!(WeatherSummary),
    &options,
).await?;

generate_object adds the JSON schema to the request, asks the model for JSON output, parses the response into the target type, and returns it. Parse or validation failures are returned as ForgeError::SchemaValidation.
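If you want to recover from a validation failure rather than bubble it up, match on the error. This is a sketch: the payload carried by ForgeError::SchemaValidation is an assumption, as is the retry helper.

```rust
use forge::ForgeError;
use forge::generate::generate_object;

// Retry once on a schema mismatch; propagate everything else.
let summary: WeatherSummary = match generate_object(
    &model,
    "Summarise the weather in Mountain View.",
    schemars::schema_for!(WeatherSummary),
    &options,
).await {
    Ok(value) => value,
    Err(ForgeError::SchemaValidation(msg)) => {
        // The model emitted JSON that did not match WeatherSummary;
        // `msg` (field shape assumed) describes the mismatch.
        eprintln!("schema mismatch, retrying once: {msg}");
        retry_generate(&model, &options).await? // hypothetical helper
    }
    Err(other) => return Err(other.into()),
};
```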

For streamed structured output, use stream_object (yields partial values as the JSON is assembled).
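Consuming stream_object might look like the sketch below. The signature mirroring generate_object and the yielded item type (a Result wrapping the partially assembled value) are assumptions.

```rust
use forge::generate::stream_object;
use futures_util::StreamExt;

let mut stream = stream_object::<WeatherSummary>(
    &model,
    "Summarise the weather in Mountain View.",
    schemars::schema_for!(WeatherSummary),
    &options,
).await?;

while let Some(partial) = stream.next().await {
    // Each yielded value reflects the JSON assembled so far,
    // so later values refine earlier ones.
    println!("partial: {:?}", partial?);
}
```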

GenerateOptions

let options = GenerateOptions::default()
    .with_temperature(0.2)
    .with_max_tokens(1024)
    .with_stop_sequences(vec!["END".into()])
    .with_system_prompt("Respond strictly in valid JSON.")
    .with_metadata("trace_id", trace_id);

Available knobs:

Method                              Purpose
with_temperature(f32)               Sampling temperature
with_max_tokens(u32)                Output token cap
with_top_p(f32)                     Nucleus sampling
with_stop_sequences(Vec<String>)    Halt generation on any of these strings
with_system_prompt(String)          One-off system prompt
with_metadata(key, value)           Forwarded as observability metadata

Tool-aware generation

If you want the model to call tools but don't need the full agent runtime (no observers, no tool-loop bound, no record collection), use stream_chunks directly with a tools argument and execute them yourself:

use futures_util::StreamExt;

let mut stream = model
    .stream_chunks(&messages, &tool_defs, &options)
    .await?;

while let Some(item) = stream.next().await {
    match item? {
        StreamChunk::TextDelta { text } => print!("{text}"),
        StreamChunk::ToolCallEnd { id } => {
            // Look up the buffered tool call, execute, append result, continue
        }
        StreamChunk::Done { .. } => break,
        _ => {}
    }
}
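Filling in the buffering and execution steps might look like this. It is a sketch only: the ToolCallDelta variant, its fields, and Message::tool_result are assumptions about the API, and run_my_tool stands in for your own dispatch logic.

```rust
use std::collections::HashMap;
use futures_util::StreamExt;

// Buffer streamed tool-call arguments by id, execute on ToolCallEnd,
// and append the result so a follow-up request can continue the turn.
let mut pending: HashMap<String, String> = HashMap::new(); // id -> argument JSON
let mut stream = model.stream_chunks(&messages, &tool_defs, &options).await?;
while let Some(item) = stream.next().await {
    match item? {
        StreamChunk::TextDelta { text } => print!("{text}"),
        StreamChunk::ToolCallDelta { id, delta } => { // variant name assumed
            pending.entry(id).or_default().push_str(&delta);
        }
        StreamChunk::ToolCallEnd { id } => {
            let args = pending.remove(&id).unwrap_or_default();
            let output = run_my_tool(&args).await?; // your own tool dispatch
            messages.push(Message::tool_result(&id, output)); // assumed constructor
        }
        StreamChunk::Done { .. } => break,
        _ => {}
    }
}
// Re-invoke model.stream_chunks with the appended tool results to continue.
```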

For real production use, build a StreamingToolLoopAgent instead — see Agents.

Next