Generation

stream_text, stream_text_chunks, stream_object, generate_object — text and structured generation.


forge-generate provides higher-level helpers built on top of LanguageModel::stream_chunks. Use these when you need text or structured output without the full agent tool loop.

Plain text — buffered

use forge::generate::stream_text;

let result = stream_text(&model, "Write a haiku about the forge.", &options).await?;
println!("{}", result.text);
println!("[{} tokens]", result.usage.completion_tokens);

stream_text consumes the full chunk stream and returns a TextStreamResult with the assembled text and usage totals. Internally it drives stream_chunks; the buffering happens in the helper, not in the model.

Plain text — streamed

use forge::generate::stream_text_chunks;
use futures_util::StreamExt;

let mut stream = stream_text_chunks(&model, "Write a haiku.", &options).await?;
while let Some(delta) = stream.next().await {
    print!("{}", delta?);
}

Yields String deltas as they arrive — the lightest path to per-token text output without writing your own match on StreamChunk.
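For comparison, this is roughly the loop stream_text_chunks saves you from writing. It is a sketch: the StreamChunk variants are taken from the tool-aware example further down this page, and the empty slice for tool definitions is an assumption.

```rust
use futures_util::StreamExt;

// Drive stream_chunks directly and keep only the text deltas —
// roughly what stream_text_chunks does internally.
let mut stream = model.stream_chunks(&messages, &[], &options).await?;
let mut text = String::new();
while let Some(item) = stream.next().await {
    match item? {
        StreamChunk::TextDelta { text: delta } => text.push_str(&delta),
        StreamChunk::Done { .. } => break,
        _ => {} // ignore tool-call and metadata chunks
    }
}
```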

Structured output

For typed outputs, use generate_object:

use forge::generate::generate_object;
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct WeatherSummary {
    city: String,
    temperature_f: f64,
    conditions: String,
    confidence: f64,
}

let summary: WeatherSummary = generate_object(
    &model,
    "Summarise the weather in Mountain View.",
    schemars::schema_for!(WeatherSummary),
    &options,
).await?;

generate_object adds the JSON schema to the request, asks the model for JSON output, parses the response into the target type, and returns it. Parse or validation failures are returned as ForgeError::SchemaValidation.
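If you want to recover from a validation failure rather than bubble it up, match on the error. This is a sketch: the payload carried by ForgeError::SchemaValidation is an assumption, as is the retry helper.

```rust
use forge::ForgeError;
use forge::generate::generate_object;

// Retry once on a schema mismatch; propagate everything else.
let summary: WeatherSummary = match generate_object(
    &model,
    "Summarise the weather in Mountain View.",
    schemars::schema_for!(WeatherSummary),
    &options,
).await {
    Ok(value) => value,
    Err(ForgeError::SchemaValidation(msg)) => {
        // The model emitted JSON that did not match WeatherSummary;
        // `msg` (field shape assumed) describes the mismatch.
        eprintln!("schema mismatch, retrying once: {msg}");
        retry_generate(&model, &options).await? // hypothetical helper
    }
    Err(other) => return Err(other.into()),
};
```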

For streamed structured output, use stream_object (yields partial values as the JSON is assembled).
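Consuming stream_object might look like the sketch below. The signature mirroring generate_object and the yielded item type (a Result wrapping the partially assembled value) are assumptions.

```rust
use forge::generate::stream_object;
use futures_util::StreamExt;

let mut stream = stream_object::<WeatherSummary>(
    &model,
    "Summarise the weather in Mountain View.",
    schemars::schema_for!(WeatherSummary),
    &options,
).await?;

while let Some(partial) = stream.next().await {
    // Each yielded value reflects the JSON assembled so far,
    // so later values refine earlier ones.
    println!("partial: {:?}", partial?);
}
```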

GenerateOptions

let options = GenerateOptions::default()
    .with_temperature(0.2)
    .with_max_tokens(1024)
    .with_stop_sequences(vec!["END".into()])
    .with_system_prompt("Respond strictly in valid JSON.")
    .with_metadata("trace_id", trace_id);

Available knobs:

Method                              Purpose
with_temperature(f32)               Sampling temperature
with_max_tokens(u32)                Output token cap
with_top_p(f32)                     Nucleus sampling
with_stop_sequences(Vec<String>)    Halt generation on any of these strings
with_system_prompt(String)          One-off system prompt
with_metadata(key, value)           Forwarded as observability metadata

Tool-aware generation

If you want the model to call tools but don't need the full agent runtime (no observers, no tool-loop bound, no record collection), use stream_chunks directly with a tools argument and execute them yourself:

use futures_util::StreamExt;

let mut stream = model
    .stream_chunks(&messages, &tool_defs, &options)
    .await?;

while let Some(item) = stream.next().await {
    match item? {
        StreamChunk::TextDelta { text } => print!("{text}"),
        StreamChunk::ToolCallEnd { id } => {
            // Look up the buffered tool call, execute, append result, continue
        }
        StreamChunk::Done { .. } => break,
        _ => {}
    }
}
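Filling in the buffering and execution steps might look like this. It is a sketch only: the ToolCallDelta variant, its fields, and Message::tool_result are assumptions about the API, and run_my_tool stands in for your own dispatch logic.

```rust
use std::collections::HashMap;
use futures_util::StreamExt;

// Buffer streamed tool-call arguments by id, execute on ToolCallEnd,
// and append the result so a follow-up request can continue the turn.
let mut pending: HashMap<String, String> = HashMap::new(); // id -> argument JSON
let mut stream = model.stream_chunks(&messages, &tool_defs, &options).await?;
while let Some(item) = stream.next().await {
    match item? {
        StreamChunk::TextDelta { text } => print!("{text}"),
        StreamChunk::ToolCallDelta { id, delta } => { // variant name assumed
            pending.entry(id).or_default().push_str(&delta);
        }
        StreamChunk::ToolCallEnd { id } => {
            let args = pending.remove(&id).unwrap_or_default();
            let output = run_my_tool(&args).await?; // your own tool dispatch
            messages.push(Message::tool_result(&id, output)); // assumed constructor
        }
        StreamChunk::Done { .. } => break,
        _ => {}
    }
}
// Re-invoke model.stream_chunks with the appended tool results to continue.
```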

For real production use, build a StreamingToolLoopAgent instead — see Agents.

Next