Budget research agent: Kimi K2.6 for context, Qwen for synthesis

Why two models

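Long-context models are cheap per token but tend to write weaker executive prose. Premium models can cost 5-25x more per output token but write noticeably better. So split the workload: the cheap model reads the whole corpus, and the premium model sees only the extracted key points and writes the short final summary.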

Step 1 — Kimi reads everything

Concatenate up to ~800K tokens of PDF text into Kimi K2.6's long context. Ask for structured key points (JSON).
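Before sending the corpus, it helps to check that it fits the budget. The following is a minimal sketch, not a tokenizer-accurate count: it assumes roughly 4 characters per token (a common rule of thumb for English text), and the names `estimate_tokens` and `pack_corpus` are hypothetical helpers introduced here for illustration.

```python
SEPARATOR = "\n\n--- PDF BOUNDARY ---\n\n"


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. Assumption: ~4 characters per token."""
    return int(len(text) / chars_per_token) + 1


def pack_corpus(pdf_texts: list[str], budget_tokens: int = 800_000) -> tuple[str, list[str]]:
    """Join PDFs in order until the estimated budget is reached.

    Returns the joined corpus and the list of PDFs that did not fit,
    so the leftovers can go into a second extraction call.
    """
    kept: list[str] = []
    used = 0
    sep_cost = estimate_tokens(SEPARATOR)
    for i, text in enumerate(pdf_texts):
        # The separator is only paid for between two documents.
        cost = estimate_tokens(text) + (sep_cost if kept else 0)
        if used + cost > budget_tokens:
            return SEPARATOR.join(kept), pdf_texts[i:]
        kept.append(text)
        used += cost
    return SEPARATOR.join(kept), []
```

If the second return value is non-empty, the corpus exceeded the budget and the remaining PDFs need their own call. For a real run, swap the character heuristic for the provider's tokenizer if one is available.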

extract.py

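import os
from pathlib import Path

from openai import OpenAI

# Assumption: an OpenAI-compatible endpoint that serves both models.
# LLM_BASE_URL and LLM_API_KEY are placeholder environment variable names.
client = OpenAI(base_url=os.environ["LLM_BASE_URL"], api_key=os.environ["LLM_API_KEY"])

# Assumption: each PDF was already converted to plain text under ./pdf_text/.
pdf_texts = [p.read_text(encoding="utf-8") for p in sorted(Path("pdf_text").glob("*.txt"))]
corpus = "\n\n--- PDF BOUNDARY ---\n\n".join(pdf_texts)

extract = client.chat.completions.create(
    model="kimi-k2.6",
    response_format={"type": "json_object"},  # request a single JSON object
    messages=[
        {
            "role": "system",
            "content": (
                "Read all the PDFs separated by '--- PDF BOUNDARY ---'. "
                "Return JSON: {\"docs\": [{\"title\": ..., \"key_points\": [...]}]}"
            ),
        },
        {"role": "user", "content": corpus},
    ],
    max_tokens=4000,  # caps the JSON output; raise it if the response is cut off
)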
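With a 4,000-token output cap and many documents, the JSON can come back cut off or missing fields. A minimal validation sketch follows, assuming the response uses the `{"docs": [{"title": ..., "key_points": [...]}]}` shape requested in the system prompt. `parse_extraction` is a hypothetical helper name, not part of any SDK.

```python
import json


def parse_extraction(raw: str) -> list[dict]:
    """Parse the extraction response and keep only well-formed entries.

    Raises ValueError if the text is not valid JSON (for example, a
    truncated response) or lacks a top-level "docs" list.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("extraction output is not valid JSON") from exc

    docs = data.get("docs") if isinstance(data, dict) else None
    if not isinstance(docs, list):
        raise ValueError('extraction output has no "docs" list')

    cleaned = []
    for doc in docs:
        if not isinstance(doc, dict):
            continue
        title = doc.get("title")
        points = doc.get("key_points")
        # Drop entries without a title or with an empty key_points list.
        if isinstance(title, str) and isinstance(points, list) and points:
            cleaned.append({"title": title, "key_points": [str(p) for p in points]})
    return cleaned
```

Calling `parse_extraction(extract.choices[0].message.content)` before the synthesis step makes a truncated response fail loudly instead of producing a summary from partial data. On OpenAI-compatible APIs, a `finish_reason` of `"length"` on the choice is another sign that the output hit `max_tokens`.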

Step 2 — Qwen synthesizes

Pass Kimi K2.6's JSON to Qwen3 Max. Ask for an executive summary in your house style.

synthesize.py

# Reuses `client` and `extract` from extract.py; run both steps in the
# same script or session so those names are in scope.
summary = client.chat.completions.create(
    model="qwen3-max",
    messages=[
        {
            "role": "system",
            "content": (
                "Write a one-page executive summary. Tone: direct, no jargon, "
                "no marketing language. Lead with the single most important "
                "finding. Cite sources by title."
            ),
        },
        # Only the extracted JSON goes to the premium model, not the raw PDFs.
        {"role": "user", "content": extract.choices[0].message.content},
    ],
    max_tokens=1200,
)
print(summary.choices[0].message.content)

Cost

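50 PDFs × 16K tokens = 800K input tokens, all billed at Kimi K2.6's input rate, plus at most 4,000 output tokens for the JSON. Qwen3 Max then reads only that JSON and writes at most 1,200 tokens, so the premium model's share of the bill stays small. Check /pricing for the latest per-model price before running large batches.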
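The arithmetic can be sketched as a small estimator. The token counts come from the example above (800K corpus tokens, `max_tokens` of 4000 and 1200). The prices in the usage note are placeholders, not real rates, and `Price`, `call_cost`, and `pipeline_cost` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Price in dollars per million tokens. Fill in from /pricing."""
    input_per_mtok: float
    output_per_mtok: float


def call_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: input and output are billed at separate rates."""
    total = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return total / 1_000_000


def pipeline_cost(
    extract_price: Price,
    synth_price: Price,
    corpus_tokens: int = 800_000,
    extract_out: int = 4_000,
    synth_out: int = 1_200,
) -> float:
    """Upper-bound estimate for both steps.

    Assumes both calls use their full max_tokens, and ignores the
    short system prompts on either side.
    """
    step1 = call_cost(extract_price, corpus_tokens, extract_out)
    # Step 2 reads only the extraction output, not the corpus.
    step2 = call_cost(synth_price, extract_out, synth_out)
    return step1 + step2
```

For example, `pipeline_cost(Price(1.0, 4.0), Price(2.0, 10.0))` uses made-up rates to show the shape of the calculation: nearly all of the total comes from the 800K input tokens in step 1, which is why the input price of the extraction model matters most.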

Next steps