SeaLink Cookbook

Cost-effective research agent: Kimi for context, Qwen for synthesis

Read 50 PDFs with Kimi K2 ($0.60 per 1M input tokens), distill the key points, then have Qwen3 Max write the executive summary.

Why two models

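Long-context models such as Kimi K2 are cheap per token, but their executive prose is weaker. Premium models such as Qwen3 Max cost 5-25x more per output token but write noticeably better. Split the workload: the cheap model does the high-volume reading, and the premium model writes only the short final output, so almost all tokens are billed at the cheap rate.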

Step 1 — Kimi reads everything

Concatenate up to ~800K tokens of PDF text, separated by a boundary marker, into Kimi's 1M-token context; the remaining room covers the prompt and the reply. Ask for structured key points as JSON.
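To stay under the ~800K-token budget without cutting a PDF in half, one option is to pack whole documents until the budget runs out. The sketch below assumes a rough 4-characters-per-token heuristic; `pack_corpus`, `estimate_tokens`, and `CHARS_PER_TOKEN` are hypothetical names, and exact counts depend on Kimi's own tokenizer.

```python
from typing import List, Tuple

BOUNDARY = "\n\n--- PDF BOUNDARY ---\n\n"

# Assumption: ~4 characters per token for English prose (a rough heuristic).
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Crude ceiling-division token estimate; not the provider's tokenizer."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def pack_corpus(pdf_texts: List[str], budget_tokens: int = 800_000) -> Tuple[str, int]:
    """Join whole PDFs with the boundary marker until the next one would overflow.

    Returns the corpus and the number of PDFs included. A PDF that would
    exceed the budget stops the packing; no document is truncated mid-text.
    """
    parts: List[str] = []
    used = 0
    for text in pdf_texts:
        # Count the separator only when it will actually be inserted.
        cost = estimate_tokens(text) + (estimate_tokens(BOUNDARY) if parts else 0)
        if used + cost > budget_tokens:
            break
        parts.append(text)
        used += cost
    return BOUNDARY.join(parts), len(parts)
```

Pass the returned corpus as the user message; the count tells you how many PDFs made it in, so the rest can go into a second batch.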

extract.py
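```python
from glob import glob

from openai import OpenAI
from pypdf import PdfReader
# OpenAI-compatible client; reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment.
client = OpenAI()

# One plain-text string per PDF ("pdfs/" is an example directory).
pdf_texts = [
    "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    for path in sorted(glob("pdfs/*.pdf"))
]
corpus = "\n\n--- PDF BOUNDARY ---\n\n".join(pdf_texts)
extract = client.chat.completions.create(
    model="kimi-k2",
    response_format={"type": "json_object"},  # request a JSON-only reply
    messages=[
        {
            "role": "system",
            "content": (
                "Read all the PDFs separated by '--- PDF BOUNDARY ---'. "
                "Return JSON: {\"docs\": [{\"title\": ..., \"key_points\": [...]}]}"
            ),
        },
        {"role": "user", "content": corpus},
    ],
    max_tokens=4000,
)
# A reply cut off at max_tokens is truncated JSON, so fail before Step 2.
if extract.choices[0].finish_reason == "length":
    raise RuntimeError("Kimi hit max_tokens; raise the limit or send fewer PDFs")
```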
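JSON mode asks for syntactically valid JSON, but nothing enforces the `docs`/`title`/`key_points` shape from the prompt. A small validator such as this hypothetical `parse_key_points` catches malformed output before you pay for the Qwen call. Treating each key point as a plain string is an assumption; the prompt does not pin that down.

```python
import json
from typing import Any, Dict, List


def parse_key_points(raw: str) -> List[Dict[str, Any]]:
    """Check Kimi's reply against the {"docs": [{"title", "key_points"}]} shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid or truncated JSON
    docs = data.get("docs") if isinstance(data, dict) else None
    if not isinstance(docs, list):
        raise ValueError("expected a top-level 'docs' list")
    for i, doc in enumerate(docs):
        if not isinstance(doc, dict) or not isinstance(doc.get("title"), str):
            raise ValueError(f"docs[{i}] is missing a string 'title'")
        points = doc.get("key_points")
        # Assumption: key points are plain strings.
        if not isinstance(points, list) or not all(isinstance(p, str) for p in points):
            raise ValueError(f"docs[{i}] needs 'key_points' as a list of strings")
    return docs
```

For example, `docs = parse_key_points(extract.choices[0].message.content)` either returns the validated list or raises before Step 2 runs.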

Step 2 — Qwen synthesizes

Pass Kimi's JSON to Qwen3 Max. Ask for an executive summary in your house style.

synthesize.py
```python
# Reuses `client` and the Step 1 response; importing extract.py runs that call.
from extract import client, extract

summary = client.chat.completions.create(
    model="qwen3-max",
    messages=[
        {
            "role": "system",
            "content": (
                "Write a one-page executive summary. Tone: direct, no jargon, "
                "no marketing language. Lead with the single most important "
                "finding. Cite sources by title."
            ),
        },
        # Kimi's JSON key points, passed through unchanged.
        {"role": "user", "content": extract.choices[0].message.content},
    ],
    max_tokens=1200,
)
print(summary.choices[0].message.content)
```
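The system prompt asks Qwen to cite sources by title, but nothing checks that it did. A cheap post-check, sketched here as a hypothetical `missing_citations` helper, lists the titles that never appear in the summary. It uses case-insensitive substring matching, so a paraphrased or abbreviated title is reported as missing.

```python
from typing import List


def missing_citations(summary: str, titles: List[str]) -> List[str]:
    """Return the titles that do not appear (case-insensitively) in the summary."""
    lowered = summary.lower()
    return [t for t in titles if t.lower() not in lowered]
```

Feed it the titles from Kimi's JSON and the summary text. A one-page summary will not cite every document, so treat the result as a list to review rather than a hard failure.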

Cost

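50 PDFs × 16K tokens = 800K input tokens. Kimi K2 input (800K at $0.60/1M): $0.48. Kimi K2 output (up to 4K): about $0.01. Qwen3 Max input (about 4K): about $0.01. Qwen3 Max output (up to 1.2K): about $0.02. Total ≈ $0.52 per report, so 100 reports a month cost about $52. The output figures assume each call uses its full max_tokens budget, so they are upper bounds.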
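The arithmetic above can be reproduced in a few lines. Only the Kimi input line is computed from a stated rate ($0.60 per 1M tokens); the other three line items are the rounded dollar figures quoted in the text, since their per-token rates are not given here. `token_cost` is a hypothetical helper name.

```python
def token_cost(tokens: int, usd_per_million: float) -> float:
    """Dollar cost of `tokens` at a price quoted per 1M tokens."""
    return tokens * usd_per_million / 1_000_000


# Kimi K2 input: 50 PDFs x 16K tokens at $0.60 per 1M input tokens.
kimi_input = token_cost(50 * 16_000, 0.60)

# Kimi output, Qwen input, Qwen output: the rounded figures from the text.
per_report = kimi_input + 0.01 + 0.01 + 0.02
monthly = per_report * 100

print(f"Kimi input: ${kimi_input:.2f}")   # Kimi input: $0.48
print(f"Per report: ${per_report:.2f}")   # Per report: $0.52
print(f"100 reports: ${monthly:.2f}")     # 100 reports: $52.00
```

Swapping in your provider's actual per-token rates for the three fixed figures turns this into a quick budget check before a large run.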