In AI application development, choosing the right LLM API service is crucial. Whether you are building an intelligent dialogue system, developing an AI Agent, or participating in an AI Hackathon, this article will provide you with a comprehensive API usage guide, covering mainstream services such as OpenRouter, Anthropic API, Volcano Engine, and Siliconflow.

Why Do You Need Multiple API Services?

Different LLM models have their own advantages, especially when developing AI Agents, where you need to choose the right model based on specific scenarios:

  • Claude (Anthropic): Excels in complex reasoning, programming, and Agent tasks, particularly suitable for scenarios requiring deep thinking
  • Gemini (Google): Performs well in long text processing and multimodal understanding, suitable for handling multimedia content such as images and videos
  • GPT (OpenAI): Strong in image understanding and mathematical reasoning, excellent for everyday conversation experiences
  • Doubao (ByteDance): Fast access speed in China, good voice dialogue experience, especially suitable for real-time interaction scenarios
  • Open Source Models: Low cost, highly customizable, suitable for large-scale deployment

OpenRouter: One-Stop Access to All Models

OpenRouter is the service I most recommend. It provides a unified API interface to access various models without worrying about regional restrictions. For AI Agent developers, this means you can easily switch and combine different models.

Advantages

  1. No Regional Restrictions: Access various models directly from within China
  2. Unified Interface: Uses OpenAI format API, simplifying programming
  3. Rich Models: Supports mainstream models like Claude, Gemini, GPT, Grok, etc.
  4. Convenient for Agent Development: Flexibly call models with different capabilities within one system

Usage Example

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="<OPENROUTER_API_KEY>",
)

# Use Claude Sonnet for complex reasoning (suited to an Agent's deep thinking)
completion = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[
        {
            "role": "user",
            "content": "请解释什么是递归算法,并给出一个 Python 示例"
        }
    ]
)
print(completion.choices[0].message.content)

# Multimodal example: image understanding (for Agents that need vision)
completion = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    }
                }
            ]
        }
    ]
)
print(completion.choices[0].message.content)

Model Selection Strategy for AI Agent Development

When developing AI Agents, different tasks call for different models; a small routing sketch follows the list below:

  • Programming and Core Logic of Agent (Tool Calling): anthropic/claude-sonnet-4
  • Long Text Analysis, Report Generation, Multimodal: google/gemini-2.5-pro
  • Real-Time Response (Fast Thinking): google/gemini-2.5-flash
  • Everyday Conversation: openai/gpt-4o
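
For illustration, here is a minimal sketch of this strategy: a task-to-model routing table on top of the OpenRouter client created in the example above. The task names and the route_model helper are hypothetical, not part of any official API; adapt them to your Agent's needs.

# Hypothetical task-to-model routing on top of the OpenRouter client defined above.
TASK_MODELS = {
    "coding": "anthropic/claude-sonnet-4",    # core Agent logic and tool calling
    "long_text": "google/gemini-2.5-pro",     # long documents, reports, multimodal
    "realtime": "google/gemini-2.5-flash",    # fast thinking, low latency
    "chat": "openai/gpt-4o",                  # everyday conversation
}

def route_model(task: str) -> str:
    # Fall back to the general-purpose chat model for unknown task types.
    return TASK_MODELS.get(task, TASK_MODELS["chat"])

completion = client.chat.completions.create(
    model=route_model("coding"),
    messages=[{"role": "user", "content": "Write a Python function that reverses a linked list."}]
)
print(completion.choices[0].message.content)

In a real Agent, the task type would typically come from your planner or from the tool currently being invoked rather than being hard-coded.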

Anthropic Official API

Although OpenRouter is convenient, there are scenarios where you still need to use the official API, such as using Claude Code or Computer Use features.

Notes

⚠️ Important: Do not call the Anthropic API from mainland China or Hong Kong IP addresses; access it from an overseas IP instead.

Usage Scenarios

  • Claude Code
  • Computer Use (Allowing Agents to Operate Computers)

⚠️ Special Feature Tip: Tool Use and Thinking Mode in Claude for Agent development require special syntax. Please refer to the Anthropic Official Documentation for the latest usage methods.
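
For reference, here is a minimal sketch of a direct call with the official Anthropic Python SDK (pip install anthropic). The model ID and parameters shown are assumptions based on current SDK conventions; confirm model names and the extra fields required by Tool Use and thinking mode in the official documentation.

import os
import anthropic

# Minimal sketch of calling the official Anthropic API directly.
# Remember to run this from an overseas IP (see the note above).
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model ID; check the official docs
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Briefly explain what Computer Use lets an Agent do."}
    ],
)
print(message.content[0].text)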

Volcano Engine: Doubao Model

The Doubao model provided by Volcano Engine has low latency for access within China, making it particularly suitable for real-time Agent applications that require fast response times.

Model Selection for AI Agent Development

Volcano Engine provides three Doubao models with different capabilities, suitable for different AI Agent scenarios:

1. Fast and Slow Thinking Architecture

When developing an Agent with a “Fast and Slow Thinking” architecture:

Fast Thinking Model (Low Latency) - doubao-seed-1-6-flash-250615:

# client here is the Volcano Engine Ark client (see the multimodal example below for how to create it)
response = client.chat.completions.create(
    model="doubao-seed-1-6-flash-250615",
    messages=[
        {
            "role": "user",
            "content": "今天天气怎么样?"
        }
    ],
    stream=True  # Streaming output to reduce latency
)

Slow Thinking Model (Deep Reasoning) - doubao-seed-1-6-thinking-250615:

response = client.chat.completions.create(
    model="doubao-seed-1-6-thinking-250615",
    messages=[
        {
            "role": "user",
            "content": "请详细解释相对论的基本原理"
        }
    ],
    stream=True  # Stream the intermediate thinking process
)

2. Multimodal Agent

Suitable for Agents that need to process images - doubao-seed-1-6-250615:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://ark.cn-beijing.volces.com/api/v3",
    api_key=os.environ.get("ARK_API_KEY"),
)

response = client.chat.completions.create(
    model="doubao-seed-1-6-250615",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://ark-project.tos-cn-beijing.ivolces.com/images/view.jpeg"
                    },
                },
                {"type": "text", "text": "这是哪里?"},
            ],
        }
    ],
)

print(response.choices[0])

Siliconflow: The Best Choice for Open Source Models

Siliconflow provides a complete model ecosystem for AI Agent development, including LLM, TTS (Text-to-Speech), and ASR (Automatic Speech Recognition).

LLM Models

Recommended models for Agent development:

  • Kimi K2 Instruct: moonshotai/Kimi-K2-Instruct
  • DeepSeek R1 0528: deepseek-ai/DeepSeek-R1
  • Qwen3 235B: Qwen/Qwen3-235B-A22B-Instruct-2507

⚠️ Important Tip: It is not recommended to use DeepSeek for Tool Calling, as DeepSeek’s capabilities in this area are relatively weak. If you need Tool Calling functionality, it is recommended to choose Claude, Gemini series models, or OpenAI o3/GPT-4.1, Grok-4. If you can only use domestic models, it is recommended to use Kimi K2.
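
For reference, tool calling on these OpenAI-compatible endpoints follows the standard tools / tool_calls format. The sketch below pairs Kimi K2 with the Siliconflow endpoint; the get_weather tool is a made-up example, and you should confirm in the Siliconflow documentation that your chosen model actually accepts the tools parameter.

from openai import OpenAI

# Sketch of OpenAI-style tool calling with Kimi K2 on Siliconflow.
# The get_weather tool is illustrative only.
client = OpenAI(base_url="https://api.siliconflow.cn/v1", api_key="<SILICONFLOW_API_KEY>")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",
    messages=[{"role": "user", "content": "What is the weather in Beijing right now?"}],
    tools=tools,
)
# If the model decides to use the tool, the structured call appears here instead of plain text.
print(response.choices[0].message.tool_calls)

The basic chat-completions request below calls the same endpoint with Kimi K2 using the requests library.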

import requests

url = "https://api.siliconflow.cn/v1/chat/completions"

payload = {
    "model": "moonshotai/Kimi-K2-Instruct",
    "messages": [
        {
            "role": "user",
            "content": "What opportunities and challenges will the Chinese large model industry face in 2025?"
        }
    ],
    "stream": True  # Streaming output is recommended for Agent development
}
headers = {
    "Authorization": "Bearer <SILICONFLOW_API_KEY>",
    "Content-Type": "application/json"
}

response = requests.request("POST", url, json=payload, headers=headers, stream=True)
# Handle the streaming response (see the sketch below)
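
A minimal sketch of consuming that stream is shown below. It assumes the endpoint returns OpenAI-compatible server-sent events ("data: {...}" lines terminated by "data: [DONE]"); adjust the parsing if the actual response format differs.

import json

# Sketch: read the SSE stream line by line and print tokens as they arrive.
# Assumes OpenAI-compatible "data: ..." chunks; verify against the Siliconflow docs.
for line in response.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue
    data = line[len(b"data: "):].decode("utf-8")
    if data == "[DONE]":
        break
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"].get("content", "")
    print(delta, end="", flush=True)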

Voice Agent Development Kit

TTS Text-to-Speech

Use Fish Audio or CosyVoice for speech synthesis:

import requests
import os

url = "https://api.siliconflow.cn/v1/audio/speech"

payload = {
    "model": "fishaudio/fish-speech-1.5",
    "input": "Nice to meet you!",
    "voice": "fishaudio/fish-speech-1.5:alex",
    "response_format": "mp3",
    "speed": 1.0
}
headers = {
    "Authorization": "Bearer " + os.environ['SILICONFLOW_API_KEY'],
    "Content-Type": "application/json"
}

response = requests.request("POST", url, json=payload, headers=headers)

# Save the audio file
with open("output.mp3", "wb") as f:
    f.write(response.content)

ASR Speech Recognition

Use SenseVoice for speech recognition:

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.siliconflow.cn/v1",
    api_key=os.environ['SILICONFLOW_API_KEY'],
)

def transcribe_audio(speech_file_path):
    with open(speech_file_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="FunAudioLLM/SenseVoiceSmall",
            file=audio_file,
            language="zh",  # Specifying the language improves accuracy
            prompt="这是一段对话录音"  # Provide context
        )
    return transcription.text

# Usage example in a real-time voice Agent
def process_audio_chunk(audio_chunk):
    # 1. Use VAD to detect the end of speech (Silero VAD recommended)
    # 2. Call ASR to transcribe
    text = transcribe_audio(audio_chunk)
    # 3. Pass the text to the LLM for processing
    return text

Using AI to Assist in Developing Agents

When developing AI Agents, it is recommended to use AI-assisted programming tools like Cursor to practice “using Agents to develop Agents”:

  1. Documentation First: Let AI generate design documents first, iterate and optimize before coding
  2. Choose the Right Model: Use models with thinking capabilities (e.g., Claude 4 Sonnet)
  3. Test-Driven: Let AI write test cases for the code

⚠️ Important Reminder: Model names generated by an LLM are often outdated, and the usage it produces is frequently incorrect, because the LLM's training data lags far behind the rapid pace of model releases. Therefore, when writing LLM-related code, be sure to:

  • Have Cursor write code according to the latest official documentation
  • Explicitly specify the model type and version in the prompt
  • Do not attempt to let Cursor randomly generate model calling code
  • Actively provide the latest API documentation links for the AI assistant to reference

AI Code Editor Selection

Besides paid AI code editors such as Cursor, Windsurf, and Trae, you can also use an OpenRouter API key to get a similar experience in other editors:

  • Void AI Editor: Open source, supports direct use of OpenRouter API key, similar functionality to Cursor
  • VSCode Cline Plugin: Can configure OpenRouter API key to achieve AI-assisted programming in VSCode

The advantage of using OpenRouter is that you can switch between different models under the same API key to find the AI assistant that best suits your programming style.

AI Agent Debugging Tips

Debugging is one of the most important stages in AI Agent development. Here are some practical debugging suggestions:

Understanding How LLM Works

Think of the LLM as a person: someone with no background knowledge of your task, but smart and with broad general knowledge (similar to a fresh graduate of Tsinghua's Yao Class). You must therefore provide clear instructions and complete context.

Thoroughly Check Input Content

During the debugging phase, thoroughly review the full context sent to the LLM (see the logging sketch after this list) to ensure:

  1. Structured Content Format is Correct: JSON, XML, and other structured data are correctly formatted
  2. System Prompt is Correct: Instructions are clear and unambiguous
  3. Complete History: The execution history of the Agent includes:
    • What the user said
    • LLM’s internal thought process
    • Replies to the user
    • Tool Call and Tool Call Result
  4. Correct Order: All interaction records are arranged in chronological order, with no omissions or disorder
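
One practical way to do this check is to log the full message list (and any tools) right before every API call, as in the sketch below. It assumes you are using the OpenAI-compatible clients shown earlier in this article; the debug_chat wrapper name is made up for illustration.

import json

def debug_chat(client, model, messages, **kwargs):
    # Dump exactly what the LLM will see: system prompt, history, tool results, and their order.
    print(f"=== Request to {model} ===")
    print(json.dumps(messages, ensure_ascii=False, indent=2))
    if "tools" in kwargs:
        print(json.dumps(kwargs["tools"], ensure_ascii=False, indent=2))
    response = client.chat.completions.create(model=model, messages=messages, **kwargs)
    # Also record the raw reply (including tool calls) for later comparison.
    print("=== Response ===")
    print(response.choices[0].message)
    return response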

Debugging Best Practices

  • Step-by-Step Verification: Start with simple scenarios and gradually increase complexity
  • Detailed Logs: Record complete input and output for problem localization
  • Model Comparison: Test on different models to find the most suitable model combination

Summary

Choosing the right LLM API service is key to building an excellent AI Agent:

  1. Regional Restrictions: Prioritize using OpenRouter to avoid access issues
  2. Task Requirements: Choose the model that excels based on the specific functions of the Agent
  3. Latency Requirements: Choose low-latency models for real-time interaction and powerful models for deep thinking
  4. Cost Considerations: Balance performance and cost, and use open-source models wisely
  5. Architecture Design: Adopt a hybrid model architecture to leverage the advantages of different models

Quick Reference: Common Model List

  • Programming / Agent development: anthropic/claude-sonnet-4 (strong coding ability, stable tool calling)
  • Low-latency response: google/gemini-2.5-flash (extremely low latency, suitable for real-time interaction)
  • Long text processing: google/gemini-2.5-pro (large context window, strong comprehension)
  • Document writing: google/gemini-2.5-pro (deep thinking, concise and fluent language)
  • Balanced performance: openai/gpt-4o (well-rounded capabilities)
  • Cost optimization: google/gemini-2.5-flash (high cost-performance ratio)
  • Low-cost Agent development: moonshotai/Kimi-K2-Instruct (open-source model, low cost, good results)
  • Low-latency response in China: doubao-seed-1-6-flash-250615 (extremely low latency within China, suitable for real-time interaction)
  • Chinese creative writing: deepseek-ai/DeepSeek-R1 (strong Chinese expression ability)
  • Speech recognition: FunAudioLLM/SenseVoiceSmall (low latency, low cost)
  • Speech synthesis: fishaudio/fish-speech-1.5 (low latency, low cost)

By using these API services in a reasonable combination, you can build a powerful and responsive AI Agent. Remember, an excellent Agent does not rely on a single model but knows how to use the right tool in the right scenario.

Wishing you success in your AI Agent development journey!
