Overview
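Vision-capable models can analyze images alongside text. To send an image, add an `image_url` part to the `content` array of a user message, next to a `text` part. The URL can be either a publicly reachable image URL or a base64-encoded data URL.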
Supported Models
| Model | Provider | Notes |
|---|---|---|
| gpt-4o | OpenAI | Best overall vision |
| gpt-4o-mini | OpenAI | Fast and affordable |
| claude-3-5-sonnet-20241022 | Anthropic | Excellent document understanding |
| gemini-2.0-flash | Google | Fast, 1M context |
| gemini-2.5-pro | Google | Best for complex visual reasoning |
| llava-1.6 | Open Source | Open-weight vision model |
Using Image URLs
```python
from openai import OpenAI

# Point the OpenAI SDK at the Samurai API endpoint.
client = OpenAI(
    api_key="sk-samurai-YOUR_KEY",
    base_url="https://api.samuraiapi.in/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Cat03.jpg/1200px-Cat03.jpg"}
                },
                {
                    "type": "text",
                    "text": "What is in this image? Describe it in detail."
                }
            ]
        }
    ]
)
print(response.choices[0].message.content)
```
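If you ask about images repeatedly, you can wrap the call above in a small function. The sketch below is hypothetical: `ask_about_image` is not part of the OpenAI SDK or the Samurai API. It assumes a `client` configured as shown above and returns only the reply text.

```python
def ask_about_image(client, image_url: str, question: str, model: str = "gpt-4o") -> str:
    """Send one image plus one question and return the model's text reply.

    Hypothetical helper: it only wraps the chat.completions.create call
    from the example above.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
    )
    # The reply text lives on the first choice, as in the example above.
    return response.choices[0].message.content
```

Passing another model from the table, e.g. `ask_about_image(client, url, question, model="gemini-2.0-flash")`, should work the same way. This assumes the API accepts those model IDs verbatim with this request shape.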
Using Base64 Images
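```python
import base64

# Reuses the `client` created in the previous example.
# Read the file and base64-encode it so it can be embedded in a data URL.
with open("screenshot.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        # The MIME type in the data URL should match the file (here, PNG).
                        "url": f"data:image/png;base64,{image_data}"
                    }
                },
                {"type": "text", "text": "Extract all text from this screenshot."}
            ]
        }
    ]
)
print(response.choices[0].message.content)
```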
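The base64 example hard-codes `image/png` in the data URL. For JPEG, GIF, or WebP files the MIME type should match the file instead. Below is a minimal sketch under two assumptions: the file extension is trustworthy, since the bytes are not inspected, and the helper name `to_data_url` is hypothetical.

```python
import base64
from pathlib import Path

# MIME types for the image formats listed as supported on this page.
_MIME_BY_SUFFIX = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def to_data_url(path: str) -> str:
    """Return 'data:<mime>;base64,<payload>' for a local image file."""
    suffix = Path(path).suffix.lower()
    mime = _MIME_BY_SUFFIX.get(suffix)
    if mime is None:
        raise ValueError(f"unsupported image extension {suffix!r} for {path!r}")
    # Read the raw bytes and base64-encode them, as in the example above.
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```

With this helper, the image part becomes `{"type": "image_url", "image_url": {"url": to_data_url("photo.jpg")}}`.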
Multiple Images
```python
# Images are passed in the order listed; the text prompt comes last.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/before.jpg"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/after.jpg"}},
                {"type": "text", "text": "What changed between these two images?"}
            ]
        }
    ]
)
```
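For a variable number of images, you can build the content list in code. This hypothetical helper keeps the images in the order given and appends the text prompt last, matching the example above.

```python
from __future__ import annotations


def multi_image_message(image_urls: list[str], prompt: str) -> dict:
    """Build one user message with several image parts followed by a text part.

    Hypothetical helper: images keep the order of `image_urls`.
    """
    content = [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]
    content.append({"type": "text", "text": prompt})
    return {"role": "user", "content": content}
```

Usage: `client.chat.completions.create(model="gpt-4o", messages=[multi_image_message(["https://example.com/before.jpg", "https://example.com/after.jpg"], "What changed between these two images?")])`.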
Supported image formats are PNG, JPEG, GIF, and WebP, with a maximum size of 20MB per image.
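To fail fast before uploading, you can check a local file against these limits. This sketch rests on three assumptions. It trusts the file extension rather than inspecting the bytes. It treats 20MB as 20,000,000 bytes. It measures the raw file, although base64 encoding makes the payload roughly a third larger, so the service may measure size differently. The names `check_image`, `ALLOWED_SUFFIXES`, and `MAX_IMAGE_BYTES` are hypothetical.

```python
from pathlib import Path

# Formats and size limit taken from the note above. Whether "20MB" means
# 20 * 1024 * 1024 or 20,000,000 bytes is not stated; the stricter
# decimal value is assumed here.
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
MAX_IMAGE_BYTES = 20_000_000


def check_image(path: str) -> None:
    """Raise ValueError if a local file has an unsupported extension or is too large."""
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"{p.name}: unsupported format {p.suffix!r}")
    # Raw on-disk size, not the size of the base64-encoded payload.
    size = p.stat().st_size
    if size > MAX_IMAGE_BYTES:
        raise ValueError(f"{p.name}: {size} bytes exceeds the {MAX_IMAGE_BYTES}-byte limit")
```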