Google Image Format¶
📝 Introduction¶
Gemini models are multimodal and can generate images from text prompts. This endpoint exposes Google's Gemini image models for image generation.
| Model | Description |
|---|---|
| Gemini 3 Pro Image Preview | Google's most capable multimodal model, supports complex image generation tasks |
| Gemini 2.5 Flash Image | Lightweight model optimized for speed and efficiency, suitable for rapid generation |
💡 Request Examples¶
Create Image ✅¶
# Basic image generation
curl -X POST "https://www.vastnum.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "Authorization: Bearer $WS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "text": "draw a cat"
          }
        ]
      }
    ],
    "generationConfig": {
      "responseModalities": [
        "TEXT",
        "IMAGE"
      ],
      "imageConfig": {
        "aspectRatio": "16:9",
        "imageSize": "4K"
      }
    }
  }'
Response example:
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "Here is a drawing of a cat."
          },
          {
            "inlineData": {
              "mimeType": "image/png",
              "data": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="
            }
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 10,
    "candidatesTokenCount": 200,
    "totalTokenCount": 210
  }
}
📮 Request¶
Endpoint¶
Generate Content¶
Generates content (text and images) from a text prompt.
POST /v1beta/models/{model}:generateContent
Authentication¶
Include the following header for API key authentication:
Authorization: Bearer $WS_API_KEY
Where $WS_API_KEY is your API key.
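A minimal Python sketch of building the request headers, assuming the Bearer scheme and the WS_API_KEY environment variable used in the curl example. The helper name build_headers is hypothetical and not part of any SDK.

```python
import os
from typing import Optional


def build_headers(api_key: Optional[str] = None) -> dict:
    """Return the headers sent by the curl example (hypothetical helper)."""
    # Fall back to the WS_API_KEY environment variable when no key is passed.
    key = api_key or os.environ.get("WS_API_KEY")
    if not key:
        raise ValueError("API key missing: pass api_key or set WS_API_KEY")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source files; passing api_key explicitly is convenient in tests.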
Path Parameters¶
model¶
- Type: string
- Required: yes
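- Description: Model name, e.g. gemini-1.5-pro or gemini-1.5-flash. For image generation, use an image-capable model such as gemini-3-pro-image-preview, as in the request example.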
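The model name is substituted into the request path. The sketch below shows one way to build the URL in Python, assuming the host and path layout from the curl example. BASE_URL and generate_content_url are illustrative names, not part of an official client.

```python
from urllib.parse import quote

# Host and version prefix copied from the curl example.
BASE_URL = "https://www.vastnum.com/v1beta"


def generate_content_url(model: str) -> str:
    """Build the generateContent URL for a model name (hypothetical helper)."""
    if not model:
        raise ValueError("model is required")
    # Percent-encode the model name so unexpected characters cannot alter the path.
    return f"{BASE_URL}/models/{quote(model, safe='')}:generateContent"
```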
Request Body¶
contents¶
- Type: array of objects
- Required: yes
- Description: List of conversation contents.
  - role: role of the message author, typically user.
  - parts: list of content parts, each containing a text field.
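As a rough illustration, the contents array for a single text prompt could be assembled in Python as follows. The function name user_content is an assumption made for this sketch, and the structure mirrors the request example.

```python
import json


def user_content(prompt: str) -> dict:
    """Wrap a text prompt as one user turn with a single text part."""
    return {"role": "user", "parts": [{"text": prompt}]}


# One-turn conversation, matching the request example.
contents = [user_content("draw a cat")]
body_fragment = json.dumps({"contents": contents})
```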
generationConfig¶
- Type: object
- Required: no
- Description: Generation configuration.
responseModalities¶
- Type: array of strings
- Required: no
- Description: Expected response modalities. Include IMAGE to enable image generation.
- Example: ["TEXT", "IMAGE"]
imageConfig¶
- Type: object
- Required: no
- Description: Image generation configuration.
aspectRatio¶
- Type: string
- Required: no
- Description: Image aspect ratio.
- Options: 16:9, 4:3, 1:1, etc.
imageSize¶
- Type: string
- Required: no
- Description: Image size.
- Options: 4K, 2K, etc.
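The generationConfig fields above nest as shown in the request example: imageConfig sits inside generationConfig, and aspectRatio and imageSize sit inside imageConfig. A small Python sketch of assembling that object follows. build_generation_config is a hypothetical helper, and it does not validate option values because the full list of accepted values is not given here.

```python
from typing import Optional


def build_generation_config(
    aspect_ratio: Optional[str] = None,
    image_size: Optional[str] = None,
) -> dict:
    """Build generationConfig with IMAGE enabled and optional image settings."""
    config: dict = {"responseModalities": ["TEXT", "IMAGE"]}
    image_config = {}
    if aspect_ratio is not None:
        image_config["aspectRatio"] = aspect_ratio
    if image_size is not None:
        image_config["imageSize"] = image_size
    # imageConfig is optional, so include it only when something was set.
    if image_config:
        config["imageConfig"] = image_config
    return config
```

Calling build_generation_config("16:9", "4K") reproduces the generationConfig from the request example.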
📥 Response¶
Success Response¶
Returns a JSON object containing generated candidates.
candidates¶
- Type: array of objects
- Description: List of generated candidates.
content¶
- Type: object
- Description: Generated content.
  - parts: list of content parts. Text is returned in a text field. Image data is typically returned base64-encoded in the data field of inlineData, alongside its mimeType.
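To turn the base64 payload back into image bytes, a client can walk the candidates and decode each inlineData part. The sketch below assumes the response shape shown in the response example. extract_images is an illustrative name rather than an SDK function.

```python
import base64


def extract_images(response: dict) -> list:
    """Return (mimeType, raw bytes) for every inlineData part in the response."""
    images = []
    for candidate in response.get("candidates", []):
        parts = candidate.get("content", {}).get("parts", [])
        for part in parts:
            inline = part.get("inlineData")
            if inline is None:
                continue  # text parts carry no image data
            images.append((inline["mimeType"], base64.b64decode(inline["data"])))
    return images
```

Each returned pair can then be written to disk, for example with pathlib.Path("cat.png").write_bytes(data), choosing the file extension from the MIME type.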
finishReason¶
- Type: string
- Description: Reason generation ended, e.g. STOP.
safetyRatings¶
- Type: array of objects
- Description: Safety rating information.
usageMetadata¶
- Type: object
- Description: Token usage metadata.
  - promptTokenCount: number of tokens in the prompt.
  - candidatesTokenCount: number of tokens in the generated candidates.
  - totalTokenCount: total number of tokens.
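For logging or cost tracking, the usage fields can be read defensively so that a missing field does not raise. This is a sketch under the assumption that the response matches the example above. summarize_usage is a hypothetical helper.

```python
def summarize_usage(response: dict) -> dict:
    """Extract token counts from usageMetadata, defaulting missing fields to 0."""
    usage = response.get("usageMetadata", {})
    prompt = usage.get("promptTokenCount", 0)
    candidates = usage.get("candidatesTokenCount", 0)
    # Prefer the reported total; fall back to the sum only if it is absent.
    total = usage.get("totalTokenCount", prompt + candidates)
    return {"prompt": prompt, "candidates": candidates, "total": total}
```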
🌟 Best Practices¶
Prompt Writing Tips¶
- Clear intent: Describe clearly what you want the image to contain.
- Detail: Include color, style, lighting, and other details for more precise results.
- Specify modalities: Make sure IMAGE is included in responseModalities under generationConfig.
Configuration Tips¶
- Aspect ratio: Pick aspectRatio based on use case (e.g., 16:9 for widescreen displays).
- Resolution: Pick imageSize based on need; higher resolution consumes more resources.