Google Image Format

📝 Introduction

Gemini models are multimodal and can generate images from text prompts. This endpoint gives access to Google's Gemini image models using Gemini's native generateContent request format.

Model	Description
Gemini 3 Pro Image Preview	Google's most capable multimodal model, supports complex image generation tasks
Gemini 2.5 Flash Image	Lightweight model optimized for speed and efficiency, suitable for rapid generation

💡 Request Examples

Create Image ✅

# Basic image generation
curl -X POST "https://www.vastnum.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "Authorization: Bearer $WS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "text": "draw a cat"
          }
        ]
      }
    ],
    "generationConfig": {
      "responseModalities": [
        "TEXT",
        "IMAGE"
      ],
      "imageConfig": {
        "aspectRatio": "16:9",
        "imageSize": "4K"
      }
    }
  }'

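Response example (the data value here is a tiny placeholder image; a real response carries a much longer base64 string):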

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "Here is a drawing of a cat."
          },
          {
            "inlineData": {
              "mimeType": "image/png",
              "data": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="
            }
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 10,
    "candidatesTokenCount": 200,
    "totalTokenCount": 210
  }
}
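
The image arrives as base64 text in inlineData.data, so it must be decoded before it can be opened. The following is a minimal sketch, assuming the response was saved to a file (for example by adding -o response.json to the curl command above) and that jq is installed; save_images and the output file names are illustrative, not part of the API.

```shell
# Hypothetical helper: decode each inlineData part of a saved generateContent
# response into its own file. Requires jq and a base64 that accepts -d.
save_images() {
  local response_file="$1" prefix="${2:-image}"
  # jq prints one line per image part: "<index> <mimeType> <base64 data>".
  jq -r '[.candidates[]?.content.parts[]? | select(.inlineData)]
         | to_entries[]
         | "\(.key + 1) \(.value.inlineData.mimeType) \(.value.inlineData.data)"' \
     "$response_file" |
  while read -r n mime b64; do
    out="${prefix}-${n}.${mime#image/}"     # image/png -> <prefix>-1.png
    printf '%s' "$b64" | base64 -d > "$out"
    echo "$out"                              # report each file written
  done
}
```

For the example response above, save_images response.json cat writes and prints cat-1.png. The file extension is taken from mimeType rather than hard-coded, so a JPEG part would be saved as .jpeg.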

📮 Request

Endpoint

Generate Content

POST /v1beta/models/{model}:generateContent

Generates content (text and images) from a text prompt.

Authentication

Include the following header for API key authentication:

Authorization: Bearer $WS_API_KEY

Where $WS_API_KEY is your API key.

Path Parameters

`model`

Type: string
Required: yes
Description: Model name, e.g. gemini-3-pro-image-preview. Use an image-capable model; text-only models such as gemini-1.5-pro cannot return images.
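
Because the model name is part of the URL path, a small wrapper can build the URL and attach the Authorization header in one place. A minimal sketch, assuming the base URL from the request example and an exported WS_API_KEY; gemini_endpoint and generate_content are hypothetical helper names, and --fail-with-body needs curl 7.76 or later.

```shell
# Base URL taken from the request example; adjust if your deployment differs.
GEMINI_BASE_URL="https://www.vastnum.com/v1beta"

# Hypothetical helper: print the generateContent URL for a model name.
gemini_endpoint() {
  printf '%s/models/%s:generateContent\n' "$GEMINI_BASE_URL" "$1"
}

# Hypothetical helper: POST a JSON request body file to the given model.
generate_content() {
  local model="$1" body_file="$2"
  if [ -z "${WS_API_KEY:-}" ]; then
    echo "WS_API_KEY is not set" >&2
    return 1
  fi
  # --fail-with-body returns non-zero on HTTP errors but still prints the error JSON.
  curl -sS --fail-with-body -X POST "$(gemini_endpoint "$model")" \
    -H "Authorization: Bearer $WS_API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @"$body_file"
}
```

Usage: generate_content gemini-3-pro-image-preview request.json > response.json. Checking the key before calling curl gives a clear local error instead of an authentication failure from the server.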

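Request Body

`contents`

Type: array of objects
Required: yes
Description: List of conversation turns. Each turn has:
role: who authored the turn; use user for the prompt (earlier model replies in a multi-turn conversation use model).
parts: list of content parts; for a text prompt, each part is an object with a text field, e.g. {"text": "draw a cat"}.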
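
Writing the JSON by hand breaks as soon as a prompt contains quotes or newlines. The sketch below builds the same body as the curl example with jq, which escapes the prompt safely; build_request is a hypothetical helper, and its default aspectRatio and imageSize simply copy the example values.

```shell
# Hypothetical helper: print a generateContent request body as JSON.
# jq --arg passes each value as a properly escaped JSON string.
build_request() {
  local prompt="$1" aspect="${2:-16:9}" size="${3:-4K}"
  jq -n --arg prompt "$prompt" --arg aspect "$aspect" --arg size "$size" '{
    contents: [{role: "user", parts: [{text: $prompt}]}],
    generationConfig: {
      responseModalities: ["TEXT", "IMAGE"],
      imageConfig: {aspectRatio: $aspect, imageSize: $size}
    }
  }'
}
```

Usage: build_request 'a watercolor cat, "soft" morning light' 1:1 2K > request.json, then send it with the curl command above, replacing the -d '...' argument with --data-binary @request.json.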

`generationConfig`

Type: object
Required: no
Description: Generation settings, including responseModalities and imageConfig.

`responseModalities`

Type: array of strings
Required: no
Description: Expected response modalities, set inside generationConfig. Include IMAGE to enable image generation.
Example: ["TEXT", "IMAGE"]

`imageConfig`

Type: object
Required: no
Description: Image output settings, set inside generationConfig.

`aspectRatio`

Type: string
Required: no
Description: Aspect ratio (width:height) of the generated image, set inside imageConfig.
Options: 16:9, 4:3, 1:1, etc.

`imageSize`

Type: string
Required: no
Description: Output resolution of the generated image, set inside imageConfig.
Options: 1K, 2K, 4K (use an uppercase K).
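
Catching an unsupported value locally avoids a wasted request. A minimal sketch; the accepted values below are an assumption based on Google's public Gemini API documentation (ten aspect ratios, and 1K/2K/4K for imageSize), and this endpoint may accept a different set. valid_image_config is a hypothetical helper.

```shell
# Hypothetical helper: check aspectRatio ($1) and imageSize ($2) before sending.
# ASSUMPTION: allowed values follow Google's Gemini API docs; this gateway may differ.
valid_image_config() {
  case "$1" in
    1:1|2:3|3:2|3:4|4:3|4:5|5:4|9:16|16:9|21:9) ;;
    *) echo "unsupported aspectRatio: $1" >&2; return 1 ;;
  esac
  case "$2" in
    1K|2K|4K) ;;   # uppercase K only; "4k" is treated as invalid here
    *) echo "unsupported imageSize: $2" >&2; return 1 ;;
  esac
}
```

Usage: valid_image_config 16:9 4K && echo ok. Keeping the allowed values in one case statement makes them easy to update if the supported set changes.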

📥 Response

Success Response

Returns a JSON object containing generated candidates.

`candidates`

Type: array of objects
Description: List of generated candidates.

`content`

Type: object
Description: Generated content of a candidate.
parts: list of content parts. Text is returned in a text field; each image is returned as an inlineData object whose mimeType gives the format (e.g. image/png) and whose data field holds the base64-encoded image bytes.

`finishReason`

Type: string
Description: Why generation ended. STOP means the model finished normally; other values, such as SAFETY, mean the output was blocked or cut short.
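
Before decoding, it helps to confirm that generation ended normally and that an image part is actually present, since a blocked or truncated candidate may contain only text or nothing at all. A sketch assuming jq; check_finish is a hypothetical helper that inspects only the first candidate.

```shell
# Hypothetical helper: report finishReason and the number of image parts in the
# first candidate; succeed only for STOP with at least one image.
check_finish() {
  local reason images
  reason=$(jq -r '.candidates[0].finishReason // "MISSING"' "$1")
  images=$(jq '[.candidates[0].content.parts[]? | select(.inlineData)] | length' "$1")
  echo "finishReason=$reason images=$images"
  [ "$reason" = "STOP" ] && [ "$images" -gt 0 ]
}
```

Usage: check_finish response.json || echo "no usable image". The non-zero exit status makes it easy to stop a script before trying to decode a missing image.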

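`safetyRatings`

Type: array of objects
Description: Safety ratings for the candidate, one per harm category, each with a category and a probability (e.g. HARM_CATEGORY_HATE_SPEECH, NEGLIGIBLE).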

`usageMetadata`

Type: object
Description: Token usage for the request.
promptTokenCount: number of tokens in the prompt.
candidatesTokenCount: number of tokens across the generated candidates.
totalTokenCount: total number of tokens for the request.
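
To log usage per request, the counts can be read straight out of the response. A sketch assuming jq; usage_summary is a hypothetical helper, and missing fields print as 0. Some models may report additional counts (such as thinking tokens) that are included in totalTokenCount, so the total is not guaranteed to equal the sum of the other two.

```shell
# Hypothetical helper: print the usageMetadata token counts on one line.
usage_summary() {
  jq -r '.usageMetadata
         | "prompt=\(.promptTokenCount // 0) candidates=\(.candidatesTokenCount // 0) total=\(.totalTokenCount // 0)"' \
     "$1"
}
```

For the example response above, usage_summary response.json prints prompt=10 candidates=200 total=210.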

🌟 Best Practices

Prompt Writing Tips

Clear intent: State the subject and what the image should contain.
Detail: Specify color, style, lighting, and composition for more precise results.
Specify modalities: Make sure IMAGE is listed in generationConfig.responseModalities.

Configuration Tips

Aspect ratio: Choose aspectRatio to match where the image will be shown (e.g., 16:9 for widescreen displays).
Resolution: Choose imageSize to match the need; higher resolutions take longer to generate, consume more resources, and return larger base64 payloads.