Google Image Format

Official Documentation

Gemini API

📝 Introduction

Gemini models are multimodal and can generate high-quality images from text prompts. This endpoint gives developers access to Google's latest Gemini image models through the generateContent API.

Model                        Description
Gemini 3 Pro Image Preview   Google's most capable multimodal model; supports complex image generation tasks
Gemini 2.5 Flash Image       Lightweight model optimized for speed and efficiency; suitable for rapid generation

💡 Request Examples

Create Image ✅

# Basic image generation
curl -X POST "https://www.vastnum.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "Authorization: Bearer $WS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "text": "draw a cat"
          }
        ]
      }
    ],
    "generationConfig": {
      "responseModalities": [
        "TEXT",
        "IMAGE"
      ],
      "imageConfig": {
        "aspectRatio": "16:9",
        "imageSize": "4K"
      }
    }
  }'

Response example:

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "Here is a drawing of a cat."
          },
          {
            "inlineData": {
              "mimeType": "image/png",
              "data": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="
            }
          }
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 10,
    "candidatesTokenCount": 200,
    "totalTokenCount": 210
  }
}

📮 Request

Endpoint

Generate Content

POST /v1beta/models/{model}:generateContent

Generates content (text and images) from a text prompt.

Authentication

Include the following header for API key authentication:

Authorization: Bearer $WS_API_KEY

Where $WS_API_KEY is your API key.
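
As a rough Python counterpart to the curl example, the sketch below builds (but does not send) an authenticated request using only the standard library. The host and model name are taken from the curl example above; the helper name build_request is hypothetical and not part of any SDK.

```python
import json
import urllib.request

# Host copied from the curl example in this document.
BASE_URL = "https://www.vastnum.com"


def build_request(model: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a POST request for /v1beta/models/{model}:generateContent.

    Hypothetical helper: it only assembles the request object and
    performs no network I/O.
    """
    url = f"{BASE_URL}/v1beta/models/{model}:generateContent"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # API key authentication
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen and parse the body with json.load; the key would normally be read from the WS_API_KEY environment variable rather than hard-coded.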

Path Parameters

model

  • Type: string
  • Required: yes
  • Description: Model name. For image generation, use an image-capable model such as gemini-3-pro-image-preview.

Request Body

contents

  • Type: array of objects
  • Required: yes
  • Description: List of conversation contents.
  • role: sender of the message, typically user.
  • parts: list of content parts; for a text prompt, each part contains a text field.

generationConfig

  • Type: object
  • Required: no
  • Description: Generation configuration.

generationConfig.responseModalities

  • Type: array of strings
  • Required: no
  • Description: Expected response modalities. Include IMAGE to enable image generation.
  • Example: ["TEXT", "IMAGE"]

generationConfig.imageConfig

  • Type: object
  • Required: no
  • Description: Image generation configuration.

generationConfig.imageConfig.aspectRatio

  • Type: string
  • Required: no
  • Description: Image aspect ratio.
  • Options: 16:9, 4:3, 1:1, etc.

generationConfig.imageConfig.imageSize

  • Type: string
  • Required: no
  • Description: Image size.
  • Options: 4K, 2K, etc.
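
The request body fields above can be assembled programmatically. The following is a minimal sketch, assuming the body shape shown in the curl example; build_payload is a hypothetical helper name, and the optional imageConfig keys are only emitted when a value is supplied.

```python
from typing import Optional


def build_payload(
    prompt: str,
    aspect_ratio: Optional[str] = None,
    image_size: Optional[str] = None,
) -> dict:
    """Assemble a generateContent request body for image generation."""
    # IMAGE must be listed in responseModalities to enable image output.
    generation_config = {"responseModalities": ["TEXT", "IMAGE"]}

    # imageConfig and its fields are optional, so add only what was given.
    image_config = {}
    if aspect_ratio is not None:
        image_config["aspectRatio"] = aspect_ratio
    if image_size is not None:
        image_config["imageSize"] = image_size
    if image_config:
        generation_config["imageConfig"] = image_config

    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": generation_config,
    }
```

Calling build_payload("draw a cat", "16:9", "4K") reproduces the JSON body used in the request example.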

📥 Response

Success Response

Returns a JSON object containing generated candidates.

candidates

  • Type: array of objects
  • Description: List of generated candidates.

candidates[].content

  • Type: object
  • Description: Generated content.
  • parts: list of content parts. Generated text is returned in a text field; image data is typically returned base64-encoded in inlineData.data, with its MIME type in inlineData.mimeType.

candidates[].finishReason

  • Type: string
  • Description: Reason generation ended, e.g. STOP.

candidates[].safetyRatings

  • Type: array of objects
  • Description: Safety rating information, with a category and a probability per entry.
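
To make the response layout concrete, here is a hedged sketch that walks candidates[].content.parts and decodes any inlineData entries. It assumes the response shape shown in the response example; extract_images is a hypothetical helper name, and the fallback MIME type is an arbitrary choice for parts that omit mimeType.

```python
import base64
from typing import List, Tuple


def extract_images(response: dict) -> List[Tuple[str, bytes]]:
    """Return (mime_type, raw_bytes) for every inlineData part in a response."""
    images = []
    for candidate in response.get("candidates", []):
        parts = candidate.get("content", {}).get("parts", [])
        for part in parts:
            inline = part.get("inlineData")
            if not inline or "data" not in inline:
                continue  # text-only part, nothing to decode
            mime_type = inline.get("mimeType", "application/octet-stream")
            images.append((mime_type, base64.b64decode(inline["data"])))
    return images
```

Each returned byte string can be written to disk in binary mode, with a file extension chosen from the MIME type (for example .png for image/png).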

usageMetadata

  • Type: object
  • Description: Token usage metadata.
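  • promptTokenCount: number of tokens in the prompt.
  • candidatesTokenCount: number of tokens in the generated candidates.
  • totalTokenCount: total number of tokens for the request (in the response example, the sum of the two counts above).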
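
For tracking consumption across several calls, the usage fields can be summed. This is an illustrative sketch only: accumulate_usage is a hypothetical helper, and it treats a missing usageMetadata object or field as zero, which is an assumption rather than documented behavior.

```python
from typing import Dict, Iterable

USAGE_FIELDS = ("promptTokenCount", "candidatesTokenCount", "totalTokenCount")


def accumulate_usage(responses: Iterable[dict]) -> Dict[str, int]:
    """Sum the usageMetadata counters over a sequence of response objects."""
    totals = {field: 0 for field in USAGE_FIELDS}
    for response in responses:
        usage = response.get("usageMetadata", {})
        for field in USAGE_FIELDS:
            # Assumption: absent counters contribute nothing to the total.
            totals[field] += usage.get(field, 0)
    return totals
```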

🌟 Best Practices

Prompt Writing Tips

  1. Clear intent: Describe clearly what you want the image to contain.
  2. Detail: Include color, style, lighting, and other details for more precise results.
  3. Specify modalities: Make sure IMAGE is included in generationConfig.responseModalities.

Configuration Tips

  1. Aspect ratio: Pick aspectRatio based on use case (e.g., 16:9 for widescreen displays).
  2. Resolution: Pick imageSize based on need; higher resolution consumes more resources.
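
One way to pick aspectRatio for a known display area is to choose the listed ratio closest to the target dimensions. The sketch below is an illustration, not part of the API: closest_aspect_ratio is a hypothetical helper, and CANDIDATE_RATIOS contains only the three values this document names, so the actual supported set may be larger.

```python
import math
from typing import Sequence

# Only the ratios named in this document; the API may accept others.
CANDIDATE_RATIOS = ("16:9", "4:3", "1:1")


def closest_aspect_ratio(
    width: int, height: int, options: Sequence[str] = CANDIDATE_RATIOS
) -> str:
    """Return the option whose width/height ratio is nearest the target's."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = width / height

    def distance(option: str) -> float:
        w, h = option.split(":")
        # Compare on a log scale so 2x too wide and 2x too tall count equally.
        return abs(math.log((int(w) / int(h)) / target))

    return min(options, key=distance)
```

The result can be passed directly as the aspectRatio value in imageConfig.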