Does the Gemini API support both text and image inputs in a single API call?

Question

Accepted Answer

## Overview

Yes, the Google Gemini API natively supports both text and image inputs within a single API call.

## Key Features

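Because the model handles text and images natively, rather than passing images through a separate captioning or OCR step first, it offers reduced latency and stronger spatial and contextual reasoning about what the image shows.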

## Technical Specifications

Supported image formats include PNG, JPEG, WebP, HEIC, and HEIF. The inline limit is 7 MB per image; images referenced from Google Cloud Storage (GCS) can be up to 30 MB.
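The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Google SDK: the helper name `choose_transport` is hypothetical, the MIME strings are the standard ones for the listed formats, and it assumes "MB" means 1024 × 1024 bytes.

```python
# Hypothetical client-side pre-check based on the limits stated above.
# Assumption: 1 MB is treated as 1024 * 1024 bytes.
SUPPORTED_MIME_TYPES = {
    "image/png",
    "image/jpeg",
    "image/webp",
    "image/heic",
    "image/heif",
}
INLINE_LIMIT_BYTES = 7 * 1024 * 1024   # inline limit per image
GCS_LIMIT_BYTES = 30 * 1024 * 1024     # limit for images referenced from GCS


def choose_transport(mime_type: str, size_bytes: int) -> str:
    """Return "inline" or "gcs" depending on the image size.

    Raises ValueError for unsupported formats or images over the GCS limit.
    """
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported image format: {mime_type}")
    if size_bytes <= INLINE_LIMIT_BYTES:
        return "inline"
    if size_bytes <= GCS_LIMIT_BYTES:
        return "gcs"
    raise ValueError(f"image too large: {size_bytes} bytes")
```

For example, `choose_transport("image/jpeg", 10 * 1024 * 1024)` selects the GCS route because a 10 MB image exceeds the inline limit.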

## How It Works

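The request body contains a `contents` field holding one or more content objects, each with a `parts` array. Each element of `parts` can carry a different modality, so a text prompt and an image sit side by side in the same request.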
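To make the shape of that body concrete, here is a minimal sketch that builds it with the Python standard library only. The helper name `build_request_body` is hypothetical. The field names `inline_data`, `mime_type`, and `data` follow the REST examples in Google's documentation as I understand them; treat them as assumptions and check the current API reference before relying on them.

```python
import base64
import json


def build_request_body(prompt: str, image_bytes: bytes, mime_type: str) -> dict:
    """Build a request body with one text part and one inline image part."""
    return {
        "contents": [
            {
                "parts": [
                    # Text modality: the question about the image.
                    {"text": prompt},
                    # Image modality: raw bytes, base64-encoded for JSON.
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file read from disk.
    body = build_request_body("Identify the components.", b"\x89PNG...", "image/png")
    print(json.dumps(body, indent=2))
```

The resulting dictionary would be serialized as JSON and sent as the body of a `generateContent` request; authentication and the HTTP call itself are left out of the sketch.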

## Use Cases

A user could upload an image of a complex architectural diagram and ask the model to identify specific components.

## Summary

In short, combined text and image input in a single call is a foundational feature of the Gemini API: one request carries both modalities in the same `parts` array.
