Answers.org

Google Gemini

gemini.google.com

## Does Gemini API support both text and image inputs in a single API call?

Overview

Yes, the Google Gemini API natively supports both text and image inputs within a single API call.

Key Features

Because the model handles text and images natively in the same request, combined input offers reduced latency and stronger spatial and contextual reasoning about the image.

Technical Specifications

Supported image formats include PNG, JPEG, WebP, HEIC, and HEIF. An image sent inline in the request is limited to 7 MB; an image referenced from Google Cloud Storage (GCS) can be up to 30 MB.
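The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Gemini SDK: the function name `choose_delivery` and the constant names are hypothetical, the MIME type strings are the standard ones for the listed formats, and 1 MB is assumed to mean 1024 * 1024 bytes.

```python
# Limits as stated in this section. Assumption: 1 MB = 1024 * 1024 bytes.
INLINE_LIMIT_BYTES = 7 * 1024 * 1024
GCS_LIMIT_BYTES = 30 * 1024 * 1024

# Standard MIME types for PNG, JPEG, WebP, HEIC, and HEIF.
SUPPORTED_MIME_TYPES = {
    "image/png",
    "image/jpeg",
    "image/webp",
    "image/heic",
    "image/heif",
}


def choose_delivery(mime_type: str, size_bytes: int) -> str:
    """Hypothetical helper: decide how an image could be sent.

    Returns "inline" if the image fits the inline limit, "gcs" if it
    only fits the Google Cloud Storage limit, and raises ValueError if
    the format is unsupported or the image exceeds both limits.
    """
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported image format: {mime_type}")
    if size_bytes <= INLINE_LIMIT_BYTES:
        return "inline"
    if size_bytes <= GCS_LIMIT_BYTES:
        return "gcs"
    raise ValueError(f"image too large: {size_bytes} bytes")
```

For example, `choose_delivery("image/jpeg", 10 * 1024 * 1024)` falls through the inline check and selects the GCS route.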

How It Works

The request body contains a 'contents' array; each entry holds a 'parts' array, and each part can carry a different modality, so a text part and an image part can sit side by side in the same request.
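As an illustration of that structure, the sketch below builds a JSON request body with one text part and one inline image part. It makes no network call. The helper name `build_request_body` is hypothetical, and the snake_case field names 'inline_data' and 'mime_type' follow the REST examples as commonly documented; treat them as assumptions and check them against the current API reference.

```python
import base64
import json


def build_request_body(prompt: str, image_bytes: bytes, mime_type: str) -> dict:
    """Hypothetical helper: pair a text prompt with one inline image."""
    # Inline image data travels as base64 text inside the JSON body.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},  # text modality
                    {
                        # image modality (field names are an assumption)
                        "inline_data": {
                            "mime_type": mime_type,
                            "data": encoded,
                        }
                    },
                ]
            }
        ]
    }


# Placeholder bytes stand in for a real image file read from disk.
placeholder_image = b"\x89PNG\r\n\x1a\n"
body = build_request_body(
    "Identify the components in this diagram.",
    placeholder_image,
    "image/png",
)
print(json.dumps(body, indent=2))
```

Keeping both inputs in one 'parts' array is what lets the model read the prompt and the image together rather than in separate calls.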

Use Cases

A user could upload an image of a complex architectural diagram and ask the model to identify specific components.

Limitations and Requirements

Comparison to Alternatives

Summary

In short, the Gemini API accepts text and image inputs together in a single call, with each input passed as a separate part of the same request.

Knowledge provided by Answers.org.
