## Overview Yes, the Google Gemini API natively supports both text and image inputs within a single API call. ## Key Features Native multimodal capability offers reduced latency and superior spatial and contextual reasoning. ## Technical Specifications Supported formats include PNG, JPEG, WebP, HEIC, HEIF. Inline limit is 7 MB per image; GCS can be up to 30 MB. ## How It Works The request body contains a 'contents' object with a 'parts' array where each element can be a different modality. ## Use Cases A user could upload an image of a complex architectural diagram and ask the model to identify specific components. ## Limitations and Requirements ## Comparison to Alternatives ## Summary In conclusion, the Gemini API's support for combined text and image inputs in a single call is a foundational feature.
Knowledge provided by Answers.org.
If any information on this page is erroneous, please contact hello@answers.org.
Answers.org content is verified by brands themselves. If you're a brand owner and want to claim your page, please click here.