OCR Integration
The OCR integration lets agents extract text from images and PDF documents using Gemini Vision. No external API key is required, because it runs on Vois AI infrastructure.
Actions
ocr_read_image
Extracts text from an image file (JPEG, PNG, WebP, GIF, BMP).
Parameters:
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Publicly accessible URL of the image |
| image_base64 | string | Base64-encoded image (alternative to URL) |
| prompt | string | Optional instruction to guide extraction |
Returns:
{
  "text": "Extracted text content...",
  "character_count": 342
}
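As a rough illustration of the parameters above, the following Python sketch builds an argument object for ocr_read_image from either a URL or raw image bytes. The helper name build_ocr_image_args is hypothetical, and two points are assumptions rather than documented behaviour: that exactly one of image_url and image_base64 should be supplied, and that image_base64 is plain standard Base64 without a data-URI prefix. How the arguments reach the tool depends on the agent runtime and is not shown.

```python
import base64
from typing import Optional


def build_ocr_image_args(
    image_url: Optional[str] = None,
    image_bytes: Optional[bytes] = None,
    prompt: Optional[str] = None,
) -> dict:
    """Build an argument dict for ocr_read_image (hypothetical helper).

    Assumption: exactly one image source is supplied per call.
    """
    if (image_url is None) == (image_bytes is None):
        raise ValueError("Provide exactly one of image_url or image_bytes")

    args: dict = {}
    if image_url is not None:
        args["image_url"] = image_url
    else:
        # Assumption: plain standard Base64, no "data:image/...;base64," prefix.
        args["image_base64"] = base64.b64encode(image_bytes).decode("ascii")

    # The prompt is optional, so it is only included when non-empty.
    if prompt:
        args["prompt"] = prompt
    return args
```

For example, build_ocr_image_args(image_url="https://example.com/receipt.png", prompt="Return only the total") yields a dict with the image_url and prompt keys and no image_base64 key.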
ocr_read_document
Extracts text from a PDF document.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Publicly accessible URL of the PDF |
| image_base64 | string | Base64-encoded PDF (alternative to URL) |
| prompt | string | Optional instruction to guide extraction |
Returns:
{
  "text": "Extracted document content...",
  "character_count": 1820
}
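Both actions return the same two fields, so downstream code can treat their results uniformly. The Python sketch below parses such a result into a small typed object. It assumes the result is available as a JSON string, and the names OcrResult and parse_ocr_result are hypothetical. It deliberately does not check character_count against the length of text, since the source does not say how the count is computed.

```python
import json
from dataclasses import dataclass


@dataclass
class OcrResult:
    """Typed view of the documented return fields (hypothetical class)."""
    text: str
    character_count: int


def parse_ocr_result(raw: str) -> OcrResult:
    """Parse a JSON result from ocr_read_image or ocr_read_document."""
    data = json.loads(raw)
    text = data["text"]
    count = data["character_count"]
    # Reject results whose fields do not have the documented types.
    if not isinstance(text, str) or not isinstance(count, int):
        raise TypeError("Unexpected field types in OCR result")
    return OcrResult(text=text, character_count=count)
```

A caller might then branch on result.text.strip() being empty to detect a blank scan before asking the agent to summarise it.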
Installing OCR
- Go to Integrations → Browse
- Find OCR and click Install
- Select ocr_read_image, ocr_read_document, or both
- Click Install (no auth step is required)
Using OCR in an agent
Attach the OCR integration to an agent, then reference it in the system prompt or a skill:
When a caller uploads a document, use the ocr_read_document tool to extract
the text and summarise the key information for the caller.
Testing with file upload
In the Playground, use the paperclip button to attach an image or PDF. The file is uploaded and the URL is sent to the agent, which can then call the OCR tool to read it.