OCR Integration

The OCR integration lets agents extract text from images and PDF documents using Gemini Vision. No external API key is required, because it runs on Vois AI infrastructure.

Actions

`ocr_read_image`

Extracts text from an image file (JPEG, PNG, WebP, GIF, BMP).

Parameters:

Parameter	Type	Description
`image_url`	string	Publicly accessible URL of the image
`image_base64`	string	Base64-encoded image (alternative to URL)
`prompt`	string	Optional instruction to guide extraction

Returns:

{
  "text": "Extracted text content...",
  "character_count": 342
}
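
As a rough illustration of the parameters above, the following Python sketch builds an argument object for `ocr_read_image`. The helper name `build_ocr_args` and the `image_bytes` argument are hypothetical and not part of the integration. It also assumes that exactly one of `image_url` or `image_base64` should be supplied, and that `image_base64` expects plain standard Base64 rather than a data URI; neither is confirmed here.

```python
import base64
from typing import Optional


def build_ocr_args(
    image_url: Optional[str] = None,
    image_bytes: Optional[bytes] = None,
    prompt: Optional[str] = None,
) -> dict:
    """Build an argument dict using the parameter names from the table above."""
    # Assumption: exactly one source is supplied, since image_base64 is
    # described as an alternative to image_url.
    if (image_url is None) == (image_bytes is None):
        raise ValueError("provide exactly one of image_url or image_bytes")

    args: dict = {}
    if image_url is not None:
        args["image_url"] = image_url
    else:
        # Standard Base64, decoded to str so the dict is JSON-serializable.
        args["image_base64"] = base64.b64encode(image_bytes).decode("ascii")

    # The prompt is optional, so it is only included when provided.
    if prompt:
        args["prompt"] = prompt
    return args
```

For example, `build_ocr_args(image_url="https://example.com/receipt.png", prompt="Return only the total amount")` yields a dict with the `image_url` and `prompt` keys; the URL here is a placeholder.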

`ocr_read_document`

Extracts text from a PDF document.

Parameters:

Parameter	Type	Description
`image_url`	string	Publicly accessible URL of the PDF
`image_base64`	string	Base64-encoded PDF (alternative to URL)
`prompt`	string	Optional instruction to guide extraction

Returns:

{
  "text": "Extracted document content...",
  "character_count": 1820
}
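
If the result is handled in code, for instance in a custom skill or a test harness, the return shape above can be read into a small typed structure. This is a minimal sketch that assumes the result arrives as a JSON string with exactly the two fields shown; `OcrResult` and `parse_ocr_result` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class OcrResult:
    text: str
    character_count: int


def parse_ocr_result(raw: str) -> OcrResult:
    """Parse a JSON string shaped like the documented return value."""
    data = json.loads(raw)
    # A missing field raises KeyError, so a malformed result fails loudly.
    return OcrResult(
        text=str(data["text"]),
        character_count=int(data["character_count"]),
    )
```

The same structure fits both `ocr_read_image` and `ocr_read_document`, since the two actions return the same fields.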

Installing OCR

1. Go to Integrations → Browse.
2. Find OCR and click Install.
3. Select `ocr_read_image`, `ocr_read_document`, or both.
4. Click Install. No auth step is required.

Using OCR in an agent

Attach the OCR integration to an agent, then reference it in the system prompt or a skill:

When a caller uploads a document, use the ocr_read_document tool to extract 
the text and summarise the key information for the caller.

Testing with file upload

In the Playground, use the paperclip button to attach an image or PDF. The file is uploaded and the URL is sent to the agent, which can then call the OCR tool to read it.
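
The agent normally decides which OCR tool to call, but the choice follows directly from the file types listed for each action. The sketch below shows that mapping in Python, assuming the uploaded file's URL keeps its original file extension, which is not guaranteed. `pick_ocr_action` is a hypothetical helper, not part of the integration.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extensions for the image formats that ocr_read_image accepts
# (JPEG, PNG, WebP, GIF, BMP).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".bmp"}


def pick_ocr_action(file_url: str) -> Optional[str]:
    """Return the OCR action name for a file URL, or None if unsupported."""
    # Use only the URL path so query strings do not affect the extension.
    suffix = PurePosixPath(urlparse(file_url).path).suffix.lower()
    if suffix == ".pdf":
        return "ocr_read_document"
    if suffix in IMAGE_EXTENSIONS:
        return "ocr_read_image"
    return None
```

A `None` result signals a file type that neither action is documented to support, which a caller can use to skip OCR entirely.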
