OCR Integration

The OCR integration lets agents extract text from images and PDF documents using Gemini Vision. No external API key is required; it runs on Vois AI infrastructure.

Actions

ocr_read_image

Extracts text from an image file (JPEG, PNG, WebP, GIF, BMP).

Parameters:

Parameter      Type     Description
image_url      string   Publicly accessible URL of the image
image_base64   string   Base64-encoded image (alternative to image_url)
prompt         string   Optional instruction to guide extraction
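The table lists image_url and image_base64 as alternatives, so a caller supplies one or the other. Below is a minimal Python sketch of assembling the arguments. It assumes the tool accepts a flat JSON object keyed by the parameter names above. The helper name build_ocr_args is hypothetical, and rejecting calls that pass both sources at once is an assumption about how the tool is meant to be used.

```python
import base64
import json
from typing import Optional


def build_ocr_args(
    image_url: Optional[str] = None,
    image_bytes: Optional[bytes] = None,
    prompt: Optional[str] = None,
) -> dict:
    """Build an argument object for ocr_read_image (hypothetical helper).

    Exactly one image source must be given: a public URL, or raw bytes
    that are base64-encoded into image_base64.
    """
    # True when both are None or both are set; either case is rejected.
    if (image_url is None) == (image_bytes is None):
        raise ValueError("provide exactly one of image_url or image_bytes")

    args: dict = {}
    if image_url is not None:
        args["image_url"] = image_url
    else:
        # Standard base64, decoded to str so the result is JSON-serialisable.
        args["image_base64"] = base64.b64encode(image_bytes).decode("ascii")
    if prompt:
        # Optional extraction hint, e.g. "Return only the invoice total".
        args["prompt"] = prompt
    return args


if __name__ == "__main__":
    print(json.dumps(build_ocr_args(image_url="https://example.com/receipt.png")))
```

Base64 inflates the payload by roughly a third, so the URL form is usually the lighter option when the file is already publicly hosted.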

Returns:

{
  "text": "Extracted text content...",
  "character_count": 342
}
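Both actions return the same two-field object. Here is a sketch of reading it in Python. It assumes the tool result arrives as a JSON string containing exactly the fields shown above. OcrResult and parse_ocr_result are illustrative names, not part of the integration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrResult:
    """Typed view of the JSON returned by ocr_read_image / ocr_read_document."""
    text: str
    character_count: int


def parse_ocr_result(raw: str) -> OcrResult:
    """Parse a tool response string into an OcrResult (hypothetical helper)."""
    data = json.loads(raw)
    text = data["text"]
    count = data["character_count"]
    # Fail loudly if the payload does not match the documented shape.
    if not isinstance(text, str) or not isinstance(count, int):
        raise TypeError("unexpected OCR result shape")
    return OcrResult(text=text, character_count=count)


if __name__ == "__main__":
    result = parse_ocr_result('{"text": "Extracted text content...", "character_count": 342}')
    print(result.character_count)  # 342
```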

ocr_read_document

Extracts text from a PDF document.

Parameters:

Parameter      Type     Description
image_url      string   Publicly accessible URL of the PDF
image_base64   string   Base64-encoded PDF (alternative to image_url)
prompt         string   Optional instruction to guide extraction

Returns:

{
  "text": "Extracted document content...",
  "character_count": 1820
}
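Images and PDFs go to different actions, so an agent backend that accepts arbitrary uploads has to choose one. Below is a minimal Python sketch. It assumes routing by the file's leading signature ("magic") bytes rather than by its extension. choose_ocr_action is a hypothetical helper, and its signature list covers only the formats named on this page.

```python
# Leading signature bytes for the image formats ocr_read_image accepts.
_IMAGE_SIGNATURES = (
    b"\xff\xd8\xff",        # JPEG
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"GIF87a",              # GIF (1987 spec)
    b"GIF89a",              # GIF (1989 spec)
    b"BM",                  # BMP (short signature, so this check is only a heuristic)
)


def choose_ocr_action(data: bytes) -> str:
    """Return the OCR action name suited to a file's contents (hypothetical helper)."""
    if data.startswith(b"%PDF-"):
        return "ocr_read_document"
    # WebP is a RIFF container: "RIFF" + 4-byte little-endian size + "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "ocr_read_image"
    # bytes.startswith accepts a tuple and matches any of its prefixes.
    if data.startswith(_IMAGE_SIGNATURES):
        return "ocr_read_image"
    raise ValueError("unsupported file type for OCR")


if __name__ == "__main__":
    print(choose_ocr_action(b"%PDF-1.7\n"))  # ocr_read_document
```

Checking content rather than the filename avoids misrouting files whose extension was changed or stripped during upload.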

Installing OCR

  1. Go to Integrations → Browse
  2. Find OCR and click Install
  3. Select ocr_read_image, ocr_read_document, or both
  4. Click Install. No auth step is required.

Using OCR in an agent

Attach the OCR integration to an agent, then reference it in the system prompt or a skill:

When a caller uploads a document, use the ocr_read_document tool to extract 
the text and summarise the key information for the caller.

Testing with file upload

In the Playground, use the paperclip button to attach an image or PDF. The file is uploaded and its URL is sent to the agent, which can then call the matching OCR tool to read it.