Monday, May 24, 2021

Enhance Cloud Search results for PDFs containing images with Optical Character Recognition support

What’s changing 

Cloud Search now supports Optical Character Recognition (OCR) based text extraction for PDFs that contain images, such as: 
  • Physical contract documents 
  • Engineering documents that contain annotations or labels 
  • Physical customer invoices, and more 

This makes PDFs whose images contain text, such as scanned documents, easily searchable by users and improves the discoverability of such PDFs. 


Who’s impacted 

Admins and end users 


Why it’s important 

Many critical business documents exist either in physical form or as scanned versions of those physical documents. With OCR support, admins can now easily index these documents for Cloud Search, making it easier for users to quickly find relevant scanned documents. 

In addition, this feature eliminates the need to extract the text offline from PDFs containing images before indexing these documents on Cloud Search. 


Getting started 

  • Admins: The feature is ON by default. Use this guide to learn more about how to use enhanced search for PDFs containing images. Important note: PDFs must be submitted using the Asynchronous Indexing mode and must contain only images. 
  • End Users: No user action is required 
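
For admins who index content through the Cloud Search Indexing API, the asynchronous requirement above might look roughly like the following request body. This is a minimal sketch, not the official procedure: the helper name, the datasource and item IDs, and the upload reference are hypothetical, and the field names (item, itemType, content, contentFormat, contentDataRef, mode) reflect our reading of the REST API and should be checked against the guide. ACLs and metadata, which a real item would normally need, are left out for brevity.

```python
import base64
import json


def build_pdf_index_request(datasource_id: str, item_id: str,
                            upload_ref: str, version: str) -> dict:
    """Build an illustrative items.index request body for an image-only PDF.

    All argument values are hypothetical placeholders. Nothing is sent to
    the API; the function only assembles the JSON payload.
    """
    # Assumption: the API expects the item version as base64-encoded bytes.
    encoded_version = base64.b64encode(version.encode("utf-8")).decode("ascii")
    return {
        "item": {
            "name": f"datasources/{datasource_id}/items/{item_id}",
            "itemType": "CONTENT_ITEM",
            "version": encoded_version,
            "content": {
                # Assumption: RAW is the format used for binary content such as PDFs.
                "contentFormat": "RAW",
                # Assumption: the PDF bytes were uploaded earlier and are
                # referenced here by the name of that upload.
                "contentDataRef": {"name": upload_ref},
            },
        },
        # The announcement requires the Asynchronous Indexing mode for OCR.
        "mode": "ASYNCHRONOUS",
    }


if __name__ == "__main__":
    body = build_pdf_index_request(
        "example-datasource", "scanned-invoice-001", "example-upload-ref", "1"
    )
    print(json.dumps(body, indent=2))
```

The payload would then be sent to the index method of the item in the target datasource using whichever client or connector SDK the deployment already relies on.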

Rollout pace 


Availability 

  • Available to Google Workspace Enterprise Plus and Google Cloud Search customers 
  • Not available to Google Workspace Essentials, Business Starter, Business Standard, Business Plus, Enterprise Essentials, Enterprise Standard, Education Fundamentals, Education Plus, Frontline, and Nonprofits, as well as G Suite Basic and Business customers

Resources