olmocr
Code & Development
Rating ⭐ 4.0
Pricing Free

Open-source toolkit from the Allen Institute for AI that converts PDFs and scanned documents into clean, structured Markdown text.

About

olmOCR is a document processing toolkit built by the Allen Institute for AI. It uses a 7-billion-parameter vision language model to convert PDFs, PNGs, and JPEGs into structured Markdown, preserving reading order across multi-column layouts, figures, tables, equations, and handwritten content. Headers, footers, and page artifacts are stripped automatically. A free web demo is available, and the software is licensed under Apache 2.0 for commercial and personal use. Self-hosted processing costs under $200 per million pages; third-party inference providers offer per-token pricing for teams without GPU infrastructure.
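
To make the self-hosted figure concrete, here is a rough budgeting sketch. It treats the "under $200 per million pages" number quoted above as a ceiling and scales it linearly to a given page count. The function name and the linear-scaling assumption are illustrative only and are not part of olmOCR; actual cost depends on hardware and document mix.

```python
# Ceiling taken from the listing: self-hosted processing is under $200 per million pages.
MAX_USD_PER_MILLION_PAGES = 200.0


def self_hosted_cost_ceiling(pages: int) -> float:
    """Return an upper-bound USD cost for converting `pages` pages.

    Hypothetical helper: assumes cost scales linearly with page count.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * MAX_USD_PER_MILLION_PAGES / 1_000_000


if __name__ == "__main__":
    # Print the ceiling for a few example workload sizes.
    for pages in (1_000, 50_000, 1_000_000):
        print(f"{pages:>9,} pages: under ${self_hosted_cost_ceiling(pages):,.2f}")
```

Under these assumptions, a 50,000-page archive would cost less than $10 to process on self-hosted hardware.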
