olmOCR

Open-source toolkit from the Allen Institute for AI that converts PDFs and scanned documents into clean, structured Markdown text.


About

olmOCR is a document processing toolkit built by the Allen Institute for AI. It uses a 7-billion-parameter vision language model to convert PDFs, PNGs, and JPEGs into structured Markdown, preserving reading order across multi-column layouts, figures, tables, equations, and handwritten content. Headers, footers, and other page artifacts are stripped automatically. A free web demo is available, and the software is licensed under Apache 2.0, permitting both commercial and personal use. Self-hosted processing costs under $200 per million pages; for teams without GPU infrastructure, third-party inference providers offer per-token pricing.
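The self-hosted cost figure reduces to simple arithmetic: $200 per million pages is at most $0.0002 per page. The sketch below turns that ceiling into a budget estimate for a corpus of a given size. It assumes the $200 ceiling holds for your setup (real cost depends on GPU type, utilization, and page complexity, none of which the listing specifies), and `max_self_hosted_cost` is a hypothetical helper for illustration, not part of olmOCR.

```python
# Upper bound taken from the listing: self-hosted olmOCR runs under $200 per million pages.
# This is an assumption about your hardware matching that figure, not a measured value.
COST_CEILING_USD_PER_MILLION_PAGES = 200.0


def max_self_hosted_cost(pages: int) -> float:
    """Return an upper-bound USD cost for processing `pages` pages.

    Hypothetical helper: it simply scales the listing's per-million-page
    ceiling linearly to the requested page count.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * COST_CEILING_USD_PER_MILLION_PAGES / 1_000_000


if __name__ == "__main__":
    # Print budget ceilings for a few example corpus sizes.
    for n in (10_000, 250_000, 5_000_000):
        print(f"{n:>9,} pages -> at most ${max_self_hosted_cost(n):,.2f}")
```

Because the estimate is linear, it is only a ceiling for planning; actual throughput-based cost should be measured on your own hardware.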


Details

Category: Code & Development
Pricing: Free
Rating: 4.0 / 5.0
Views: 448

Luxoret AI