Scanned PDF to Markdown
Extract text from scanned or image-only PDFs and get clean Markdown. OCR runs entirely in your browser — your file never leaves your device.
Image or scanned PDF · OCR in your browser · Max 50MB
The first run downloads a language model (about 2–15MB). After that, OCR works offline and your file is processed locally.
Built-in OCR
Reads text from scanned and image-only PDFs that regular converters can't handle, using Tesseract running in your browser.
100+ languages
Recognize English, Chinese, Japanese, Korean, Spanish, French, German, Russian and many more.
Private by design
OCR runs locally via WebAssembly. Only the language model is fetched — your document never leaves your device.
Frequently asked questions
What is a scanned PDF?
A scanned or image-only PDF has no selectable text layer — it's essentially pictures of pages. Regular converters return nothing, so OCR is needed to read the text.
Is the OCR free and private?
Yes. Recognition runs entirely in your browser using WebAssembly. Only the language model is downloaded from a CDN; your PDF is never uploaded.
Which languages are supported?
Before you start, pick from English, Simplified and Traditional Chinese, Japanese, Korean, Spanish, French, German, Russian and more.
Why is OCR slower than a normal conversion?
OCR has to analyze each page image to recognize its characters, which takes far more computation than reading an existing text layer. Larger documents take longer.
Related tools
Convert PDF to clean Markdown right in your browser. Built for LLM, RAG, Obsidian and Notion workflows.
PDF to Markdown for Obsidian
Convert PDFs into clean, offline Markdown notes you can drop straight into your Obsidian vault, with headings, lists and tables kept intact.
PDF to Markdown for ChatGPT & LLMs
Convert PDFs into compact, structured Markdown that ChatGPT, Claude and RAG pipelines can parse reliably, all processed locally in your browser.