PDF to Text — OCR in Your Browser

Extract text from scanned or image-based PDFs, page by page. Each page is rendered to an image and run through OCR in your browser.

How to use

  1. Drop a PDF or click browse to select one.
  2. Pick a DPI. 200 is a good default; bump to 300 for dense or small-font pages.
  3. Optionally limit pages with a range like 1-3, 5, 8-10, or leave blank to OCR every page.
  4. Click Extract text. The first run downloads roughly 8 MB of OCR engine and language data from this site (cached by your browser for next time).
  5. Copy the result or download it as a .txt file.
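
The range syntax in step 3 can be sketched in a few lines of TypeScript. This is an illustrative parser, not the tool's actual code: the function name parsePageRange and its error wording are assumptions, and it simply expands a string such as 1-3, 5, 8-10 into page numbers, treating a blank string as every page.

```typescript
// Hypothetical helper (not the tool's real code): expand a range string
// such as "1-3, 5, 8-10" into a sorted list of unique 1-based page numbers.
function parsePageRange(input: string, pageCount: number): number[] {
  // selected[p] is true when page p should be OCR'd (index 0 is unused).
  const selected: boolean[] = [];
  for (let p = 0; p <= pageCount; p++) selected.push(false);

  const trimmed = input.trim();
  if (trimmed === "") {
    // A blank range means "every page".
    for (let p = 1; p <= pageCount; p++) selected[p] = true;
  } else {
    for (const part of trimmed.split(",")) {
      // Accept "N" or "N-M", with optional spaces around the numbers.
      const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
      if (!match) {
        throw new Error(`Invalid range: "${part.trim()}"`);
      }
      const start = Number(match[1]);
      const end = match[2] === undefined ? start : Number(match[2]);
      if (start < 1 || end < start || end > pageCount) {
        throw new Error(`Invalid range: "${part.trim()}"`);
      }
      for (let p = start; p <= end; p++) selected[p] = true;
    }
  }

  const pages: number[] = [];
  for (let p = 1; p <= pageCount; p++) {
    if (selected[p]) pages.push(p);
  }
  return pages;
}
```

With a 10-page PDF, parsePageRange("1-3, 5, 8-10", 10) yields pages 1, 2, 3, 5, 8, 9, and 10, while an input like "1 through 5" throws because it is neither a number nor a hyphenated pair.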

What does it do?

Each selected page is rendered to a canvas at your chosen DPI, then run through Tesseract, an open-source OCR engine whose development Google sponsored for many years, compiled to WebAssembly. The recognized text for each page is concatenated into a single output with a --- Page N --- separator between pages so you can trace any passage back to its source page.
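
The concatenation step might look roughly like the sketch below. The PageText shape and the assembleOutput name are assumptions made for illustration, and the exact blank-line spacing between pages is a guess based on the example on this page.

```typescript
// Hypothetical shape for one page of recognized text.
interface PageText {
  page: number; // 1-based page number in the source PDF
  text: string; // text the OCR engine returned for that page
}

// Join per-page text into one string, putting a "--- Page N ---"
// separator line in front of each page's text.
function assembleOutput(pages: PageText[]): string {
  return pages
    .map(({ page, text }) => `--- Page ${page} ---\n${text.trim()}`)
    .join("\n\n");
}
```

Because the separator carries the original page number rather than a running count, a run limited to pages 5 and 8 would still label them Page 5 and Page 8 in this sketch.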

Example

Input — a scanned 2-page memo, 200 DPI, all pages. Output textarea:

--- Page 1 ---
MEMO

To: All Staff
From: Operations
Date: April 12, 2026
Subject: Friday parking changes

Starting this Friday, the east lot
will be closed for resurfacing…

--- Page 2 ---
…overflow parking is available in
Lot C for the duration of the work.
Questions should be directed to
facilities@example.com.
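
If you post-process a downloaded .txt file, the separators make it easy to split the text back into pages. The sketch below assumes the separator sits on its own line exactly as shown in the example above; splitByPage and PageText are hypothetical names, not part of this tool.

```typescript
// Hypothetical shape for one page of extracted text.
interface PageText {
  page: number;
  text: string;
}

// Split OCR output back into pages using the "--- Page N ---" lines.
// Text that appears before the first separator is ignored.
function splitByPage(output: string): PageText[] {
  const pages: PageText[] = [];
  let current: { page: number; lines: string[] } | null = null;

  for (const line of output.split(/\r?\n/)) {
    const match = /^--- Page (\d+) ---$/.exec(line.trim());
    if (match) {
      // A new separator closes the page collected so far.
      if (current) {
        pages.push({ page: current.page, text: current.lines.join("\n").trim() });
      }
      current = { page: Number(match[1]), lines: [] };
    } else if (current) {
      current.lines.push(line);
    }
  }
  if (current) {
    pages.push({ page: current.page, text: current.lines.join("\n").trim() });
  }
  return pages;
}
```

One caveat with this approach: a scanned page whose own text contains a line that looks like a separator would be split incorrectly.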

Common errors and pitfalls

Most OCR disappointments come from the source document, not the engine. A bad scan cannot be recovered with higher DPI.

  • Garbled output on a low-quality scan. Source images below roughly 150 DPI produce mangled text no matter what you set here. Re-scan at 300 DPI, or use the original digital file if you have it.
  • Columns are interleaved in the output. OCR reads across the page in scan order and can merge lines from adjacent columns. Crop each column out of the page renders with the Image Cropper, then OCR each column separately.
  • Pages are rotated 90° or upside down. This tool does not auto-rotate pages. Fix the PDF orientation with the PDF Organizer first, then retry.
  • Tab freezes on a large PDF. 100+ pages at 300 DPI can exhaust memory. Render a page range first (e.g. 1-25) to confirm quality, then batch the rest. Drop to 200 DPI if your device is memory-constrained.
  • Invalid range. An entry such as "1 through 5" is rejected with an Invalid range error, because only hyphens and commas are supported. Use the 1-5 format instead.
  • Encrypted PDF. Password-protected PDFs fail to load. Unlock with your PDF viewer via File > Save As, then retry with the unprotected copy.
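
For the large-PDF case above, the batching advice can be sketched as a small helper that produces range strings to paste into the page-range field one at a time. The name batchRanges and the default batch size of 25 are assumptions taken from the 1-25 example, not settings this tool exposes.

```typescript
// Hypothetical helper: split pages 1..pageCount into consecutive
// range strings of at most batchSize pages each.
function batchRanges(pageCount: number, batchSize: number = 25): string[] {
  if (pageCount < 0 || batchSize < 1) {
    throw new RangeError("pageCount must be >= 0 and batchSize must be >= 1");
  }
  const ranges: string[] = [];
  for (let start = 1; start <= pageCount; start += batchSize) {
    const end = Math.min(start + batchSize - 1, pageCount);
    // A batch that holds a single page is written as "N", not "N-N".
    ranges.push(start === end ? `${start}` : `${start}-${end}`);
  }
  return ranges;
}
```

For a 120-page PDF this gives 1-25, 26-50, 51-75, 76-100, and 101-120. Run the first range, check the quality, then work through the rest.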

Is my data private?

Yes. We don't save the PDF you drop, the rendered page images, or the text the OCR produces. Nothing is stored, logged, or retained; everything is discarded the moment you close or refresh the tab. There's no record on our side of what you extracted. You can verify this in your browser's developer tools.

Frequently asked questions

Does this work on scanned PDFs and image-only PDFs?

Yes — that is the point. Each page is rendered to an image and run through OCR, so it works the same whether the PDF was born digital or scanned. For born-digital PDFs that already contain selectable text, a direct text-extraction tool is faster and more accurate; OCR is the right choice when the text is baked into page images.

Which DPI should I pick?

200 DPI is a good default for OCR accuracy on typical scans and screenshots. 150 DPI is faster but loses small text. 300 DPI helps with dense pages or small fonts, but it renders 2.25 times as many pixels as 200 DPI, so rendering time and memory use grow by roughly the same factor. Going above 300 rarely helps if the source scan is itself lower resolution.
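
To see why DPI has such a large effect, here is a rough back-of-the-envelope sketch. It relies on two general facts: PDF page sizes are measured in points at 72 per inch, and an RGBA bitmap uses 4 bytes per pixel. The function name estimateRender is hypothetical, and the byte count covers only the raw bitmap, so treat it as a lower bound on what the browser actually uses.

```typescript
const POINTS_PER_INCH = 72; // default PDF unit: 1 point = 1/72 inch
const BYTES_PER_PIXEL = 4;  // one byte each for R, G, B, and A

// Estimate the rendered size of one page and the bytes needed
// to hold it as an uncompressed RGBA bitmap.
function estimateRender(widthPt: number, heightPt: number, dpi: number) {
  const widthPx = Math.round((widthPt * dpi) / POINTS_PER_INCH);
  const heightPx = Math.round((heightPt * dpi) / POINTS_PER_INCH);
  return {
    widthPx,
    heightPx,
    rgbaBytes: widthPx * heightPx * BYTES_PER_PIXEL,
  };
}

// US Letter is 612 x 792 points (8.5 x 11 inches).
const at200 = estimateRender(612, 792, 200); // 1700 x 2200 px, 14,960,000 bytes
const at300 = estimateRender(612, 792, 300); // 2550 x 3300 px, 33,660,000 bytes
```

The ratio between the two byte counts is (300 / 200)² = 2.25, which is why moving from 200 to 300 DPI more than doubles the work per page.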

Why is the first run slow?

The first run downloads about 8 MB of OCR engine and English language data from this site, then caches them in your browser. Subsequent runs start in under a second. After that, speed is dominated by rendering and recognizing each page — typically 3–10 seconds per page depending on DPI and page complexity.

What about handwriting and non-English text?

Handwriting recognition is weak — Tesseract is trained on printed text and struggles with cursive or messy writing. This tool ships English only; multi-language support is on the roadmap but not yet available. For now, use a clean printed English scan for best results.

Can it handle password-protected PDFs?

No. Encrypted PDFs fail to load with an error. Open the file in your PDF viewer, enter the password, and re-save via File > Save As to produce an unprotected copy. Then drop that copy here.

Do you save my PDFs or the extracted text?

No. We don't save the PDF you drop, the rendered page images, or the extracted text. Everything is discarded when you close or refresh the tab — no logs, no record on our side of what you OCR'd. You can verify with your browser's developer tools.
