olmOCR 2 is an open-source toolkit designed for high-throughput conversion of PDFs and other documents into plain text while preserving natural reading order. It supports tables, equations, handwriting, and more.
olmOCR 2 was trained on a highly curated set of academic papers, technical documentation, and other reference content; the latest version uses synthetic data and unit tests as verifiable rewards for reinforcement learning to further reduce hallucinations. For full details on the recipe, read our technical report. The current model was fine-tuned on English documents using a multilingual base VLM; other languages may work.
The olmOCR pipeline v0.4.0 was evaluated on olmOCR-bench using two model variants: olmOCR-2-7B-1025 and olmOCR-2-7B-1025-FP8. Their scores across various document categories are as follows:

The two models show very close performance, with minor differences in certain categories. Both models excel particularly in the Headers and Footers and Base document categories, with scores near or above 95. The Old Scans category shows the lowest scores, in the high 40s. The FP8 variant achieves a slight edge overall and in some specific categories such as Tables and Headers/Footers, but the differences are marginal.
- FP8 weights with BF16 compute deliver practical efficiency, with several GPU configurations supported. For example, consumer GPUs with 8 to 24 GB VRAM can run these models with appropriate batch sizes.
- Large-scale production setups use A100 or H100 GPUs with 40–80 GB VRAM, enabling higher batch throughput.
- Recommended usage includes enabling paged KV cache, setting the maximum number of sequences to the batch size, and using modern attention mechanisms such as flash attention.
- Pre/post-processing uses image sizing and rotation retries, with the automation handled by the olmOCR toolkit for best accuracy.
- PDFs must be rasterized to one image per page. The toolkit automatically applies document anchoring (extracting text blocks, positions, and images from PDF internals) to reduce hallucinations, which is crucial for born-digital PDFs. For scans without metadata, it falls back to pure visual processing.
- Prompting and Output: Use the simplified prompt from the paper (Appendix E.2) with {base_text} replaced by the anchored content. Outputs are structured JSON (e.g., a natural_text field containing linearized Markdown/LaTeX). Implement retries (up to 3 times) for repetitions or schema failures, increasing the temperature to 0.8 if necessary.
- Limitations: Best for English-dominant PDFs; may struggle with extreme rotations (auto-detected via JSON fields) or non-text-heavy pages (e.g., diagrams); output is null if a page has no readable text. Decontamination was applied against PII, but manual review is advised for sensitive data.
- Scaling and Cost: Optimized for large batches (500+ pages per work item); coordinate multiple GPUs via cloud buckets (e.g., S3). Inference throughput: ~906 tokens/sec on an L40S (with a 12% retry rate). Avoid schema-constrained decoding to prevent out-of-domain generations.
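The retry-and-escalate behavior described under Prompting and Output can be sketched in a few lines of Python. This is a minimal illustration, not the toolkit's actual implementation: call_model is a hypothetical stand-in for an inference call, and the starting temperature is an assumption; only the retry cap (3) and the 0.8 escalation come from the guidance above.

```python
import json

def parse_response(raw: str):
    """Accept a response only if it is JSON with a natural_text field."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) and "natural_text" in data else None

def ocr_page_with_retries(call_model, max_retries: int = 3):
    """Retry on repetitions/schema failures, raising the temperature to 0.8."""
    temperature = 0.1  # assumed starting value, not from the paper
    for _ in range(max_retries):
        parsed = parse_response(call_model(temperature=temperature))
        if parsed is not None:
            return parsed
        temperature = 0.8  # escalate on failure, per the guidance above
    return None  # give up after max_retries, mirroring a null page

# Stand-in model that fails once, then returns valid JSON:
responses = iter(["not json", '{"natural_text": "# Title\\n\\nBody text."}'])
result = ocr_page_with_retries(lambda temperature: next(responses))
print(result["natural_text"])
```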

- Evaluation and Fine-Tuning: Test your setup on olmOCR-Bench. Training code is released; use Hugging Face Transformers with batch size 4 and LR 1e-6, for ~16 H100-hours per run. olmOCR-Bench scores are from Table 4 of the paper, evaluated with the olmOCR toolkit (v0.1.75, anchored mode) for reliable comparisons.

Recommended Flags / Tips
- Inference Engines: Prefer SGLang for throughput (--port 30000, --model-path allenai/olmOCR-7B-0225-preview); fall back to vLLM (--trust-remote-code, --dtype bfloat16). Set max_tokens=8192; cap anchor text at 6000 chars.
- Toolkit Usage: Install via pip install olmocr; run olmocr convert input.pdf --output output.md for single files. For batches: olmocr batch
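Assembled from the flags above, the launch commands look roughly like this. This is a sketch under the assumption of recent SGLang/vLLM releases; entrypoint names and defaults vary between versions, so check the docs for your installed version.

```shell
# SGLang server (preferred for throughput); flags taken from the tips above
python -m sglang.launch_server \
  --model-path allenai/olmOCR-7B-0225-preview \
  --port 30000

# vLLM fallback with the flags listed above
vllm serve allenai/olmOCR-7B-0225-preview \
  --trust-remote-code \
  --dtype bfloat16
```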
How to Run olmOCR-2-7B-1025-FP8 on Spheron Network
Spheron Network gives you access to powerful GPUs without going through traditional cloud providers like AWS or Google. You rent GPUs directly from providers worldwide. Let us walk you through the setup.
Step-by-Step Setup Guide
Step 1: Access the Spheron Console and Add Credits
Head over to console.spheron.network and log in to your account. If you don't have an account yet, create one by signing up with your Email/Google/Discord/GitHub.

Once logged in, navigate to the Deposit section. You will see two payment options:

SPON Token: This is the native token of Spheron Network. When you deposit with SPON, you unlock the full power of the ecosystem. SPON credits can be used on both:
- Community GPUs: Lower-cost GPU resources powered by community Fizz Nodes (personal machines and home setups)
- Secure GPUs: Data center-grade GPU providers offering enterprise reliability
USD Credits: With USD deposits, you can deploy only on Secure GPUs. Community GPUs are not accessible with USD deposits.
For running olmOCR, we recommend starting with Secure GPUs to ensure consistent performance. Add sufficient credits to your account based on your anticipated usage.
Step 2: Navigate to the GPU Marketplace
After adding credits, click on Marketplace. Here you will see two main categories:
Secure GPUs: These run on data center-grade providers with enterprise SLAs, high uptime guarantees, and consistent performance. Ideal for production workloads and applications that require reliability.
Community GPUs: These run on community Fizz Nodes, essentially personal machines contributed by community members. They are significantly cheaper than Secure GPUs but may have variable availability and performance.

For this tutorial, we'll use Secure GPUs to ensure a smooth installation and optimal performance.
Step 3: Search for and Select Your GPU
You can search for GPUs by:
- Region: Find GPUs geographically close to your users
- Address: Search by specific provider addresses
- Name: Filter by GPU model (RTX 4090, A100, etc.)
For this demo, we'll select a Secure RTX 4090 (or an A6000 GPU), which offers excellent performance for running olmOCR. The 4090 provides the right balance of cost and capability for both testing and moderate production workloads.
Click Rent Now on your chosen GPU to proceed to configuration.
Step 4: Select a Custom Image Template
After clicking Rent Now, you'll see the Rent Confirmation dialog. This screen shows all the configuration options for your GPU deployment. Let's configure each section. Unlike pre-built application templates, running olmOCR requires a customized environment to leverage its development capabilities. Select the configuration as shown in the image below and click "Confirm" to deploy.

- GPU Type: The screen displays your selected GPU (RTX 4090 in the image), along with its specifications: Storage, CPU Cores, and RAM.
- GPU Count: Use the + and – buttons to adjust the number of GPUs. For this tutorial, use 1 GPU for cost efficiency.
- Select Template: Click the dropdown that shows "Ubuntu 24" and look through the template options. To run olmOCR, we require an Ubuntu-based template with SSH enabled. You will notice that the template shows an SSH-enabled badge, which is essential for accessing your instance via the terminal. Select Ubuntu 24 or Ubuntu 22 (both work perfectly).
- Duration: Specify how long you want to rent the GPU. The dropdown shows options like 1 hour (good for quick testing), 8 hours, 24 hours, or longer for production use. For this tutorial, select 1 hour initially. You can always extend the duration later if needed.
- Select SSH Key: Click the dropdown to choose your SSH key for secure authentication. If you haven't added an SSH key yet, you'll see a message prompting you to create one.
- Expose Ports: This section allows you to expose specific ports from your deployment. For basic command-line access, you can leave it empty. If you plan to run web services or Jupyter notebooks, you can add ports here.
- Provider Details: The screen shows provider information, indicating which decentralized provider will host your GPU instance.
Scroll down to the Choose Payment section and select your preferred payment option:
- USD: Pay with traditional currency (credit card or other USD payment methods)
- SPON: Pay with Spheron's native token for potential discounts and access to both Community and Secure GPUs
The dropdown shows "USD" in the example, but you can switch to SPON if you have tokens deposited.
Step 5: Watch the "Deployment in Progress" Status
Next, you'll see a live status window showing each step as it happens: Validating configuration, Checking balance, Creating order, Waiting for bids, Accepting a bid, Sending manifest, and finally, Lease Created Successfully. Deployment typically completes in under 60 seconds. Once you see "Lease Created Successfully," your Ubuntu server with GPU access is live and ready to use!

Step 6: Access Your Deployment
Once deployment completes, navigate to the Overview tab in your Spheron console. You will see your deployment listed with:
- Status: Running
- Provider details: GPU location and specifications
- Connection information: SSH access details
- Port mappings: Any exposed services

Step 7: Connect via SSH
Click the SSH tab, and you will see the steps for connecting your terminal to your deployment. It will look something like the image below; follow it:

ssh -i <path-to-private-key> -p <port> root@<deployment-url>
Open your terminal and paste this command. On your first connection, you'll see a security prompt asking you to verify the server's fingerprint. Type "yes" to proceed. You are now connected to your GPU-powered virtual machine on the Spheron decentralized network.

Step 8: Upgrade to Python 3.11 and Install Pip
Your VM currently comes with Python 3.10.12 preinstalled. To upgrade to Python 3.11, we'll use the Deadsnakes PPA, a popular repository that provides newer Python versions for Ubuntu.
1. Add the Deadsnakes PPA
Run the following commands to update your system packages and add the Deadsnakes repository:
apt update && apt install -y software-properties-common curl ca-certificates
add-apt-repository -y ppa:deadsnakes/ppa
apt update

- software-properties-common helps manage additional repositories.
- curl and ca-certificates ensure secure data transfer.
- Adding ppa:deadsnakes/ppa gives access to newer Python builds.
2. Install Python 3.11, Pip, and Wheel
Now, install Python 3.11 and the essential packaging tools (the PPA setup above is already done, so a plain install suffices):
apt install -y python3.11 python3.11-venv python3.11-dev

- python3.11-venv allows you to create isolated virtual environments.
- python3.11-dev includes headers for building Python extensions.
- ensurepip installs or upgrades Pip.
- Upgrading pip, setuptools, and wheel ensures compatibility with modern packages.
3. Ensure pip for Python 3.11 and upgrade packaging tools
python3.11 -m ensurepip --upgrade && python3.11 -m pip install --upgrade pip setuptools wheel

Verify your installation:
python3.11 --version
python3.11 -m pip --version
Step 9: Create and Activate a Python 3.11 Virtual Environment
To prevent package conflicts, it's best to use a dedicated virtual environment.
python3.11 -m venv ~/.venvs/py311
source ~/.venvs/py311/bin/activate

- This creates a new environment in ~/.venvs/py311.
- The source command activates it, isolating dependencies.
Step 10: Install System Dependencies for PDF Rendering
Next, install the system packages required by OCR tools and PDF rendering libraries.
apt-get update
apt-get install -y poppler-utils ttf-mscorefonts-installer msttcorefonts fonts-crosextra-caladea fonts-crosextra-carlito gsfonts lcdf-typetools

Explanation:
- poppler-utils provides PDF text extraction and conversion tools.
- ttf-mscorefonts-installer and the other font packages ensure accurate text rendering.
- gsfonts and lcdf-typetools improve compatibility with a wide range of PDFs.
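One caveat worth knowing (an addition to the original steps): on a non-interactive server, ttf-mscorefonts-installer pauses to ask for EULA acceptance. Pre-seeding the answer with debconf before installing avoids the hang:

```shell
# Pre-accept the Microsoft core fonts EULA so the install does not block on a prompt
echo "ttf-mscorefonts-installer msttcorefonts/accepted-mscorefonts-eula select true" \
  | debconf-set-selections
DEBIAN_FRONTEND=noninteractive apt-get install -y ttf-mscorefonts-installer
```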
Step 11: Install PyTorch 2.7.1 (CUDA 12.6)
Now, install PyTorch 2.7.1 with CUDA 12.6 support for GPU acceleration.
pip install "torch==2.7.1+cu126" --index-url https://download.pytorch.org/whl/cu126
This pulls the correct PyTorch wheel built for CUDA 12.6, ensuring full GPU support.
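Before moving on, a quick sanity check (run inside the activated virtual environment) confirms the wheel can actually see the GPU:

```shell
# Should print the torch version, the CUDA version it was built against, and True
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```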

Step 12: Install vLLM 0.11.0
Install vLLM, the high-performance inference engine used by olmOCR.
pip install --no-cache-dir "vllm==0.11.0"
The --no-cache-dir flag ensures a fresh install without using stale cached packages.

Step 13: Install olmOCR
Now, install olmOCR, AllenAI's toolkit for running the FP8-quantized OCR VLM.
pip install -U --no-deps "olmocr[gpu]"

- The -U flag upgrades to the latest compatible version.
- --no-deps avoids reinstalling dependencies you already have.
- The [gpu] extra installs GPU-optimized components.
Optionally, install the full olmOCR package (with all extras):
pip install "olmocr[all]"

Step 14: Install Build Tools
This installs GCC, make, and the other tools needed to build any packages that require native compilation.
apt-get update && apt-get install -y build-essential

Step 15: Run the olmOCR Pipeline
Download a test PDF (an arXiv paper) to run through the OCR pipeline. You can reuse the same command for other PDFs; just change the https link.
wget -O /root/arxiv-paper.pdf "https://www.arxiv.org/pdf/2510.03847"

Finally, test your setup by running the olmOCR pipeline on the sample PDF.
python -m olmocr.pipeline ./workspace --markdown --pdfs /root/arxiv-paper.pdf

- This command launches the olmOCR pipeline, which automatically starts an internal vLLM inference server.
- The --markdown flag outputs results in Markdown format.
- --pdfs specifies the PDF file(s) to process.
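Once the run finishes, you can pull the extracted text back out programmatically. The sketch below assumes Dolma-style pipeline output: JSONL files under workspace/results/ with a top-level text field. Verify the layout against your olmocr version before relying on it.

```python
import json
from pathlib import Path

def load_results(workspace: str) -> list[str]:
    """Collect the extracted text from every JSONL file the pipeline wrote.

    Assumes Dolma-style records with a top-level "text" field under
    <workspace>/results/ -- check the layout for your olmocr version.
    """
    texts = []
    results_dir = Path(workspace, "results")
    if not results_dir.is_dir():
        return texts  # nothing written yet (or wrong workspace path)
    for path in sorted(results_dir.glob("*.jsonl")):
        for line in path.read_text().splitlines():
            if line.strip():
                record = json.loads(line)
                if record.get("text"):
                    texts.append(record["text"])
    return texts

# Example: preview the first document's text, if any runs have completed.
for text in load_results("./workspace")[:1]:
    print(text[:500])
```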

You've now successfully installed and configured olmOCR-2-7B-1025-FP8, AllenAI's FP8-quantized OCR VLM, on your GPU-enabled VM. This setup combines Python 3.11, PyTorch 2.7.1, and vLLM 0.11.0 to deliver production-grade OCR performance with remarkable efficiency.
You'll find the output in the ./workspace directory.
FP8 quantization keeps the memory footprint light while maintaining top-tier accuracy, achieving an impressive ≈82.4 ± 1.1 olmOCR-Bench score. With vLLM's parallel inference capabilities, you can process documents at scale, converting scanned PDFs, tables, and handwritten notes into structured, Markdown-formatted text within seconds. This configuration is ideal for batch processing, data extraction pipelines, or even as the foundation for a scalable, AI-powered document-processing service. In short, you now have a high-performance, GPU-optimized OCR stack ready for both research and production workloads.