OCR Translator User Manual

Copyright © 2025 Tomasz Kamiński
Last Updated: 9 June 2025

Table of Contents

  1. Introduction
  2. Getting Started
  3. Main Interface
  4. Setting Up Translation Areas
  5. Settings Configuration
  6. Translation Methods
  7. Keyboard Shortcuts
  8. Troubleshooting
  9. Tips and Best Practices

Introduction

OCR Translator is a desktop application that automatically captures text from any area of your screen, performs optical character recognition (OCR), and translates the text in real time. With its floating overlay windows, you can position the translation anywhere on your screen, making it well suited to translating games, videos, PDFs, or any application with text that you can't easily copy and paste.

Getting Started

Requirements

Before using OCR Translator, ensure you have:

First Run

  1. Launch OCR Translator by running the main.py script or the executable if you're using a compiled version.
  2. On first startup, the application loads with default settings and both source and target areas are hidden.
  3. Before starting translation, you need to:
  4. When you click Start, the translation window will automatically be shown.
  5. When you click Stop, the translation window will automatically be hidden.
  6. You can manually toggle the visibility of the translation window using Alt+2 at any time.

Main Interface

The main interface is organised into four tabs:

Home Tab

  1. Select Source Area (OCR) – Define the area where text will be captured.
  2. Select Target Area (Translation) – Define where the translated text will appear.
  3. Start/Stop – Toggle the translation process on/off.
  4. Hide/Show Source Window – Toggle visibility of the source capture area.
  5. Hide/Show Target Window – Toggle visibility of the translation target area.
  6. Clear Translation Cache – Clear translations stored in memory to force retranslation.
  7. Clear Debug Log – Clear the application log.
  8. Enable/Disable Debug Log – Toggle debug logging; disabling it improves performance.
  9. Keyboard Shortcuts – List of available hotkeys.
  10. Status – Current application status.

Settings Tab

Here you can configure:

  1. Translation Model – Select between different translation providers.
  2. Source Language – The language to detect with OCR and translate from (DeepL and Google Translate only).
  3. Target Language – The language to translate into (DeepL and Google Translate only).
  4. API Key – For DeepL or Google Translate.
  5. Quality – Choose between Classic (faster) or Next-gen (potentially better quality) models (DeepL only).
  6. MarianMT Options – For offline neural translation (MarianMT-specific).
  7. Tesseract Path – Path to the Tesseract OCR executable.
  8. Scan Interval (ms) – How frequently to capture the screen.
  9. Clear Translation Timeout (s) – Time before clearing translations when source text disappears.
  10. Text Stability Threshold – How many consistent readings are needed before translation.
  11. OCR Confidence Threshold – Minimum confidence for OCR text detection.
  12. Image Preprocessing Mode – How to process images for OCR.
  13. OCR Debugging – Option to show debug images and text in the Debugging tab.
  14. Preview button – Opens OCR Preview window.
  15. Remove Trailing Garbage – Option to remove text after the last punctuation mark.
  16. Appearance Options – Colours and font sizes for overlays.
  17. File Caching Options – Settings to enable/disable caching for DeepL and Google Translate.
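
The Text Stability Threshold setting above can be illustrated with a short Python sketch. It assumes that "consistent readings" means identical consecutive OCR results; the class and method names are hypothetical and are not taken from the application's source.

```python
class StabilityFilter:
    """Report text as stable once it has been read identically N times in a row."""

    def __init__(self, threshold: int = 2):
        # threshold is a hypothetical stand-in for the Text Stability Threshold.
        self.threshold = threshold
        self._last = None
        self._count = 0

    def update(self, text: str) -> bool:
        """Feed one OCR reading; return True when it counts as stable."""
        if text == self._last:
            self._count += 1
        else:
            # A different reading restarts the count.
            self._last = text
            self._count = 1
        return self._count >= self.threshold
```

With a threshold of 3, a line of text would be passed on for translation only on its third identical reading in a row, which helps to skip text that is still fading in or flickering.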

OCR Preview

Clicking the Preview button on the Settings tab opens a separate OCR Preview window. This window displays:

  1. Processed Image (1:1 scale) – The preprocessed image being used for OCR recognition.
  2. Recognized Text – The text currently being recognised by the OCR engine.

This preview window is particularly useful for fine-tuning OCR settings and understanding why certain text might not be recognised properly. It can be moved and resized independently of the main application window.

Debugging Tab

This tab shows:

  1. Original Image – The raw captured image.
  2. Processed Image – The image after preprocessing for OCR.
  3. OCR Results – Text detected by OCR.
  4. Application Log – Running log of application events.
  5. Save OCR Images and Refresh Log – Buttons to save debug images and refresh the log.

About Tab

This tab provides basic information about the application.

Setting Up Translation Areas

Selecting the Source Area

  1. Click the Select Source Area (OCR) button.
  2. Your screen will dim, and you'll see a black cross.
  3. Click and drag to select the area containing text you want to translate.
  4. After selection, a semi-transparent overlay window appears at the selected location.
  5. This overlay is hidden by default when the application starts.
  6. This overlay can be:

Selecting the Target Area

  1. Click the Select Target Area (Translation) button.
  2. Your screen will dim, and you'll see a black cross.
  3. Click and drag to select where you want translations to appear.
  4. After selection, a semi-transparent overlay window appears at the selected location.
  5. This overlay is hidden by default when the application starts.
  6. This overlay can be:

Settings Configuration

Translation Configuration

  1. Translation Model:

  2. Source Language:

  3. Target Language:

OCR Configuration

  1. Tesseract Path:

  2. Image Preprocessing Mode:

  3. Adaptive Mode:

    Adaptive preprocessing mode applies adaptive thresholding, which copes well with challenging visual conditions such as small subtitle text overlaid on dynamic, flickering backgrounds with constantly changing colours and lighting.

    Unlike the three standard preprocessing modes, Adaptive mode provides two adjustable parameters, Block Size and C Value, that allow you to fine-tune text recognition.

    This mode proves invaluable when standard preprocessing fails to produce reliable results. By experimenting with these two parameters, you can often achieve superior OCR recognition compared to the ready-to-use modes, particularly in scenarios where backgrounds contain moving elements, varying illumination, or complex visual patterns that would otherwise interfere with text detection.

    For optimal results, start with moderate values (Block Size: 11, C Value: 2) and adjust based on your specific content. Increase Block Size for larger text or gradual lighting changes, and adjust C Value to balance between capturing all text and reducing false positives.

  4. OCR Confidence Threshold:

  5. Text Stability Threshold:

  6. OCR Debugging:

  7. Remove Trailing Garbage:
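
To show what the Block Size and C Value parameters of Adaptive mode control, here is a minimal pure-Python sketch of mean-based adaptive thresholding. It is an assumption that the application uses this variant; the function name is hypothetical, and real image libraries handle image edges and speed differently.

```python
def adaptive_threshold(gray, block_size=11, c=2):
    """Binarise a greyscale image given as a list of rows of 0-255 integers.

    Each pixel is compared with the mean of its block_size x block_size
    neighbourhood minus c: brighter pixels become 255, the rest become 0.
    """
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")
    height, width = len(gray), len(gray[0])
    half = block_size // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Clamp the window to the image, so edge pixels use a smaller block.
            y0, y1 = max(0, y - half), min(height, y + half + 1)
            x0, x1 = max(0, x - half), min(width, x + half + 1)
            window = [gray[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            local_mean = sum(window) / len(window)
            out[y][x] = 255 if gray[y][x] > local_mean - c else 0
    return out
```

Because each pixel is judged against its own neighbourhood rather than one global cut-off, the same dark text is separated from the background whether the surrounding area is dim or bright. A larger Block Size averages over a wider area, and a larger C Value makes the test more lenient, so fewer pixels are classified as dark.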

Performance Settings

  1. Scan Interval (ms):

  2. Clear Translation Timeout (s):

  3. Clear Translation Cache:

  4. File Caching Options (API Translation Services Only):

  5. Debug Logging:
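
The Clear Translation Timeout setting, described in the Settings tab as the time before translations are cleared once the source text disappears, can be sketched as a small timer that is checked on every scan. The class below is a hypothetical illustration, not the application's code.

```python
import time


class TranslationClearTimer:
    """Decide when a translation should be cleared after the source text disappears."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock       # injectable so the timer can be tested
        self._last_seen = None    # time of the last scan that contained text

    def should_clear(self, source_text):
        """Call once per scan; returns True once when the timeout has elapsed."""
        now = self._clock()
        if source_text.strip():
            self._last_seen = now
            return False
        if self._last_seen is not None and now - self._last_seen >= self.timeout_s:
            # Forget the old text so the overlay is cleared only once.
            self._last_seen = None
            return True
        return False
```

A short timeout removes stale translations quickly, while a longer one keeps the translation on screen through brief gaps in the source text.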

Appearance Settings

  1. Source Area Colour – Background colour of the source capture overlay (customisable).
  2. Target Area Colour – Background colour of the translation overlay (customisable).
  3. Target Text Colour – Colour of the translated text (customisable).
  4. Target Window Font Size – Size of the translated text.

Translation Methods

MarianMT (offline and free)

  1. No API key required – completely free to use.
  2. Works entirely offline once models are downloaded.
  3. Models are downloaded automatically when first used (~500MB per language pair).
  4. Configure by:

MarianMT models are open-source neural machine translation systems that offer good translation quality. Whilst they do not quite reach the standard of premium services like DeepL, they provide solid translations at no cost and need no internet connection after the initial model download.

These models were originally designed for translating short, single sentences and would typically truncate longer passages. However, OCR Translator implements a clever workaround to this limitation. The application automatically splits longer texts into individual sentences and processes them in parallel using separate threads. These translated segments are then seamlessly stitched back together, ensuring you receive complete translations regardless of text length.
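
The split, translate in parallel, and stitch approach described above can be sketched in Python as follows. This is an illustration under assumptions: the splitting rule is deliberately simple, and `translate_sentence` is a placeholder for whatever function translates one sentence; neither is taken from the application's source.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def translate_long_text(text, translate_sentence, max_workers=4):
    """Translate each sentence in a worker thread and rejoin them in order."""
    sentences = split_sentences(text)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, so the stitched text keeps
        # the original sequence even if threads finish out of order.
        translated = list(pool.map(translate_sentence, sentences))
    return " ".join(translated)
```

For example, `translate_long_text("Hello there. How are you?", str.upper)` uses `str.upper` as a stand-in translator and returns the two sentences in their original order.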

This approach offers several practical advantages:

The Translation Beam Size (MarianMT) setting allows you to balance between speed and quality. Higher values (8–12) produce more refined translations but require more processing time, whilst lower values (1–4) prioritise speed over perfect phrasing.

DeepL API

  1. Requires a DeepL account and API key.
  2. Offers premium-quality translations but supports fewer languages.
  3. Regarded by many as the industry leader in translation quality.
  4. The DeepL API Free plan allows for the translation of 500,000 characters per month free of charge (as of May 2025).
  5. Configure by:

Quality Options

DeepL offers two quality modes to suit different needs. The Classic model provides fast, high-quality translations that work with all supported language pairs. The Next-gen model uses DeepL's latest translation technology, which can deliver even better results for certain types of content, though it processes slightly slower and may not support all language pairs.

If you select Next-gen and your chosen language pair isn't supported, the application will automatically fall back to Classic mode so that translation continues without interruption. Both options deliver the top-quality results that DeepL is renowned for.
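
The fallback behaviour can be pictured with the following Python sketch. Everything in it is hypothetical: the exception type and the two translator callables are placeholders, since this manual does not describe how the application detects an unsupported language pair.

```python
class UnsupportedLanguagePair(Exception):
    """Hypothetical error raised when a model cannot handle a language pair."""


def translate_with_fallback(text, source, target, next_gen, classic):
    """Try the Next-gen translator first and fall back to Classic if needed.

    next_gen and classic are placeholder callables that take
    (text, source, target) and return the translated string.
    Returns a (translation, model_used) tuple.
    """
    try:
        return next_gen(text, source, target), "next-gen"
    except UnsupportedLanguagePair:
        return classic(text, source, target), "classic"
```

Returning the name of the model that was used makes it easy to log when a fallback has happened.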

DeepL File Caching System

OCR Translator implements a caching system for DeepL translations. Once a text segment has been translated, it's stored in the application's local cache (deepl_cache.txt). When the same text appears again, the application retrieves the translation from the cache instead of sending another API request.

It's important to understand that this caching mechanism relies entirely on OCR quality. For a cache match to occur, the OCR'd text must be identical – down to the last character – to what is stored in the cache. Even a single character difference will result in a new API call and translation. This means the actual efficiency of the cache depends heavily on consistent OCR results.

The cache can be helpful for gamers in specific scenarios. For instance, if you're playing a game where static menu options or repeated dialogue appear in exactly the same font, size, and screen position, the OCR is more likely to produce identical results each time. However, if text appears with different backgrounds, lighting, or slight position shifts, OCR variations will likely trigger new translations.

For example, a game's Save Game button might consistently be recognised identically and benefit from caching, while dynamic dialogue with changing characters or backgrounds might produce slightly different OCR results each time, limiting cache effectiveness.

The cache persists between application sessions, but its practical benefit should be viewed as a helpful bonus rather than a major API-saving feature. The more consistent and clear the text presentation, the more likely you are to benefit from the caching system.
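
The exact-match behaviour of the file cache can be sketched in Python as below. The on-disk format used here (one JSON record per line) is an assumption made for illustration, as the real layout of deepl_cache.txt is not described in this manual; the class and function names are hypothetical too.

```python
import json
from pathlib import Path


class FileTranslationCache:
    """Exact-match translation cache stored as JSON lines (assumed format)."""

    def __init__(self, path):
        self.path = Path(path)
        self._entries = {}
        if self.path.exists():
            for line in self.path.read_text(encoding="utf-8").splitlines():
                if line.strip():
                    record = json.loads(line)
                    self._entries[tuple(record["key"])] = record["translation"]

    def get(self, text, source, target):
        # Lookup is by exact string: one differing character is a miss.
        return self._entries.get((text, source, target))

    def put(self, text, source, target, translation):
        key = (text, source, target)
        if key in self._entries:
            return
        self._entries[key] = translation
        with self.path.open("a", encoding="utf-8") as handle:
            record = {"key": list(key), "translation": translation}
            handle.write(json.dumps(record) + "\n")


def cached_translate(cache, text, source, target, call_api):
    """Return a cached translation, calling the API only on a cache miss."""
    hit = cache.get(text, source, target)
    if hit is not None:
        return hit
    translation = call_api(text, source, target)
    cache.put(text, source, target, translation)
    return translation
```

In this sketch, a second request for "Save Game" is served from the file, whereas a misread such as "Save Garne" is treated as new text and triggers another API call.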

Google Translate API

  1. Requires a Google Cloud account and API key.
  2. Supports the widest range of languages.
  3. Good for general-purpose translation with broad language coverage.
  4. Configure by:

Google Translate uses the same file caching system as DeepL. Please refer to the DeepL API section above for a detailed explanation of how the caching mechanism works, its dependencies on OCR quality, and its practical benefits and limitations. All the same considerations and notes apply to Google Translate's caching functionality.

Keyboard Shortcuts

These keyboard shortcuts are available:

Shortcut     Function
~ (tilde)    Start/Stop Translation
Alt+1        Toggle Source Window Visibility
Alt+2        Toggle Translation Window Visibility
Alt+S        Save Settings
Alt+C        Clear Translation Cache
Alt+L        Clear Debug Log

Note: When the application is stopped (translation inactive), the translation window will be hidden automatically. When the application is started, the translation window will appear automatically. You can manually override this behaviour using the Alt+2 shortcut at any time.

Troubleshooting

If you encounter issues:

  1. Check the Debugging tab for error messages and the application log.
  2. Enable OCR Debugging in Settings to see what's being captured and recognised in the Debugging tab. Also view the OCR Preview window (accessible in the Settings tab via the Preview button).
  3. Adjust settings as needed:
  4. Consult the Troubleshooting Guide for common issues and solutions.

Tips and Best Practices

OCR Accuracy

For best OCR results:

  1. Capture clean, high-contrast text.
  2. Select the appropriate source language.
  3. Adjust preprocessing mode to match text appearance – try Adaptive mode for difficult backgrounds.
  4. Resize the capture area to frame text closely but completely.
  5. Use a larger source area for more context if OCR is struggling.
  6. Enable Remove Trailing Garbage to clean up recognition artefacts.
  7. Adjust the confidence threshold to balance between capturing all text (lower values) and reducing errors (higher values).
  8. For small subtitles on changing backgrounds, experiment with Adaptive mode's Block Size and C Value parameters.
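
Two of the tips above, the confidence threshold and Remove Trailing Garbage, can be illustrated with a short Python sketch. It assumes word-level OCR data shaped like the dictionary that pytesseract's `image_to_data` returns with `output_type=Output.DICT` (parallel "text" and "conf" lists). The function names, the default threshold of 60, and the set of punctuation marks are hypothetical choices, not the application's actual values.

```python
import re

# Assumed sentence-ending marks, optionally followed by closing quotes or brackets.
_END_MARK = re.compile(r'[.!?…]["\')\]]*')


def filter_by_confidence(ocr_data, min_confidence=60):
    """Join the recognised words whose confidence meets the threshold."""
    words = []
    for word, conf in zip(ocr_data["text"], ocr_data["conf"]):
        # Empty entries (reported with conf -1) mark blocks and lines, not words.
        if word.strip() and float(conf) >= min_confidence:
            words.append(word)
    return " ".join(words)


def remove_trailing_garbage(text):
    """Cut everything after the last sentence-ending punctuation mark."""
    last_end = None
    for match in _END_MARK.finditer(text):
        last_end = match.end()
    # Without any punctuation, keep the text rather than discard it all.
    return text if last_end is None else text[:last_end]


def clean_ocr_output(ocr_data, min_confidence=60, strip_garbage=True):
    """Apply the confidence filter, then optionally trim trailing garbage."""
    text = filter_by_confidence(ocr_data, min_confidence)
    return remove_trailing_garbage(text) if strip_garbage else text
```

Lowering `min_confidence` keeps more words at the cost of more misreads, which is the trade-off described in tip 7. Trimming after the last punctuation mark then removes stray symbols that OCR often picks up from borders or icons at the end of a line.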

Performance Optimisation

  1. Use a smaller source capture area.
  2. Click Disable Debug Log in the Home tab.
  3. Increase Scan Interval in the Settings tab to reduce CPU usage.
  4. Disable OCR Debugging in the Settings tab.
  5. Set the Image Preprocessing Mode to None in the Settings tab.
  6. For MarianMT:
  7. For DeepL or Google Translate:

Practical Applications

  1. Games:

    NOTE: OCR Translator may not work correctly with some games in fullscreen mode. We recommend using borderless windowed mode, which is supported by most modern games.
  2. Videos:

  3. Documents & PDFs:

  4. Applications: