OCR Translator User Manual
Copyright © 2025 Tomasz Kamiński
Last Updated: 9 June 2025
Table of Contents
- Introduction
- Getting Started
- Main Interface
- Setting Up Translation Areas
- Settings Configuration
- Translation Methods
- Keyboard Shortcuts
- Troubleshooting
- Tips and Best Practices
Introduction
OCR Translator is a desktop application that automatically captures text from any area of your screen, performs optical character recognition (OCR), and translates the text in real time. Its floating overlay windows let you position the translation anywhere on your screen, which makes it well suited to translating games, videos, PDFs, or any application with text that you can't easily copy and paste.
Getting Started
Requirements
Before using OCR Translator, ensure you have:
First Run
- Launch OCR Translator by running the main.py script or the executable if you're using a compiled version.
- On first startup, the application loads with default settings and both source and target areas are hidden.
- Before starting translation, you need to:
  - Verify the Tesseract path in the Settings tab.
  - Select source and target areas.
  - Configure your preferred translation method.
- When you click Start, the translation window will automatically be shown.
- When you click Stop, the translation window will automatically be hidden.
- You can manually toggle the visibility of the translation window using Alt+2 at any time.
Main Interface
The main interface is organised into four tabs:
Home Tab
- Select Source Area (OCR) – Define the area where text will be captured.
- Select Target Area (Translation) – Define where the translated text will appear.
- Start/Stop – Toggle the translation process on/off.
- Hide/Show Source Window – Toggle visibility of the source capture area.
- Hide/Show Target Window – Toggle visibility of the translation target area.
- Clear Translation Cache – Clear translations stored in memory to force retranslation.
- Clear Debug Log – Clear the application log.
- Enable/Disable Debug Log – Disable debug logging for improved performance.
- Keyboard Shortcuts – List of available hotkeys.
- Status – Current application status.
Settings Tab
Here you can configure:
- Translation Model – Select between different translation providers.
- Source Language – The language to detect with OCR and translate from (DeepL and Google Translate only).
- Target Language – The language to translate into (DeepL and Google Translate only).
- API Key – For DeepL or Google Translate.
- Quality – Choose between Classic (faster) or Next-gen (potentially better quality) models (DeepL only).
- MarianMT Options – For offline neural translation (MarianMT-specific).
- Tesseract Path – Path to the Tesseract OCR executable.
- Scan Interval (ms) – How frequently to capture the screen.
- Clear Translation Timeout (s) – Time before clearing translations when source text disappears.
- Text Stability Threshold – How many consistent readings are needed before translation.
- OCR Confidence Threshold – Minimum confidence for OCR text detection.
- Image Preprocessing Mode – How to process images for OCR.
- OCR Debugging – Option to show debug images and text in the Debugging tab.
- Preview button – Opens the OCR Preview window.
- Remove Trailing Garbage – Option to remove text after the last punctuation mark.
- Appearance Options – Colours and font sizes for overlays.
- File Caching Options – Settings to enable/disable caching for DeepL and Google Translate.
OCR Preview
Clicking the Preview button on the Settings tab opens a separate OCR Preview window. This window displays:
- Processed Image (1:1 scale) – The preprocessed image being used for OCR recognition.
- Recognized Text – The text currently being recognised by the OCR engine.
This preview window is particularly useful for fine-tuning OCR settings and understanding why certain text might not be recognised properly. It can be moved and resized independently of the main application window.
Debugging Tab
This tab shows:
- Original Image – The raw captured image.
- Processed Image – The image after preprocessing for OCR.
- OCR Results – Text detected by OCR.
- Application Log – Running log of application events.
- Save OCR Images and Refresh Log – Buttons to save debug images and refresh the log.
About Tab
This tab provides basic information about the application.
Setting Up Translation Areas
Selecting the Source Area
- Click the Select Source Area (OCR) button.
- Your screen will dim, and you'll see a black cross.
- Click and drag to select the area containing text you want to translate.
- After selection, a semi-transparent overlay window appears at the selected location.
- This overlay is hidden by default when the application starts.
- This overlay can be:
  - Moved: Drag from the title bar.
  - Resized: Drag from any edge or corner.
  - Hidden/Shown: Use the Hide/Show Source Window button or Alt+1 hotkey.
Selecting the Target Area
- Click the Select Target Area (Translation) button.
- Your screen will dim, and you'll see a black cross.
- Click and drag to select where you want translations to appear.
- After selection, a semi-transparent overlay window appears at the selected location.
- This overlay is hidden by default when the application starts.
- This overlay can be:
  - Moved: Drag from the title bar.
  - Resized: Drag from any edge or corner.
  - Hidden/Shown: Use the Hide/Show Target Window button or Alt+2 hotkey.
Settings Configuration
Translation Configuration
- Translation Model:
  - MarianMT (offline and free) – No API key needed, downloads models on first use, works offline.
  - DeepL API – Requires API key, high-quality translations.
  - Google Translate API – Requires API key, supports many languages.
- Source Language:
  - Language to translate from.
  - The OCR engine will use this language for text recognition to improve accuracy.
  - When you change the source language for translation, the OCR language is updated accordingly.
- Target Language:
  - Language to translate into.
  - Note that not all language pairs are supported by all translation methods.
OCR Configuration
- Tesseract Path:
  - Should point to your Tesseract installation (e.g., C:\Program Files\Tesseract-OCR\tesseract.exe).
  - Use the Browse button to locate it if needed.
- Image Preprocessing Mode:
  - None – No preprocessing, good for clear text.
  - Binary – Black and white conversion, good for high contrast text.
  - Binary Inverted – Inverted binary, good for white text on dark backgrounds.
  - Adaptive – Advanced adaptive thresholding for challenging environments.
- Adaptive Mode:
When you select Adaptive preprocessing mode, the system unlocks sophisticated adaptive thresholding capabilities that excel in challenging visual environments. This mode is particularly valuable when dealing with difficult conditions such as small subtitle text overlaid on dynamic, flickering backgrounds with constantly changing colours and lighting.
Unlike the three standard preprocessing modes, Adaptive mode provides two adjustable parameters that allow you to fine-tune the OCR recognition process:
- Block Size – Controls the size of the neighbourhood area used for calculating the threshold value. Larger values (e.g., 15–25) work better for text with gradual lighting changes, whilst smaller values (e.g., 7–13) are more effective for text with sharp contrast variations.
- C Value – Acts as a constant subtracted from the mean threshold. Positive values make the thresholding more conservative (less text detected but higher accuracy), whilst negative values make it more aggressive (more text detected but potentially more noise).
This mode proves invaluable when standard preprocessing fails to produce reliable results. By experimenting with these two parameters, you can often achieve superior OCR recognition compared to the ready-to-use modes, particularly in scenarios where backgrounds contain moving elements, varying illumination, or complex visual patterns that would otherwise interfere with text detection.
For optimal results, start with moderate values (Block Size: 11, C Value: 2) and adjust based on your specific content. Increase Block Size for larger text or gradual lighting changes, and adjust C Value to balance between capturing all text and reducing false positives.
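To make Block Size and C Value more concrete, here is a minimal sketch of mean adaptive thresholding in plain Python. It is an illustration only: the manual does not say how OCR Translator implements Adaptive mode, and the function name `adaptive_threshold`, its parameters, and the sample image are assumptions made for this example. Each pixel is compared with the mean of its surrounding block minus C, so the threshold follows the local lighting instead of using one value for the whole image.

```python
def adaptive_threshold(gray, block_size=11, c=2):
    """Mean adaptive threshold on a 2D list of 0-255 grey values.

    A pixel becomes white (255) if it is brighter than the mean of its
    block_size x block_size neighbourhood minus c, otherwise black (0).
    """
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")
    height, width = len(gray), len(gray[0])
    half = block_size // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbourhood, clipped at the image borders.
            values = [
                gray[ny][nx]
                for ny in range(max(0, y - half), min(height, y + half + 1))
                for nx in range(max(0, x - half), min(width, x + half + 1))
            ]
            local_mean = sum(values) / len(values)
            row.append(255 if gray[y][x] > local_mean - c else 0)
        result.append(row)
    return result


# One dark "text" pixel (40) on a background that brightens from left to right.
image = [
    [100, 102, 104, 106, 108],
    [100, 102,  40, 106, 108],
    [100, 102, 104, 106, 108],
]
binary = adaptive_threshold(image, block_size=3, c=2)   # conservative
noisy = adaptive_threshold(image, block_size=3, c=-2)   # aggressive
```

In this tiny example, a C Value of 2 marks only the dark pixel as text (black), whilst a C Value of -2 also marks several plain background pixels, which is the extra noise described above.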
- OCR Confidence Threshold:
  - Higher values (e.g., 80) give fewer but more accurate results.
  - Lower values (e.g., 40) catch more text but may include errors.
- Text Stability Threshold:
  - How many identical OCR readings are needed before translation.
  - Higher values reduce flickering but increase delay.
  - Set to 0 for instant translation (may cause flickering).
- OCR Debugging:
  - When enabled, shows captured images and OCR results in the Debugging tab.
  - Useful for troubleshooting recognition issues.
- Remove Trailing Garbage:
  - When enabled, removes any text that appears after the last punctuation mark (period, exclamation point, or question mark).
  - Useful for cleaning up OCR errors that often appear at the end of recognised text.
  - Helps improve translation quality by removing random characters or partial words.
- Scan Interval (ms):
  - Time in milliseconds between screen captures.
  - Lower values (e.g., 50 ms) give faster response but use more CPU.
  - Higher values (e.g., 500 ms) use less CPU but respond more slowly.
- Clear Translation Timeout (s):
  - Time in seconds before clearing the translation when source text disappears.
  - If the source area still contains text after the timeout period, the translation will remain visible.
  - Only clears the translation when no text is detected in the source area.
  - Set to 0 to keep translations visible indefinitely.
- Clear Translation Cache:
  - This button clears translations that are temporarily stored in the application's memory.
  - The memory cache stores recent translations to avoid re-translating identical text immediately.
  - Clearing this cache forces the application to retranslate all text, even if it was recently processed.
  - This is different from file caching – it only affects translations stored in RAM during the current session.
  - Useful when you want to see fresh translations or if cached results seem incorrect.
- File Caching Options (API Translation Services Only):
  - Enable Google Translate file cache – Uses the same caching system as DeepL (see DeepL section for detailed explanation).
  - Enable DeepL file cache – Saves DeepL translations to disk files to reduce API calls.
  - These options help reduce API usage and costs for paid services.
  - The Clear File Caches button removes all cached translations stored in files.
  - These settings do not apply to MarianMT since it works offline.
- Debug Logging:
  - Disable Debug Log – Permanently disables debug logging to improve application performance.
  - Once disabled, debug information will no longer be written to the log file.
  - This can provide a noticeable performance improvement in resource-constrained environments.
  - The setting persists between application sessions.
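As an illustration of the Remove Trailing Garbage option described above, the following sketch cuts everything after the last period, exclamation point, or question mark. The function name `remove_trailing_garbage` is hypothetical, and leaving text without any such punctuation unchanged is an assumption; the manual does not say what the application does in that case.

```python
def remove_trailing_garbage(text):
    """Cut everything after the last '.', '!' or '?' in the text."""
    last = max(text.rfind(mark) for mark in ".!?")
    if last == -1:
        # No sentence-ending punctuation found: leave the text untouched
        # (an assumption made for this sketch).
        return text
    return text[: last + 1]


# OCR often appends stray characters picked up from the background.
cleaned = remove_trailing_garbage("Where are you going? |_ii")
```

Here `cleaned` holds only the sentence itself, so the stray characters never reach the translation step.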
Appearance Settings
- Source Area Colour – Background colour of the source capture overlay (customisable).
- Target Area Colour – Background colour of the translation overlay (customisable).
- Target Text Colour – Colour of the translated text (customisable).
- Target Window Font Size – Size of the translated text.
Translation Methods
MarianMT (offline and free)
- No API key required – completely free to use.
- Works entirely offline once models are downloaded.
- Models are downloaded automatically when first used (~500MB per language pair).
- Configure by:
  - Selecting MarianMT (offline and free) as Translation Model.
  - Selecting a language pair from the MarianMT Model dropdown.
  - Adjusting the Translation Beam Size (MarianMT) (higher values = better quality but slower).
MarianMT models are open-source neural machine translation systems that offer good translation quality. Whilst they do not reach the standard of premium services like DeepL, they provide solid translations at no cost and need no internet connection after the initial model download.
These models were originally designed for translating short, single sentences and would typically truncate longer passages. However, OCR Translator implements a clever workaround to this limitation. The application automatically splits longer texts into individual sentences and processes them in parallel using separate threads. These translated segments are then seamlessly stitched back together, ensuring you receive complete translations regardless of text length.
This approach offers several practical advantages:
- Complete privacy – your text never leaves your computer.
- No usage limits or subscription costs.
- Continues working during internet outages.
- No API latency – translations happen at the speed of your computer.
The Translation Beam Size (MarianMT) setting allows you to balance between speed and quality. Higher values (8–12) produce more refined translations but require more processing time, whilst lower values (1–4) prioritise speed over perfect phrasing.
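The sentence-splitting approach described above can be sketched as follows. This is a simplified illustration, not the application's actual code: the function names, the punctuation-based splitting rule, and the `fake_translate` stand-in (which only upper-cases text) are all assumptions made for this example.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def translate_long_text(text, translate_sentence, max_workers=4):
    """Translate each sentence in its own task and rejoin them in order."""
    sentences = split_sentences(text)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, even if tasks finish out of order.
        translated = list(pool.map(translate_sentence, sentences))
    return " ".join(translated)


def fake_translate(sentence):
    # Stand-in for a real model call; upper-casing makes the effect visible.
    return sentence.upper()


result = translate_long_text("Hello there. How are you? Fine!", fake_translate)
```

Because the results are collected in input order, the rejoined text keeps the original sentence sequence regardless of which thread finishes first.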
DeepL API
- Requires a DeepL account and API key.
- Offers premium-quality translations but supports fewer languages.
- Regarded by many as the industry leader in translation quality.
- The DeepL API Free plan allows for the translation of 500,000 characters per month free of charge (as of May 2025).
- Configure by:
  - Selecting DeepL API as Translation Model.
  - Entering your API key in the Settings tab (DeepL-specific setting).
  - Choosing your preferred Quality option:
    - Classic – Faster processing with excellent quality.
    - Next-gen – Slightly slower but potentially even better quality for some content.
- If you don't have an API key, see the Installation Guide.
Quality Options
DeepL offers two quality modes to suit different needs. The Classic model provides fast, high-quality translations that work with all supported language pairs. The Next-gen model uses DeepL's latest translation technology, which can deliver even better results for certain types of content, though it processes slightly slower and may not support all language pairs.
If you select Next-gen and your chosen language pair isn't supported, the application will automatically fall back to Classic mode to ensure your translation continues working seamlessly. Both options deliver top-quality results that DeepL is renowned for.
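The fallback behaviour described above follows a common try-then-retry pattern. The sketch below shows that pattern only; `UnsupportedLanguagePair`, `translate_with_fallback`, and `fake_deepl` are hypothetical names invented for this example and are not part of OCR Translator or of the DeepL API.

```python
class UnsupportedLanguagePair(Exception):
    """Hypothetical error raised when a model cannot handle a language pair."""


def translate_with_fallback(text, translate, preferred="next-gen", fallback="classic"):
    """Try the preferred model first; use the fallback if the pair is unsupported."""
    try:
        return translate(text, preferred), preferred
    except UnsupportedLanguagePair:
        return translate(text, fallback), fallback


def fake_deepl(text, model):
    # Stand-in for an API call: pretend Next-gen does not support this pair.
    if model == "next-gen":
        raise UnsupportedLanguagePair(model)
    return f"[{model}] {text}"


translated, model_used = translate_with_fallback("Bonjour", fake_deepl)
```

Returning the model name alongside the translation lets the caller see when the fallback was used.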
DeepL File Caching System
OCR Translator implements a caching system for DeepL translations. Once a text segment has been translated, it's stored in the application's local cache (deepl_cache.txt). When the same text appears again, the application retrieves the translation from the cache instead of sending another API request.
It's important to understand that this caching mechanism relies entirely on OCR quality. For a cache match to occur, the OCR'd text must be identical – down to the last character – with what is stored in the cache. Even a single character difference will result in a new API call and translation. This means the actual efficiency of the cache depends heavily on consistent OCR results.
The cache can be helpful for gamers in specific scenarios. For instance, if you're playing a game where static menu options or repeated dialogue appear in exactly the same font, size, and screen position, the OCR is more likely to produce identical results each time. However, if text appears with different backgrounds, lighting, or slight position shifts, OCR variations will likely trigger new translations.
For example, a game's Save Game button might consistently be recognised identically and benefit from caching, while dynamic dialogue with changing characters or backgrounds might produce slightly different OCR results each time, limiting cache effectiveness.
The cache persists between application sessions, but its practical benefit should be viewed as a helpful bonus rather than a major API-saving feature. The more consistent and clear the text presentation, the more likely you are to benefit from the caching system.
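The exact-match behaviour can be illustrated with a small sketch. It uses an in-memory dictionary in place of the cache file, because the on-disk format of deepl_cache.txt is not described here; the class `TranslationCache` and the `fake_api` stand-in are assumptions made for this example.

```python
class TranslationCache:
    """Exact-match cache: the OCR text itself is the lookup key."""

    def __init__(self, translate):
        self.translate = translate  # function that performs the real API call
        self.entries = {}
        self.api_calls = 0

    def lookup(self, ocr_text):
        if ocr_text not in self.entries:
            # Cache miss: any difference in the text, however small, lands here.
            self.api_calls += 1
            self.entries[ocr_text] = self.translate(ocr_text)
        return self.entries[ocr_text]


def fake_api(text):
    # Stand-in for a paid API request.
    return f"<{text}>"


cache = TranslationCache(fake_api)
cache.lookup("Save Game")   # miss: first time this text is seen
cache.lookup("Save Game")   # hit: identical text, no API call
cache.lookup("Save Garne")  # miss: OCR misread "m" as "rn"
```

After these three lookups, `cache.api_calls` is 2: the repeated "Save Game" was served from the cache, but the single misread still cost a new request.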
Google Translate API
- Requires a Google Cloud account and API key.
- Supports the widest range of languages.
- Good for general purpose translation with broad language coverage.
- Configure by:
  - Selecting Google Translate API as Translation Model.
  - Entering your API key in the Settings tab (Google Translate-specific setting).
- If you don't have an API key, see the Installation Guide.
Google Translate uses the same file caching system as DeepL. Please refer to the DeepL API section above for a detailed explanation of how the caching mechanism works, its dependencies on OCR quality, and its practical benefits and limitations. All the same considerations and notes apply to Google Translate's caching functionality.
Keyboard Shortcuts
These keyboard shortcuts are available:
| Shortcut | Function |
| --- | --- |
| ~ (tilde) | Start/Stop Translation |
| Alt+1 | Toggle Source Window Visibility |
| Alt+2 | Toggle Translation Window Visibility |
| Alt+S | Save Settings |
| Alt+C | Clear Translation Cache |
| Alt+L | Clear Debug Log |
Note: When the application is stopped (translation inactive), the translation window will be hidden automatically. When the application is started, the translation window will appear automatically. You can manually override this behaviour using the Alt+2 shortcut at any time.
Troubleshooting
If you encounter issues:
- Check the Debugging tab for error messages and the application log.
- Enable OCR Debugging in Settings to see what's being captured and recognised in the Debugging tab. Also view the OCR Preview window (accessible in the Settings tab via the Preview button).
- Adjust settings as needed:
  - Try different preprocessing modes, including Adaptive mode for challenging environments.
  - Adjust confidence and stability thresholds.
  - Verify source language is correct.
  - Enable Remove Trailing Garbage to clean up OCR errors.
  - Fine-tune Adaptive mode parameters if using challenging backgrounds.
- Consult the Troubleshooting Guide for common issues and solutions.
Tips and Best Practices
OCR Accuracy
For best OCR results:
- Capture clean, high-contrast text.
- Select appropriate source language.
- Adjust preprocessing mode to match text appearance – try Adaptive mode for difficult backgrounds.
- Resize the capture area to frame text closely but completely.
- Use a larger source area for more context if OCR is struggling.
- Enable Remove Trailing Garbage to clean up recognition artefacts.
- Adjust the confidence threshold to balance between capturing all text (lower values) and reducing errors (higher values).
- For small subtitles on changing backgrounds, experiment with Adaptive mode's Block Size and C Value parameters.
Performance
For better performance:
- Use a smaller source capture area.
- Click Disable Debug Log in the Home tab.
- Increase Scan Interval in the Settings tab to reduce CPU usage.
- Disable OCR Debugging in the Settings tab.
- Set the Image Preprocessing Mode to None in the Settings tab.
- For MarianMT:
  - Lower Translation Beam Size (MarianMT) for faster translation.
- For DeepL or Google Translate:
  - Enable file caching to improve performance with repetitive text.
Practical Applications
- Games:
  🎮 NOTE: OCR Translator may not work correctly with some games in fullscreen mode. We recommend using borderless windowed mode, which is supported by most modern games.
  - Position source overlay over game subtitles or dialogue.
  - Place target overlay in a less intrusive area.
  - Use the Clear Translation Timeout to keep translations visible as long as text is present on screen.
  - Enable Remove Trailing Garbage to clean up OCR errors common in game text.
  - For games with challenging subtitle backgrounds, experiment with Adaptive mode settings.
- Videos:
  - Capture subtitle area.
  - Adjust preprocessing if subtitles are embedded – Adaptive mode often works well for varying video backgrounds.
  - For streaming content, set stability threshold lower (0–1) for faster response.
- Documents & PDFs:
  - Resize source window to capture paragraphs.
  - Use stability threshold of 1–2 to reduce flickering when scrolling.
  - For scientific or technical documents, increase confidence threshold to reduce errors.
- Applications:
  - Position overlays to avoid covering important UI elements.
  - Use hotkeys to quickly toggle translation when needed.
  - For applications with changing content, adjust the scan interval for optimal performance.
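The stability threshold values suggested above (0–1 for video, 1–2 for documents) trade response time against flicker. The sketch below shows how such a check could work, assuming the threshold counts additional identical readings after the first one. The class name `StabilityFilter` and this exact counting rule are assumptions, not the application's documented behaviour.

```python
class StabilityFilter:
    """Release OCR text only after it has repeated `threshold` times in a row."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.last_text = None
        self.repeat_count = 0

    def accept(self, text):
        """Return True when `text` is stable enough to translate."""
        if text == self.last_text:
            self.repeat_count += 1
        else:
            # A different reading restarts the count.
            self.last_text = text
            self.repeat_count = 0
        return self.repeat_count >= self.threshold


# OCR misreads the first frame, then settles on the correct text.
readings = ["Helo", "Hello", "Hello", "Hello"]

stable = StabilityFilter(threshold=2)
decisions = [stable.accept(reading) for reading in readings]

instant = StabilityFilter(threshold=0)
instant_decisions = [instant.accept(reading) for reading in readings]
```

With a threshold of 2, only the last reading is passed on for translation, so the misread "Helo" is never translated. With a threshold of 0, every reading is passed on immediately, including the misread one.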