
Game-Changing Translator User Manual

Copyright © 2025 Tomasz Kamiński
Last Updated: 31 July 2025


Table of Contents

  1. Introduction
  2. Getting Started
  3. Main Interface
  4. Setting Up Translation Areas
  5. Settings Configuration
  6. Gemini OCR - Premium Text Recognition
  7. API Usage Monitoring
  8. Translation Methods
  9. Gemini API - Cost-Effective and Context-Aware Translation
  10. Keyboard Shortcuts
  11. Troubleshooting
  12. Tips and Best Practices

Introduction

Game-Changing Translator is a desktop application that automatically captures text from any area of your screen, performs optical character recognition (OCR), and translates the text in real time. Its floating overlay windows let you position the translation anywhere on your screen, making it ideal for translating games, videos, PDFs, or any application with text that you can't easily copy and paste.

Getting Started

Requirements

Before using Game-Changing Translator, ensure you have:

First Run

  1. Launch Game-Changing Translator by running the main.py script or the executable if you're using a compiled version.
  2. On first startup, the application loads with default settings and both source and target areas are hidden.
  3. Before starting translation, you need to:
  4. When you click Start, the translation window will automatically be shown.
  5. When you click Stop, the translation window will automatically be hidden.
  6. You can manually toggle the visibility of the translation window using Alt+2 at any time.

Main Interface

The main interface is organised into five tabs:

Home Tab


  1. Select Source Area (OCR) – Define the area where text will be captured.
  2. Select Target Area (Translation) – Define where the translated text will appear.
  3. Start/Stop – Toggle the translation process on/off.
  4. Hide/Show Source Window – Toggle visibility of the source capture area.
  5. Hide/Show Target Window – Toggle visibility of the translation target area.
  6. Clear Translation Cache – Clear translations stored in memory to force retranslation.
  7. Clear Debug Log – Clear the application log.
  8. Enable/Disable Debug Log – Toggle debug logging on or off; disabling it improves performance.
  9. Keyboard Shortcuts – List of available hotkeys.
  10. Status – Current application status.

Settings Tab


Here you can configure:

  1. Translation Model – Select between different translation providers.
  2. OCR Model – Choose between Tesseract (offline) and Gemini API (online).
  3. Source Language – The language to detect with OCR and translate from (DeepL and Google Translate only).
  4. Target Language – The language to translate into (DeepL and Google Translate only).
  5. API Key – For DeepL or Google Translate.
  6. Quality – Choose between Classic (faster) or Next-gen (potentially better quality) models (DeepL only).
  7. MarianMT Options – For offline neural translation (MarianMT-specific).
  8. Gemini Options – For cost-effective AI translation with context awareness (Gemini-specific).
  9. Tesseract Path – Path to the Tesseract OCR executable (Tesseract only).
  10. Scan Interval (ms) – How frequently to capture the screen.
  11. Clear Translation Timeout (s) – Time before clearing translations when source text disappears.
  12. Text Stability Threshold – How many consistent readings needed before translation (Tesseract only).
  13. OCR Confidence Threshold – Minimum confidence for OCR text detection (Tesseract only).
  14. Image Preprocessing Mode – How to process images for OCR (Tesseract only).
  15. OCR Debugging – Option to show debug images and text in the Debugging tab (Tesseract only).
  16. Preview button – Opens OCR Preview window (Tesseract only).
  17. Remove Trailing Garbage – Option to remove text after the last punctuation mark (Tesseract only).
  18. Appearance Options – Colours and font sizes for overlays.
  19. File Caching Options – Settings to enable/disable caching for DeepL and Google Translate.
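Two of the Tesseract options above, Text Stability Threshold and Remove Trailing Garbage, act as simple text filters. The Python sketch below shows one plausible way they could work; it is not the application's code. The names `remove_trailing_garbage` and `StabilityGate` are hypothetical, and treating only `.`, `!`, `?` and `…` as the "punctuation mark" is an assumption.

```python
import re
from typing import Optional

# Assumption: only sentence-ending marks count as "punctuation" here.
_SENTENCE_END = re.compile(r"[.!?…]")


def remove_trailing_garbage(text: str) -> str:
    """Drop everything after the last sentence-ending punctuation mark.

    Text without any such mark is returned unchanged (an assumption).
    """
    matches = list(_SENTENCE_END.finditer(text))
    if not matches:
        return text
    return text[: matches[-1].end()]


class StabilityGate:
    """Release OCR text only after `threshold` identical readings in a row."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self._last: Optional[str] = None
        self._count = 0
        self._released: Optional[str] = None

    def feed(self, reading: str) -> Optional[str]:
        """Return the text once it becomes stable, otherwise None."""
        if reading == self._last:
            self._count += 1
        else:
            self._last = reading
            self._count = 1
        # Release each stable text only once, so it is not re-translated.
        if self._count >= self.threshold and reading != self._released:
            self._released = reading
            return reading
        return None
```

With a threshold of 3, for example, a subtitle would be passed on for translation only after three consecutive scans produced exactly the same text, which helps filter out half-faded or flickering frames at the cost of a slightly later translation.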

OCR Preview (Tesseract only)


Clicking the Preview button on the Settings tab opens a separate OCR Preview window when using Tesseract OCR. This window displays:

  1. Processed Image (1:1 scale) – The preprocessed image being used for OCR recognition.
  2. Recognized Text – The text currently being recognised by the OCR engine.

This preview window is particularly useful for fine-tuning OCR settings and understanding why certain text might not be recognised properly. It can be moved and resized independently of the main application window.

API Usage Tab


This tab provides comprehensive monitoring and analysis of your Gemini API usage, including:

  1. Gemini OCR Statistics – Cost tracking and performance metrics for OCR operations.
  2. Gemini Translation Statistics – Word counts, costs, and efficiency metrics for translations.
  3. Combined API Statistics – Overall cost analysis and projections.
  4. DeepL Usage Tracker – Monitor free monthly limits for DeepL API.
  5. Export and Management Tools – Export statistics to CSV/text and copy to clipboard.

For detailed information about all available statistics and cost tracking features, see the API Usage Monitoring section.

Debugging Tab


This tab shows:

  1. Original Image – The raw captured image.
  2. Processed Image – The image after preprocessing for OCR.
  3. OCR Results – Text detected by OCR.
  4. Application Log – Running log of application events.
  5. Save OCR Images and Refresh Log – Buttons to save debug images and refresh the log.

About Tab

This tab provides basic information about the application.

Setting Up Translation Areas

Selecting the Source Area

  1. Click Select Source Area (OCR) button.
  2. Your screen will dim, and you'll see a black cross.
  3. Click and drag to select the area containing text you want to translate.
  4. After selection, a semi-transparent overlay window appears at the selected location.
  5. This overlay is hidden by default when the application starts.
  6. This overlay can be:

Selecting the Target Area

  1. Click Select Target Area (Translation) button.
  2. Your screen will dim, and you'll see a black cross.
  3. Click and drag to select where you want translations to appear.
  4. After selection, a semi-transparent overlay window appears at the selected location.
  5. This overlay is hidden by default when the application starts.
  6. This overlay can be:

Settings Configuration

Translation Configuration

  1. Translation Model:

    Multiple Gemini models appear at the top of the dropdown, followed by traditional translation services:

  2. Source Language:

  3. Target Language:

OCR Configuration

  1. OCR Model:

    Multiple Gemini models appear at the top of the dropdown, followed by traditional OCR options:

  2. Tesseract Path (Tesseract only):

  3. Image Preprocessing Mode (Tesseract only):

  4. Adaptive Mode (Tesseract only):

    When you select Adaptive preprocessing mode, the application applies adaptive thresholding, which copes well with challenging visual environments. This mode is particularly valuable in difficult conditions such as small subtitle text overlaid on dynamic, flickering backgrounds with constantly changing colours and lighting.

    Unlike the three standard preprocessing modes, Adaptive mode provides two adjustable parameters that allow you to fine-tune the OCR recognition process:

    This mode proves invaluable when standard preprocessing fails to produce reliable results. By experimenting with these two parameters, you can often achieve superior OCR recognition compared to the ready-to-use modes, particularly in scenarios where backgrounds contain moving elements, varying illumination, or complex visual patterns that would otherwise interfere with text detection.

    For optimal results, start with moderate values (Block Size: 11, C Value: 2) and adjust based on your specific content. Increase Block Size for larger text or gradual lighting changes, and adjust C Value to balance between capturing all text and reducing false positives.

  5. OCR Confidence Threshold (Tesseract only):

  6. Text Stability Threshold (Tesseract only):

  7. OCR Debugging (Tesseract only):

  8. Remove Trailing Garbage (Tesseract only):
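To make Adaptive mode's two parameters more concrete, here is a minimal pure-Python sketch of mean adaptive thresholding. Block Size and C Value match the names of the `blockSize` and `C` arguments of OpenCV's `cv2.adaptiveThreshold`; whether the application actually uses OpenCV is an assumption, and the function name `adaptive_threshold_mean` is hypothetical. Each pixel is compared with the average brightness of its own neighbourhood rather than with one global cut-off, which is why this approach copes with uneven or changing backgrounds.

```python
from typing import List


def adaptive_threshold_mean(image: List[List[int]], block_size: int = 11, c: int = 2) -> List[List[int]]:
    """Binarise a greyscale image given as rows of 0-255 values.

    Each pixel is compared with the mean of the block_size x block_size
    neighbourhood around it, minus c. Pixels brighter than that local
    threshold become 255; all others become 0. Neighbourhoods are simply
    clipped at the image border in this sketch (OpenCV pads the border
    by replicating edge pixels instead).
    """
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")
    h, w = len(image), len(image[0])
    r = block_size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the neighbourhood window to the image bounds.
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = sum(image[yy][xx] for yy in range(y0, y1) for xx in range(x0, x1))
            mean = total / ((y1 - y0) * (x1 - x0))
            out[y][x] = 255 if image[y][x] > mean - c else 0
    return out
```

In this formulation, a larger Block Size averages over a wider area, so the local threshold follows only broad lighting changes, while a larger C lowers the threshold so that more pixels end up white.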

Performance Settings

  1. Scan Interval (ms):

  2. Clear Translation Timeout (s):

  3. Clear Translation Cache:

  4. File Caching Options (API Translation Services Only):

  5. Debug Logging:

Appearance Settings

  1. Source Area Colour – Background colour of the source capture overlay (customisable).
  2. Target Area Colour – Background colour of the translation overlay (customisable).
  3. Target Text Colour – Colour of the translated text (customisable).
  4. Target Window Font Size – Size of the translated text.

Gemini OCR - Premium Text Recognition

Gemini OCR provides superior accuracy in challenging subtitle scenarios where traditional OCR engines such as Tesseract struggle. This premium feature uses Google's Gemini models, and flexible model selection lets you balance recognition quality, speed, and cost.

Intelligent Model Selection

The application features flexible model selection for both OCR and translation operations, allowing you to optimise performance based on your specific use case:

Recommended Model Selection:

Advanced Configuration: Model availability and costs can be customised by editing the gemini_models.csv file in the resources directory. This allows you to add new models, update pricing, or modify which models are available for OCR versus translation operations as Google releases new Gemini models.

Challenging Subtitles Scenarios

Gemini OCR excels in scenarios where subtitles are difficult to recognise due to:

OCR Comparison Examples

Challenging subtitle example 1

Tesseract OCR Result: ~ Trust me, OD tite WE loca mS
Gemini OCR Result: Trust me, Oakmonters know a newcomer when they see one. We locals can tell.

Challenging subtitle example 2

Tesseract OCR Result: ' Paulie: Driv: show, Tom. Next stop's Bi the motel. 7 jj ie
Gemini OCR Result: Paulie: Drive before the cops show, Tom. Next stop's Bill at the motel.

Superior Premium Feature with Multiple Models

Gemini OCR is a premium feature that significantly outperforms traditional OCR methods through intelligent model selection. The application provides access to multiple Gemini models, each optimised for different scenarios:

Gemini 2.0 Models - Superior OCR Accuracy and Translation Quality:

Gemini 2.5 Models - Speed Optimised:

Performance and Cost:

Outstanding Cost-to-Quality Ratio

The available Gemini models deliver exceptionally fast and accurate OCR results that significantly surpass Tesseract or Paddle OCR. With intelligent model selection, you can optimise the cost-to-quality ratio for your specific use case whilst maintaining superior performance compared to both free and paid OCR solutions.

Cost Comparison (using Gemini 2.5 Flash-Lite pricing):

Best Practices with Gemini OCR

The API Usage tab (detailed in the next section) helps you monitor costs and estimate expenses for your specific use cases, ensuring you can optimise your OCR usage while maintaining excellent quality.

API Usage Monitoring

The API Usage tab provides comprehensive monitoring and cost analysis for Gemini API usage, helping you track expenses and optimise your API consumption for both OCR and translation services.


This tab displays detailed statistics across several categories:

📊 Gemini OCR Statistics

🔄 Gemini Translation Statistics

💰 Combined API Statistics

📈 DeepL Usage Tracker

Statistics Management

The tab includes several management options:

Important Note: Statistics are based on API_OCR_short_log.txt and API_TRA_short_log.txt files. Data will reset if these files are deleted.

Important: Cost tracking is provided for reference purposes only. You remain responsible for monitoring your own API usage and costs through Google's billing dashboard.

Translation Methods

Gemini API (Recommended)

Google's latest Gemini models provide exceptional translation quality with intelligent context awareness. This breakthrough technology combines premium translation quality with unprecedented affordability, making it ideal for translating massive projects like entire games for just a few dollars.

Key Advantages:

Gemini API is the recommended translation method for most users seeking the best balance of quality, intelligence, and affordability. For detailed configuration options, advanced features, and cost optimisation strategies, see the Gemini API - Cost-Effective and Context-Aware Translation section.

MarianMT (offline and free)

  1. No API key required – completely free to use.
  2. Works entirely offline once models are downloaded.
  3. Models are downloaded automatically when first used (~500MB per language pair).
  4. Configure by:

MarianMT models are open-source neural machine translation systems that offer reasonably good translation quality. Whilst they do not reach the standard of premium services like DeepL, they provide remarkably solid translations without any cost or internet connection after the initial model download.

These models were originally designed for translating short, single sentences and would typically truncate longer passages. However, Game-Changing Translator implements a clever workaround to this limitation. The application automatically splits longer texts into individual sentences and processes them efficiently using batch translation. All sentences are processed together in a single, optimized model inference call, then seamlessly stitched back together, ensuring you receive complete translations regardless of text length.

This approach offers several practical advantages:

The Translation Beam Size (MarianMT) setting allows you to balance between speed and quality. Higher values (8–12) produce more refined translations but require more processing time, whilst lower values (1–4) prioritise speed over perfect phrasing.
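The split, batch and stitch approach described above can be sketched in Python as follows. This is an illustration under assumptions, not the application's code: the regular-expression sentence splitter is deliberately simple (it will also split after abbreviations such as "Dr."), and `translate_batch` is a hypothetical stand-in for the MarianMT model call.

```python
import re
from typing import Callable, List

# Split on whitespace that follows ".", "!" or "?". Deliberately simple.
_SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Break a passage into sentences, dropping empty fragments."""
    return [s for s in _SENTENCE_SPLIT.split(text.strip()) if s]


def translate_long_text(text: str, translate_batch: Callable[[List[str]], List[str]]) -> str:
    """Translate every sentence in one batched call, then rejoin them."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    translations = translate_batch(sentences)  # a single model inference call
    if len(translations) != len(sentences):
        raise RuntimeError("batch translator returned the wrong number of sentences")
    return " ".join(t.strip() for t in translations)
```

With Hugging Face Transformers, for example, `translate_batch` could tokenise the sentence list with padding enabled and pass it to the model's `generate` method, where the `num_beams` argument plays the role of the Translation Beam Size setting.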

⚠️
NOTE: The English-to-Polish model takes a bit longer to install when first selected, as it needs to be downloaded and converted from a different source than the other MarianMT models.

DeepL API

  1. Requires a DeepL account and API key.
  2. Offers premium-quality translations but supports fewer languages.
  3. Regarded by many as the industry leader in translation quality.
  4. The DeepL API Free plan allows for the translation of 500,000 characters per month free of charge (as at May 2025).
  5. DeepL usage is tracked in the API Usage tab for comprehensive monitoring alongside other API services.
  6. Configure by:

Quality Options

DeepL offers two quality modes to suit different needs. The Classic model provides fast, high-quality translations that work with all supported language pairs. The Next-gen model uses DeepL's latest translation technology, which can deliver even better results for certain types of content, though it processes slightly slower and may not support all language pairs.

If you select Next-gen and your chosen language pair isn't supported, the application will automatically fall back to Classic mode to ensure your translation continues working seamlessly. Both options deliver top-quality results that DeepL is renowned for.
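The fallback behaviour described above can be expressed as a short Python sketch. The names `translate_with_fallback` and `UnsupportedLanguagePairError`, and the `"next-gen"`/`"classic"` model values, are hypothetical; the real application talks to the DeepL API and may detect unsupported pairs differently.

```python
from typing import Callable


class UnsupportedLanguagePairError(Exception):
    """Hypothetical error: the requested model does not support the pair."""


def translate_with_fallback(
    text: str,
    source: str,
    target: str,
    translate: Callable[[str, str, str, str], str],
    quality: str = "next-gen",
) -> str:
    """Try the Next-gen model first and fall back to Classic if needed.

    `translate(text, source, target, model)` is a hypothetical wrapper
    around the actual DeepL API request.
    """
    if quality == "next-gen":
        try:
            return translate(text, source, target, "next-gen")
        except UnsupportedLanguagePairError:
            pass  # fall through so that translation keeps working
    return translate(text, source, target, "classic")
```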

DeepL File Caching System

Game-Changing Translator implements a caching system for DeepL translations. Once a text segment has been translated, it's stored in the application's local cache (deepl_cache.txt). When the same text appears again, the application retrieves the translation from the cache instead of sending another API request.

It's important to understand that this caching mechanism relies entirely on OCR quality. For a cache match to occur, the OCR'd text must be identical – down to the last character – to the text stored in the cache. Even a single character difference will result in a new API call and translation, so the actual efficiency of the cache depends heavily on consistent OCR results.

The cache can be helpful for gamers in specific scenarios. For instance, if you're playing a game where static menu options or repeated dialogue appear in exactly the same font, size, and screen position, the OCR is more likely to produce identical results each time. However, if text appears with different backgrounds, lighting, or slight position shifts, OCR variations will likely trigger new translations.

For example, a game's Save Game button might consistently be recognised identically and benefit from caching, while dynamic dialogue with changing characters or backgrounds might produce slightly different OCR results each time, limiting cache effectiveness.

The cache persists between application sessions, but its practical benefit should be viewed as a helpful bonus rather than a major API-saving feature. The more consistent and clear the text presentation, the more likely you are to benefit from the caching system.
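The exact-match behaviour described above can be illustrated with a small Python sketch. The JSON-lines file layout and the names `TranslationFileCache` and `translate_cached` are assumptions for illustration only; the real format of deepl_cache.txt is not documented here. Note that the lookup key is the raw OCR text, so a single stray character produces a cache miss.

```python
import json
import os
from typing import Callable, Dict, Optional, Tuple


class TranslationFileCache:
    """Exact-match translation cache persisted to a text file (JSON lines).

    The key is (source language, target language, text) and must match
    character for character; no normalisation is applied.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self._entries: Dict[Tuple[str, str, str], str] = {}
        if os.path.exists(path):  # reload entries from earlier sessions
            with open(path, encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        rec = json.loads(line)
                        self._entries[(rec["src"], rec["tgt"], rec["text"])] = rec["translation"]

    def get(self, src: str, tgt: str, text: str) -> Optional[str]:
        return self._entries.get((src, tgt, text))

    def put(self, src: str, tgt: str, text: str, translation: str) -> None:
        self._entries[(src, tgt, text)] = translation
        with open(self.path, "a", encoding="utf-8") as f:
            record = {"src": src, "tgt": tgt, "text": text, "translation": translation}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def translate_cached(cache: TranslationFileCache, src: str, tgt: str, text: str,
                     call_api: Callable[[str], str]) -> str:
    """Return a cached translation, or call the API and cache the result."""
    hit = cache.get(src, tgt, text)
    if hit is not None:
        return hit  # no API request needed
    translation = call_api(text)
    cache.put(src, tgt, text, translation)
    return translation
```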

Google Translate API

  1. Requires a Google Cloud account and API key.
  2. Supports the widest range of languages.
  3. Good for general purpose translation with broad language coverage.
  4. Configure by:

Google Translate uses the same file caching system as DeepL. Please refer to the DeepL API section above for a detailed explanation of how the caching mechanism works, its dependencies on OCR quality, and its practical benefits and limitations. All the same considerations and notes apply to Google Translate's caching functionality.

Gemini API - Cost-Effective and Context-Aware Translation

Google's advanced Gemini models represent a breakthrough in AI translation technology, offering premium-quality translations with unprecedented cost-effectiveness. These advanced models combine intelligent context awareness with remarkable affordability, making it possible to translate massive gaming projects for a fraction of traditional costs.

Superior Translation Quality

Context Window Technology

Unlike traditional translation services that process each subtitle in isolation, Gemini API features a configurable sliding context window that maintains awareness of previous translations. This revolutionary approach ensures narrative coherence, improves grammar flow, and delivers translations that understand the broader context of conversations and storylines.

The context window can be configured to include 0-5 previous subtitles, allowing the AI to:

Example: Context-Aware Translation

This example demonstrates how context awareness helps maintain proper grammar when translating Czech to Polish:

Czech Original | DeepL (No Context) | Gemini (With Context) | English Translation
A vodkaď se podle tebe teda známe? | A skąd się znamy, według ciebie? | A skąd niby się znamy? | And how do we supposedly know each other?
Viděli jsme se přece u toho rybníka! | Widzieliśmy się przecież nad stawem! | Widzieliśmy się przecież nad tamtym stawem! | We saw each other at that pond!
Jakýho rybníka? Já u žádnýho rybníka nebyla! | Jakiego stawu? Nie byłam przy żadnym stawie! | Nad jakim stawem? Ja nad żadnym stawem nie byłam! | What pond? I wasn't at any pond!
Ale jo, byla! | Ale tak, była! | Ale tak, byłaś! | But yes, you were!

Key Improvements with Context:

These examples clearly demonstrate how Gemini's context window helps maintain grammatical consistency and dialogue flow that would be impossible with sentence-by-sentence translation.
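A sliding context window like this can be modelled with a bounded queue. The Python sketch below rebuilds a message in the same layout as the one shown in the API call log later in this manual: an instruction line, the source subtitles, then the earlier translations followed by an empty slot for the new one. It is a reconstruction from that single example; the class name `ContextWindow` is hypothetical, and the application's real template, including its wording when the context size is 0, may differ.

```python
from collections import deque
from typing import Deque, Tuple

ORDINALS = ["first", "second", "third", "fourth", "fifth", "sixth"]


class ContextWindow:
    """Keep the last `size` (source, translation) pairs, 0-5 as in the app."""

    def __init__(self, size: int) -> None:
        if not 0 <= size <= 5:
            raise ValueError("context size must be between 0 and 5")
        # A deque with maxlen silently drops the oldest pair when full.
        self._pairs: Deque[Tuple[str, str]] = deque(maxlen=size)

    def add(self, source: str, translation: str) -> None:
        self._pairs.append((source, translation))

    def build_message(self, text: str, source_lang: str, target_lang: str) -> str:
        n = len(self._pairs) + 1  # position of the subtitle to translate
        header = (f"<Translate idiomatically the {ORDINALS[n - 1]} subtitle "
                  f"from {source_lang} to {target_lang}. Return translation only.>")
        src_tag, tgt_tag = source_lang.upper() + ":", target_lang.upper() + ":"
        lines = [header]
        lines += [f"{src_tag} {src}" for src, _ in self._pairs]
        lines.append(f"{src_tag} {text}")
        lines += [f"{tgt_tag} {tr}" for _, tr in self._pairs]
        lines.append(tgt_tag)  # empty slot the model is asked to fill
        return "\n".join(lines)
```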

OCR Error Intelligence

One of Gemini's most impressive capabilities is its ability to interpret and correct OCR imperfections automatically. When text recognition produces garbled or incomplete results, Gemini's advanced language understanding can often deduce the intended meaning and provide clean, accurate translations without replicating OCR errors in the output.

Flexible Model Configuration

The application supports multiple Gemini models for both OCR and translation operations. You can select different models based on your specific needs: Gemini 2.0 models offer superior OCR accuracy for longer subtitles, whilst Gemini 2.5 models provide speed-optimised performance for rapidly changing content. Model selection and pricing can be customised by editing the gemini_models.csv file in the resources directory.

Example: OCR Error Correction

Here's a real-world example showing how Gemini handles OCR errors compared to DeepL when translating French to English:

OCR Input DeepL Output Gemini Output Analysis
Vraiment ? Really? Really? Clean OCR, both work well
| Vraiment ? | Really? Really? Gemini removes OCR artifact "|", DeepL replicates it

Exceptional Cost-Effectiveness

Real-World Cost Analysis

Gemini API offers extraordinary value for large translation projects. Even massive games like The Witcher 3, with hundreds of hours of dialogue and subtitles, can be translated for under $5 total cost. This remains true even when accounting for:

Cost Estimate: The Witcher 3 Translation

Here is a detailed cost analysis for translating The Witcher 3 subtitles using DeepL and Gemini 2.5 Flash-Lite:

Assumptions:

Cost Breakdown:

DeepL:

Gemini 2.5 Flash-Lite:

Service | Estimated Cost (EUR) | Estimated Cost (USD)
DeepL | €135.00 | $145.80
Gemini 2.5 Flash-Lite | – | $2.16

Note: These are rough estimates. Actual costs depend on language pair, OCR accuracy, context settings, and cache effectiveness.

Disclaimer: Cost tracking is provided for reference purposes only. This is free software with no guarantees regarding cost accuracy. Users are responsible for monitoring their own API usage and costs through Google's billing dashboard.

Built-in Cost Tracking

Game-Changing Translator includes comprehensive cost monitoring specifically designed for Gemini API usage:

Detailed API Call Example

Here's a real example of how the API call logging works, showing the complete translation process:

=== GEMINI API CALL LOG ===
Timestamp: 2025-07-06 17:19:03
Language Pair: fr -> en
Original Text: Vous avez manipulé des civilisations entières, provoqué des décennies de guerre, détruit Ziost... et pris la fuite.
Vous allez me dire pourquoi.

CALL DETAILS:
- Message Length: 695 characters
- Word Count: 119 words
- Line Count: 9 lines

COMPLETE MESSAGE CONTENT SENT TO GEMINI:
---BEGIN MESSAGE---
<Translate idiomatically the third subtitle from French to English. Return translation only.>
FRENCH: C'était mon objectif. Le reste... n'était qu'un moyen de parvenir à mes fins.
FRENCH: Vous dites que vous avez fait tout ce chemin pour me trouver. Me voici. Que voulez-vous ?
FRENCH: Vous avez manipulé des civilisations entières, provoqué des décennies de guerre, détruit Ziost... et pris la fuite.
Vous allez me dire pourquoi.
ENGLISH: That was my goal. The rest... was merely a means to an end.
ENGLISH: You say you came all this way to find me. Here I am. What do you want?
ENGLISH:
---END MESSAGE---

RESPONSE RECEIVED:
Timestamp: 2025-07-06 17:19:03
Call Duration: 0.385 seconds
---BEGIN RESPONSE---
You manipulated entire civilizations, caused decades of war, destroyed Ziost... and fled. You're going to tell me why.
---END RESPONSE---

TOKEN & COST ANALYSIS (CURRENT CALL):
- Translated Words: 22
- Exact Input Tokens: 173
- Exact Output Tokens: 26
- Input Cost: $0.00001730
- Output Cost: $0.00001040
- Total Cost for this Call: $0.00002770

CUMULATIVE TOTALS (INCLUDING THIS CALL, FROM LOG START):
- Total Translated Words (so far): 18460
- Total Input Tokens (so far): 213723
- Total Output Tokens (so far): 30987
- Total Input Cost (so far): $0.02137230
- Total Output Cost (so far): $0.01239480
- Cumulative Log Cost: $0.03376710
========================================

This detailed logging is saved in the Gemini_API_call_logs.txt file. In the Settings tab, you'll find Total Words and Total Cost fields that display cumulative figures based solely on this log file. If the file is cleared or deleted, these totals will reset accordingly.
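The cost figures in the log follow from simple per-token arithmetic. The sample call charges $0.00001730 for 173 input tokens and $0.00001040 for 26 output tokens, which corresponds to $0.10 per million input tokens and $0.40 per million output tokens. The Python sketch below applies those rates. The rates are inferred from this one log entry and the function name `call_cost` is hypothetical, so check Google's current Gemini pricing before relying on them.

```python
from decimal import Decimal
from typing import Tuple

# USD per one million tokens, inferred from the sample log entry
# (Gemini 2.5 Flash-Lite). Prices change; verify them with Google.
INPUT_PRICE_PER_M = Decimal("0.10")
OUTPUT_PRICE_PER_M = Decimal("0.40")
MILLION = Decimal(1_000_000)


def call_cost(input_tokens: int, output_tokens: int) -> Tuple[Decimal, Decimal, Decimal]:
    """Return (input cost, output cost, total cost) in USD for one call."""
    # Decimal avoids binary floating-point rounding on tiny amounts.
    input_cost = input_tokens * INPUT_PRICE_PER_M / MILLION
    output_cost = output_tokens * OUTPUT_PRICE_PER_M / MILLION
    return input_cost, output_cost, input_cost + output_cost
```

Applying the same function to the cumulative totals (213,723 input and 30,987 output tokens) reproduces the logged Cumulative Log Cost of $0.0337671.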

Configuration and Setup

  1. API Key Setup – Requires a Google AI Studio or Google Cloud account with Gemini API access. Go to Google AI Studio and click the "Get API key" button to set up an API key for Gemini models.
  2. Model Selection – Select a Gemini model from the Translation Model dropdown; Gemini 2.5 Flash-Lite is recommended for the best cost-quality balance.
  3. Context Window – Choose between:
  4. Enable API Log – Optional detailed logging for cost analysis and debugging (API calls are saved in Gemini_API_call_logs.txt).
  5. Enable Gemini file cache – Enable to reduce API calls for repeated content (translations are saved in gemini_cache.txt).
  6. Temperature Setting – This setting can only be changed manually in the ocr_translator_config.ini file. It is set at 0.0 by default (gemini_model_temp = 0.0), which is the recommended setting for consistent, deterministic translations.

Performance Optimisation

Intelligent Caching System

Gemini API benefits from the same file caching system as DeepL and Google Translate. When caching is enabled, identical text segments are stored locally and retrieved without additional API calls. However, cache effectiveness depends on OCR consistency: even small recognition variations will trigger new API requests.

Cost Optimization Strategies:

Comparison with Other Methods

Feature | Gemini API | DeepL API | Google Translate | MarianMT
Translation Quality | Excellent + Context | Excellent | Good | Average to Good
Cost (Large Projects) | Very Low | High | High | Free
Context Awareness | ✅ Advanced | ❌ None | ❌ None | ❌ None
OCR Error Handling | ✅ Often removes errors | ❌ Often replicates errors | ❌ Often replicates errors | ❌ Often replicates errors
Cost Tracking | ✅ Built-in | ✅ Free Usage Tracker | ❌ External only | N/A

Gaming and Large Project Applications

Ideal for Gaming Translation

Gemini API excels in gaming scenarios where context and narrative flow are crucial:

Keyboard Shortcuts

These keyboard shortcuts are available:

Shortcut | Function
~ (tilde) | Start/Stop Translation
Alt+1 | Toggle Source Window Visibility
Alt+2 | Toggle Translation Window Visibility
Alt+S | Save Settings
Alt+C | Clear Translation Cache
Alt+L | Clear Debug Log

Note: When the application is stopped (translation inactive), the translation window will be hidden automatically. When the application is started, the translation window will appear automatically. You can manually override this behaviour using the Alt+2 shortcut at any time.

Troubleshooting

If you encounter issues:

  1. Check the Debugging tab for error messages and the application log.
  2. Enable OCR Debugging in Settings to see what's being captured and recognised in the Debugging tab. Also view the OCR Preview window (accessible in the Settings tab via the Preview button).
  3. Adjust settings as needed:
  4. Consult the Troubleshooting Guide for common issues and solutions.

Tips and Best Practices

OCR Accuracy

For best OCR results:

  1. Capture clean, high-contrast text.
  2. Select appropriate source language.
  3. Adjust preprocessing mode to match text appearance – try Adaptive mode for difficult backgrounds.
  4. Resize the capture area to frame text closely but completely.
  5. Use a larger source area for more context if OCR is struggling.
  6. Enable Remove Trailing Garbage to clean up recognition artefacts.
  7. Adjust the confidence threshold to balance between capturing all text (lower values) and reducing errors (higher values).
  8. For small subtitles on changing backgrounds, experiment with Adaptive mode's Block Size and C Value parameters.

Performance Optimisation

  1. Use a smaller source capture area.
  2. Click Disable Debug Log in the Home tab.
  3. Increase Scan Interval in the Settings tab to reduce CPU usage.
  4. Disable OCR Debugging in the Settings tab.
  5. Set the Image Preprocessing Mode to None in the Settings tab.
  6. For MarianMT:
  7. For DeepL or Google Translate:

Practical Applications

  1. Games:

    🎮
    NOTE: Game-Changing Translator may not work correctly with some games in fullscreen mode.
    We recommend using borderless windowed mode, which is supported by most modern games.
  2. Videos:

  3. Documents & PDFs:

  4. Applications: