Automating Screenshot Collection for Language QA

Introduction

As the Automation Infrastructure Team Leader at my company, I’m often the go-to person when there’s a task that’s repetitive, frustrating, and practically begging to be automated.

Recently, the Translation team reached out with a challenge: automating their manual process of taking screenshots for various language versions of our website.

This was the very first step in how the "Screenshot Assistant" tool was born...

I started by understanding their challenge - these screenshots are essential for their workflow. They send them to language QA specialists to evaluate translations against a schema of objective errors. Think of it as translation detective work: finding missing text, bad phrasing, or awkward UI layouts.

The problem? Manually navigating through multiple pages and capturing screenshots in dozens of languages was eating up their time and patience faster than my SUV eats gas on a mountain road.

As we were talking, I thought about a cool project my team and I did a year ago. We built an in-house visual testing tool from scratch, an adventure that taught us a lot about crafting lightweight and efficient solutions. So, when I heard their request, I thought, “This is exactly the kind of problem that screams for a custom tool.” I could already picture it: lightweight, efficient, simple for non-technical teams to use, and tailor-made for the task.

Breaking Down the Tasks and Challenges

Inconsistent flows

One of the things the Translation team explained is that the screens, popups, elements, and even the flows they needed to test varied with each LQA iteration. Because the flows are inconsistent, an automated test suite is not suitable for their dynamic challenge, even though our Automation infra supports navigating to different flows and capturing screenshots. Furthermore, the ramp-up needed to teach the Translation team to integrate with our existing Automation infra and tests just did not seem like a positive ROI.

What was needed was a simple, flexible solution that lets them handle the dynamic nature of the task without any technical overhead. Something that didn’t require them to write code or maintain test cases but still gave them the power of automation where it mattered.

Proxy configuration and handling Top Level Domains

Being based in Israel, one of the first hurdles we encountered was avoiding automatic .co.il redirects when they weren’t needed. For language QA, it’s critical to access the exact version of the site intended for a specific audience, without the local DNS hijacking the process. Then there was the issue of proxies. Some pages required routing through a proxy to access correctly.

Language Selection

Another feature I wanted to include was flexibility in choosing which languages to test. I created an easy-to-use interface where the user can specify whether they want to capture screenshots for all available languages, or just a selected few. The language list is predefined, but it’s fully customizable, so if a new language needs to be added, it's just a matter of editing a simple configuration file.
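To make the "all languages or a selected few" idea concrete, here is a minimal sketch of how such a selection could work. The YAML layout, the `DEFAULT_LANGUAGES_YAML` constant, and the `select_languages` helper are assumptions made for illustration; the actual configuration file format of the tool is not shown in this post.

```ruby
require "yaml"

# Hypothetical config layout: language code => display name.
DEFAULT_LANGUAGES_YAML = <<~YAML
  languages:
    es: Spanish
    fr: French
    de: German
YAML

# Returns the language codes to capture: every configured code when
# `requested` is empty, otherwise only the requested ones.
# Unknown codes raise an error instead of being silently skipped.
def select_languages(config_yaml, requested = [])
  available = YAML.safe_load(config_yaml).fetch("languages").keys
  return available if requested.empty?

  unknown = requested - available
  raise ArgumentError, "Unknown language(s): #{unknown.join(', ')}" unless unknown.empty?

  requested
end
```

With a layout like this, adding a language is a one-line change to the file. One thing to watch for in YAML: a bare `no` (Norwegian) is read as a boolean, so a code like that would need quotes.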

Feature Flags

Finally, there was the matter of feature flags. Certain site features were still in development and hidden behind “toggles”. Testing them required the ability to enable them. The tool needed a way to handle these flags dynamically, ensuring the correct features were active when capturing screenshots.

The Implementation

When it came time to choose the tools for the job, I wanted to keep things as simple as possible while leveraging technologies I already knew well. After a quick evaluation, I decided on a stack that checked all the boxes: Ruby + Watir + Stich.

Ruby: We use Ruby for our Automation framework. It's lightweight, easy to work with, and has a syntax that's clean and readable. Perfect for building a tool that might eventually need tweaks or enhancements down the line from me or any of my team members.
Watir: A simple and reliable library for browser automation. It’s great for navigating web pages and interacting with elements without the overhead of more complex frameworks.
Stich: A handy gem (an external Ruby dependency) for stitching together screenshots when a single page is too large to fit into one capture. This made it perfect for handling full-page screenshots, ensuring we didn’t miss any critical content.
Simple installation: One of the key challenges was ensuring the tool was easy to use for non-technical team members who didn’t have the required frameworks and dependencies set up on their machines. So, I created a script that automatically installs everything needed to run the tool. With just a single command, the script installs the necessary gems and frameworks, making the setup process as simple as possible. This way, the Translation team doesn’t need to worry about managing dependencies or configuring the environment—they can get started with minimal effort.
Proxy configuration: For this purpose, I chose to use PAC files. A proxy auto-configuration (PAC) file is a text file that instructs a browser to forward traffic to a proxy server instead of directly to the destination server. I pass these files to the browser when it is launched, which lets me control where the traffic goes.
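As a rough illustration of passing a PAC file at browser start-up, the sketch below builds the Chrome `--proxy-pac-url` argument and hands it to Watir. The helper names, the proxy host and port, and the way the PAC file is served are all assumptions; this is not the tool's actual code.

```ruby
# Generates a minimal PAC file body that sends all traffic through one proxy.
# `proxy_host` and `proxy_port` are placeholders for illustration.
def pac_file_body(proxy_host, proxy_port)
  <<~PAC
    function FindProxyForURL(url, host) {
      return "PROXY #{proxy_host}:#{proxy_port}";
    }
  PAC
end

# Builds the Chrome command-line arguments that point the browser at a PAC
# file. `pac_url` is a hypothetical location where the PAC file is served.
def chrome_proxy_args(pac_url)
  raise ArgumentError, "pac_url must not be empty" if pac_url.to_s.strip.empty?

  ["--proxy-pac-url=#{pac_url}"]
end

# Launches Chrome through Watir with the PAC file applied.
# Watir is required lazily so the helpers above work without the gem.
def launch_browser(pac_url)
  require "watir"
  Watir::Browser.new(:chrome, options: { args: chrome_proxy_args(pac_url) })
end
```

A call such as `launch_browser("http://localhost:8000/proxy.pac")` would then open a browser whose traffic follows the rules in that file. A real PAC file would usually branch on `host` rather than proxy everything.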

User Interaction

Once the application is launched via a simple command in the terminal, the tool opens a browser window and waits for user input. The user is prompted to fill in a few key details:

Feature Flags: If certain features need to be toggled on or off, the user can enter these flags here. No need for deep technical knowledge—just a few simple parameters.
Domain Settings: The user specifies the domain on which the feature flags are set and to which the browser navigates.

After these details are entered, the tool hands the browser window over to the user for manual interaction and runs in the background, awaiting instructions. Once the user reaches the desired state, they simply press a key to take screenshots in all specified languages. The screenshots are stored on the user's file system and neatly organized, to avoid confusion and simplify sorting.

Key tool capabilities:

1. Language Navigation

The tool starts with a predefined list of languages, each represented by a URL parameter (e.g., ?lang=es for Spanish or ?lang=fr for French). It dynamically appends these parameters to the base URL, ensuring the tool does not need to interact with the UI at all.
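Appending the language parameter can be sketched with Ruby's standard `uri` library. The `localized_url` helper is a hypothetical name, and the sketch assumes the parameter is called `lang`, as in the examples above.

```ruby
require "uri"

# Returns `base_url` with its `lang` query parameter set to `lang_code`.
# Any existing `lang` value is replaced; other parameters are preserved.
def localized_url(base_url, lang_code)
  uri = URI.parse(base_url)
  params = URI.decode_www_form(uri.query.to_s).reject { |key, _| key == "lang" }
  params << ["lang", lang_code]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end
```

Replacing an existing `lang` value, rather than appending a second one, keeps the URL unambiguous when the user has already switched languages on the page.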

2. Flexible Flow Handling

Given that the screens and flows were highly dynamic, I avoided scripting rigid scenarios. Instead, the tool allows the user to navigate manually to the desired page after the language version loads. Once ready, they simply press Enter to capture a screenshot.
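The "navigate manually, then press Enter" flow could look roughly like the sketch below. It assumes a `browser` object that responds to `url`, `goto`, and `screenshot.save`, as a `Watir::Browser` does. The `capture_all_languages` name, the prompt text, and the file names are made up for illustration, and the URL handling is deliberately naive.

```ruby
# Waits for the user to press Enter, then visits each language version of
# the current page and saves a screenshot. Returns the saved file paths.
# `input` and `output` are injectable so the flow can run without a terminal.
def capture_all_languages(browser, languages, input: $stdin, output: $stdout)
  output.puts "Navigate to the page you need, then press Enter to capture."
  input.gets

  page_url = browser.url # the page the user navigated to manually
  languages.map do |lang|
    # Simplification: assumes the URL has no `lang` parameter yet.
    separator = page_url.include?("?") ? "&" : "?"
    browser.goto("#{page_url}#{separator}lang=#{lang}")
    path = "screenshot_#{lang}.png"
    browser.screenshot.save(path)
    path
  end
end
```

This uses Watir's plain screenshot call, so it captures only the visible viewport. Full-page stitching is not covered here.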

3. Full-Page Screenshots

Using Watir and Stich, the tool captures full-page screenshots, even for pages that require scrolling. This ensures that no content is missed, whether it’s a footer disclaimer or a dropdown menu halfway down the page.

4. Proxy Support

For regions where proxy access was necessary, the tool integrates seamlessly with proxy configurations. The user can specify proxy settings before launching the browser, ensuring they’re always testing the right regional version of the site.

5. Feature Flags

When certain features are gated behind flags, the tool lets the user pass them in dynamically, so that all features can be tested.
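How the flags reach the site is not described in this post, so the following is only a sketch under a loud assumption: that flags are plain name/value pairs typed by the user and that the site reads them from cookies on the chosen domain. The input format and both helper names are hypothetical.

```ruby
# Parses user input such as "new_checkout=on, beta_banner=off" into a hash.
# A flag given without a value defaults to "on".
def parse_feature_flags(input)
  input.to_s.split(",").each_with_object({}) do |pair, flags|
    name, value = pair.split("=", 2).map(&:strip)
    next if name.nil? || name.empty?

    flags[name] = value || "on"
  end
end

# ASSUMPTION: the site reads feature flags from cookies. The real mechanism
# may differ (query parameters, local storage, an internal API, and so on).
# The browser generally has to be on a page of `domain` before cookies for
# that domain can be added.
def apply_feature_flags(browser, flags, domain)
  flags.each { |name, value| browser.cookies.add(name, value, domain: domain) }
end
```

Keeping the parsing separate from the browser call means the same user input could feed a different mechanism if cookies turn out to be the wrong guess.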

6. Easy-to-Manage Screenshots

Screenshots are automatically saved with intuitive filenames, like page_language_code.png, and organized into folders for each language. This makes it easy for the Translation team to find and share the files they need.
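The naming scheme described above can be sketched as follows. The helper names and the `screenshots` root folder are assumptions; only the `page_language_code.png` pattern and the per-language folders come from the description.

```ruby
require "fileutils"

# Builds a path like "<root>/<lang>/<page>_<lang>.png".
# The page name is normalized to lowercase letters, digits, and underscores.
def screenshot_path(root_dir, page_name, lang_code)
  safe_page = page_name.strip.downcase.gsub(/[^a-z0-9]+/, "_").gsub(/\A_+|_+\z/, "")
  File.join(root_dir, lang_code, "#{safe_page}_#{lang_code}.png")
end

# Same as above, but also creates the per-language folder if it is missing.
def prepare_screenshot_path(root_dir, page_name, lang_code)
  path = screenshot_path(root_dir, page_name, lang_code)
  FileUtils.mkdir_p(File.dirname(path))
  path
end
```

For example, `screenshot_path("screenshots", "Checkout Page", "es")` yields `screenshots/es/checkout_page_es.png`.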

Limitations

While the tool does a great job automating the process of taking screenshots for language QA, there are a few limitations to be aware of.

1. Popups and Modals

One of the main challenges with taking screenshots of dynamic content, like popups or modals, is that changing the language often triggers a page refresh or reload. In some cases, this will close any open modals or popups, which means the tool cannot capture them in their original state once the language is switched.

If the user needs a screenshot of a popup, they would have to make sure it’s open before changing the language or manually navigating to that part of the page after the language switch. This isn’t a huge problem, but it’s something to keep in mind when capturing content that relies on user interaction.

2. Dynamic Content

Another limitation is that, like with any tool that interacts with a live website, there’s always a risk that the content or UI layout might change. If a new feature is rolled out or a page layout is updated, the tool might need tweaks to keep up with these changes.

3. Complex Interactions

While the tool is great for simple navigation and screenshot capture, more complex interactions (like those requiring a series of clicks or hover actions) still need to be handled manually. If the Translation team needs to capture content that requires a multi-step interaction, the tool won’t automate that process—yet.
