The Complete Guide to Finding Elements by Text with Selenium

Locating web elements by their text content is a fundamental yet surprisingly tricky skill in test automation. Mastering the art of reliable text-based locators separates the seasoned expert from the novice Selenium scripter.

In this comprehensive 2500+ word guide, I'll share my hard-earned knowledge for creating robust text locators that stand the test of time. You'll learn not just how to find elements by text, but also best practices to avoid the many pitfalls.

By the end, you'll level up your Selenium skills with advanced techniques you won't find in most tutorials. Let's get started!

Why Text Locators Matter

Before jumping into code, it's worth understanding why text-based locators are so important for test automation.

Locating elements by text creates stability. Unlike attributes that change frequently, the text content of buttons, labels and headings tends to be far more static. Once your script can reliably find the "Submit" button by its text, you're less likely to encounter randomly failing tests.

Text locators also make scripts more maintainable. If an element's ID or CSS selector changes in a UI redesign, the text itself often remains untouched. This saves you refactoring work down the road.

However, text locators have caveats. The same stability benefit can also become a disadvantage if you rely on substrings that later change. Brittle locators will break your tests.

You also can't locate every element by text – elements like images, inputs and icons often have no inner text at all. For these, attributes and other locators are necessary.

Overall, for clickable elements like buttons, links and menus, text locators boost reliability when done properly. Let's explore some pro tips and code examples.

Locating Exact Matches vs. Substrings

The first decision when using text for locators is whether to match full strings or substrings.

Full string matches are best for stability, while substring matches trade off some reliability for flexibility.

Here's an example of locating a button by its exact text:

from selenium.webdriver.common.by import By

# Exact match
submit_button = driver.find_element(By.XPATH, "//button[text()='Submit Order']")

And substring matching:

# Substring match
submit_button = driver.find_element(By.XPATH, "//button[contains(text(), 'Submit')]")

Matching the full text has benefits:

  • Fails loudly when the label changes, instead of silently matching the wrong element
  • Avoids matching multiple elements accidentally
  • Leads to "single source of truth" locators

The weakness of full string matching is less flexibility. If the text changes even slightly – like capitalization or punctuation – it breaks.

Substrings provide more wiggle room but can also match unwanted elements. Changes in word order or text length may also ruin a partial match locator.

For maximum reliability, favor full string matches whenever possible. Use substrings sparingly with careful consideration of text variations.
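When whitespace in the markup is unpredictable, you can keep the stability of a full match without the fragility. Here's a minimal sketch (the URL is hypothetical) using XPath's normalize-space(), which trims and collapses whitespace before comparing:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/checkout")  # hypothetical page

# normalize-space() trims leading/trailing whitespace and collapses
# internal runs, so '  Submit  Order ' still matches exactly
submit_button = driver.find_element(
    By.XPATH, "//button[normalize-space()='Submit Order']"
)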

XPath vs. CSS Selectors: A Text Matchup

For web apps, the two main options for text matching are XPath and CSS selectors. Which one should you use?

In my experience, XPath usually provides the most robust and flexible text locators. But CSS also has some notable advantages to consider.

Reasons to prefer XPath over CSS include:

  • Powerful text() method matches full strings
  • contains() handles partial matches
  • String functions like normalize-space() and translate() handle whitespace and casing
  • Full XPath spec has many useful functions

CSS selectors have benefits too:

  • Concise syntax with less verbosity than XPath
  • Attribute selectors match attribute text (exact, substring, prefix, suffix)
  • Faster performance according to some studies

One caveat up front: standard CSS has no pseudo-class for matching inner text. The :contains() selector was dropped from the CSS specification and :exact-text was never part of it, so browsers – and therefore Selenium – reject both. We'll revisit this in the CSS examples below.

For most text locator needs, I recommend starting with XPath due to the text() method and overall flexibility.

However, CSS deserves consideration for its performance gains and cleaner syntax. I tend to use a mix of both in my scripts.

Later we'll see examples of text matching in XPath and CSS. First, let's go over some key best practices.

10 Pro Tips for Reliable Text Locators

While finding elements by text is simple in concept, creating robust text-based locators requires some skill.

Here are my top 10 tips for creating gold standard text locators that stand the test of time:

1. Favor Exact Matches Over Partial Matches

As mentioned earlier, full string matching creates more stable locators resistant to changes. Use text() or normalize-space() in XPath for the best reliability – CSS has no equivalent for inner text.

2. Validate Locators Against Text Changes

When site content gets updated, it often breaks locators that use substrings. Add validation checks to confirm your text locators still resolve after text changes – a minimal sketch follows.
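One way to implement this – the locator table and names here are illustrative, not from any particular framework – is a smoke check that asserts each critical text locator still resolves to exactly one element:

from selenium.webdriver.common.by import By

# Hypothetical table of the text locators your suite depends on
CRITICAL_LOCATORS = {
    "submit button": "//button[normalize-space()='Submit Order']",
    "welcome header": "//h1[normalize-space()='Welcome to Acme']",
}

def validate_text_locators(driver):
    """Fail fast if a copy change broke any critical text locator."""
    for name, xpath in CRITICAL_LOCATORS.items():
        matches = driver.find_elements(By.XPATH, xpath)
        assert len(matches) == 1, (
            f"{name}: expected exactly 1 match, found {len(matches)}"
        )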

3. Allow for Text Variations Like Spacing and Casing

Text rarely matches exactly every time. Use techniques like contains(), normalize-space() and translate() to make locators more flexible – the XPath examples later in this guide show each of these in action.

4. Beware of Stale Element Exceptions

If the page reloads, your old element reference may go stale. Wrap interactions in a try/except block to handle these exceptions.

5. Implement Explicit Waits for Text

With dynamic sites, text can load after page load. Wait for specific text using Expected Conditions rather than hardcoded delays.

6. Handle Text in Shadow DOM Using Special Methods

Modern component-based sites use shadow DOM, which hides text from standard locators. Use the shadowRoot property (covered later in this guide) to search inside it.

7. Match the Visible Inner Text, Not HTML Text Content

What users see isn't always what's in the HTML source. Base locators on the rendered text, not the raw source – the snippet below shows how the two can differ.
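For instance – assuming a hypothetical element with id "status" that contains a hidden child node – Selenium exposes both views:

from selenium.webdriver.common.by import By

element = driver.find_element(By.ID, "status")  # hypothetical element

print(element.text)                          # rendered text only, as the user sees it
print(element.get_attribute("textContent"))  # raw DOM text, hidden nodes included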

8. Get All Matches, Then Filter to the Index You Want

Rather than locating a single match, find all matching elements with find_elements, then pick the one you need by index, as shown below.
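A quick sketch (the link text is hypothetical):

from selenium.webdriver.common.by import By

# Grab every 'Details' link on the page, then pick the second one
links = driver.find_elements(By.XPATH, "//a[normalize-space()='Details']")
second_details = links[1]  # Python lists are zero-indexed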

9. Re-locate Long-Lived Elements to Refresh the References

For long-lived elements like navigation bars, call find_element again just before interacting to avoid stale exceptions.

10. Avoid Fragile Locators Relying on Substrings

Basing locators on substrings like 'Submit' leads to brittle tests. Lean towards stable full string matching.

Now let's see some real-world examples of applying these best practices in Python code.

XPath Text Matching Examples

XPath is your Swiss Army knife for matching text, thanks to the contains() and text() functions.

For partial matches, use contains():

# Button with Submit substring
submit_button = driver.find_element(By.XPATH, "//button[contains(text(), 'Submit')]")

# Label containing Search text
search_label = driver.find_element(By.XPATH, "//label[contains(., 'Search')]")

To match full text, use the text() function:

# Exact match on search button text
search_btn = driver.find_element(By.XPATH, "//button[text()='Search Accounts']")

# Get welcome element only if text matches exactly
welcome_header = driver.find_element(By.XPATH, "//h1[text()='Welcome to Acme']")

Note that text() matches the text nodes in the DOM, which can differ from the rendered text users see – hidden nodes still count, and whitespace is not collapsed unless you use normalize-space().

A word of caution on regex: browsers implement XPath 1.0, which has no regular expression support. The matches() function is XPath 2.0+ and will raise an invalid-selector error in Selenium, and contains() treats characters like $ literally. Use the built-in string functions instead:

# Button whose text ends with 'Submit' (emulating ends-with in XPath 1.0)
submit_btn = driver.find_element(By.XPATH,
    "//button[substring(normalize-space(), string-length(normalize-space()) - 5) = 'Submit']")

# Case-insensitive match using translate()
no_caps = driver.find_element(By.XPATH,
    "//span[translate(normalize-space(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = 'no caps']")

With these examples, you can see the power of XPath for locating elements by inner text.

CSS Selector Examples for Text Matching

Here CSS needs a word of warning: standard CSS has no pseudo-class for matching inner text. The :contains() selector you may remember from jQuery was dropped from the CSS specification, and :exact-text was never part of it. Selenium hands CSS selectors straight to the browser, so both of the following fail with an invalid-selector error:

# These look plausible but are NOT valid CSS – both raise
# InvalidSelectorException in Selenium
# driver.find_element(By.CSS_SELECTOR, "button:contains('Submit')")
# driver.find_element(By.CSS_SELECTOR, "h1:exact-text('Welcome')")

What CSS can match is attribute text. Attribute selectors support exact ([placeholder='Search']), substring (*=), prefix (^=) and suffix ($=) matching:

# Attribute placeholder contains search text
search_input = driver.find_element(By.CSS_SELECTOR, "input[placeholder*='Search']")

# Link whose href ends with /login
login_link = driver.find_element(By.CSS_SELECTOR, "a[href$='/login']")

So while XPath matches inner text directly, CSS selectors provide a terse alternative whenever the text you need lives in an attribute.
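When the text you need is genuinely inner text but you still want to start from CSS, one pattern – sketched here with a hypothetical form – is to narrow the candidates with a CSS selector and filter on .text in Python:

from selenium.webdriver.common.by import By

# Narrow with CSS, then match the visible text in Python
buttons = driver.find_elements(By.CSS_SELECTOR, "form button")
submit_btn = next(b for b in buttons if b.text.strip() == "Submit")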

Beyond XPath and CSS: Additional Text Matching Approaches

While XPath and CSS cover most use cases, a few other options exist for matching text:

Link text – Find links by exact displayed text using link_text:

login_link = driver.find_element(By.LINK_TEXT, "Login")

Partial link text – Useful when link text is dynamic or changes slightly:

account_link = driver.find_element(By.PARTIAL_LINK_TEXT, "Account") 

Tag and filter – Get all of an element type, then filter by text:

submit_button = [btn for btn in driver.find_elements(By.TAG_NAME, "button") if btn.text == "Submit"][0]

These provide some convenience methods in certain cases but won't cover every situation the way XPath can.

Now let's tackle some common challenges when locating by text…

Handling Stale Elements and Re-locating

A common pain point with text locators is stale element exceptions. This happens when the page re-renders and your old element reference is invalidated.

You'll see errors like:

StaleElementReferenceException: Message: stale element reference: element is not attached to the page document

To handle these, wrap the interaction in a try/except block:

from selenium.common.exceptions import StaleElementReferenceException

try:
    element = driver.find_element(By.XPATH, "//h1[text()='Welcome']")
    element.click()  # stale references surface when you use the element
except StaleElementReferenceException:
    print("Oops, the element went stale. Re-locating...")

    # Re-find the element
    element = driver.find_element(By.XPATH, "//h1[text()='Welcome']")

Now when the element goes stale, your script will simply re-locate it!

There is no built-in way to refresh a stale reference – calling find_element again is the refresh. Catching the exception lets you add retry logic instead.
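Here's one way to package that retry – a sketch, with the attempt count and pause chosen arbitrarily:

import time
from selenium.webdriver.common.by import By
from selenium.common.exceptions import StaleElementReferenceException

def click_with_retry(driver, xpath, attempts=3):
    """Re-locate and click an element, retrying if it goes stale mid-action."""
    for attempt in range(attempts):
        try:
            driver.find_element(By.XPATH, xpath).click()
            return
        except StaleElementReferenceException:
            if attempt == attempts - 1:
                raise  # give up after the final attempt
            time.sleep(0.5)  # brief pause before re-locating

click_with_retry(driver, "//button[normalize-space()='Submit Order']")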

Dealing with Dynamic Text Using Waits

Another common pitfall is text changing before you attempt to locate it. With modern web apps, content loads dynamically via JavaScript.

To handle this, you need to use waits in your scripts:

Implicit waits – Set a global wait for all finds:

# Wait 10 seconds for each element lookup
driver.implicitly_wait(10)

text_element = driver.find_element(By.XPATH, "//h1[text()='Welcome']")

Explicit waits – Target specific text with Expected Conditions:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)

# Wait for title text 
wait.until(EC.text_to_be_present_in_element((By.ID, "title"), "Welcome!"))

I recommend explicit waits targeted at specific text; if you also set an implicit wait as a fallback, note that the Selenium documentation warns against mixing the two, since it can lead to unpredictable wait times.

This handles both general page load delays and dynamic JavaScript rendered content.
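Note that text_to_be_present_in_element performs a substring check. If you need an exact match, WebDriverWait also accepts any plain callable – a small sketch:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

wait = WebDriverWait(driver, 10)

# Wait until the title's rendered text equals the full string exactly
wait.until(lambda d: d.find_element(By.ID, "title").text == "Welcome!")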

Beyond Visible Text: Matching Text in Shadow DOM

The text matching techniques so far locate visible inner text that users see. But what about text hidden inside shadow DOM?

Sites built on Web Components – and frameworks that compile to them – use shadow DOM to encapsulate component internals. This hides the markup from standard locators.

You can reveal and pierce the shadow DOM by asking the browser for the shadowRoot property:

# Get the shadow root of the host element
root1 = driver.execute_script('return arguments[0].shadowRoot', element)

# Only CSS selectors work inside a shadow root, and CSS can't match
# inner text – so select candidates, then filter on .text in Python
spans = root1.find_elements(By.CSS_SELECTOR, "span")
shadow_text = next(s for s in spans if "Hello" in s.text)

Note that the universal * selector does not cross shadow boundaries – a search from the driver stops at the shadow root, which is why you must grab shadowRoot first.

With these methods, you can reliably match text hidden inside shadow DOM!
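If you're on Selenium 4, the shadow_root property does the same job without JavaScript – a sketch, where the my-widget tag name is hypothetical:

from selenium.webdriver.common.by import By

# Selenium 4: WebElement.shadow_root returns a searchable ShadowRoot
host = driver.find_element(By.CSS_SELECTOR, "my-widget")  # hypothetical custom element
shadow = host.shadow_root
spans = shadow.find_elements(By.CSS_SELECTOR, "span")
hello = next(s for s in spans if "Hello" in s.text)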

Comparing Python Text Locating Approaches with Other Languages

All the examples so far use Python, one of the most popular languages for browser automation. How does text matching compare in other languages?

Java – Supports the same XPath and CSS selector locating approaches. More verbose but very similar API.

C# – Provides matching methods via the WebDriver library. C# has great string manipulation features.

Ruby – Text matching works the same as Python but with a different syntax. XPath and CSS are supported.

JavaScript – Can accomplish text location using document.querySelector() and other DOM methods.

The concepts are highly transferable between languages. XPath and CSS provide the standard cross-language locators.

Key Takeaways and Best Practices

Let's recap the key lessons for creating robust text-based locators:

  • Favor full string matching over substrings for stability
  • Use explicit waits to handle delayed text
  • Re-locate stale text elements if the page changes
  • Allow for variations via contains(), normalize-space(), translate(), etc
  • Validate locators after text changes to catch regressions
  • Avoid fragile locators tied to substrings that may change
  • XPath tends to provide the most powerful and flexible locators

While text-based locators have caveats, following these best practices helps avoid flaky tests and scripts breaking unexpectedly.

Mastering the art of finding elements by text takes time, but pays off in more stable and maintainable tests.

Conclusion

This wraps up my complete guide to locating elements by text in Selenium!

We covered:

  • Matching full strings vs substrings
  • XPath and CSS selector examples
  • Best practices for robust locators
  • Handling stale elements and dynamic text
  • Waiting for text and dealing with shadow DOM

Text-based locators require care to create, but reward you with super-powered tests that stand the test of time.

I hope these tips give you a solid foundation for leveling up your text locating skills. Happy testing!
