Viewing Page Text


Description

With this tool, you can easily view the source code (Source), the DOM, and the displayed text of the page loaded in the browser.

Note

You can read about the difference between the DOM and the source code in the note "Difference between Source and DOM".

Note

If you're working in the Chrome engine, you can also use the developer tools as an alternative.

What is this used for?

This tool is useful when you need to better understand the structure of a page.

How do you open the window?

The button that opens this window is to the right of the browser's address bar.

image-20211226-162058

How do you use the window?

When you click the icon, the window opens:

image-20200815-142918

Selecting content

Here you select what to view: the DOM (default), the source code, or the visible text of the page (see the note "Difference between Source and DOM").
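To make the difference between the source code and the visible text concrete, here is a minimal Python sketch. It is not how the tool itself works: the page source is invented for the illustration, and the `VisibleText` class is a hypothetical helper built on the standard `html.parser` module. The DOM view is left out because it only exists once a browser has processed the page.

```python
from html.parser import HTMLParser

# Hypothetical page source, invented for this illustration.
SOURCE = """<html>
<head>
<title>Demo topic</title>
<meta property="og:title" content="Demo topic">
<style>p { color: black; }</style>
</head>
<body>
<h1>Demo topic</h1>
<p>First post.</p>
<script>console.log("not visible");</script>
</body>
</html>"""


class VisibleText(HTMLParser):
    """Collects text nodes, skipping elements that are not shown as page text."""

    HIDDEN = {"head", "script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside a hidden element
        self.chunks = []  # visible text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


parser = VisibleText()
parser.feed(SOURCE)
visible_text = "\n".join(parser.chunks)
print(visible_text)
```

In this sketch the `<meta>` tag is present in the source but absent from the visible text, which is why a view of the page code is needed to find such tags.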

Word wrap

When this option is enabled, a line that is too long wraps onto the next line instead of being hidden beyond the window boundary. Here's a screenshot of the same window with this option enabled:

image-20200815-143611

Copy to Regex Builder

When you click this button, the Regex Builder opens and the window's contents are automatically copied into it.

Usage Example

Let’s say you need to parse <meta> tags with the property attribute from a ZennoLab forum topic page. The action builder can’t access them, since these tags aren’t displayed anywhere. Here’s what you do:

  • Go to the required page
  • Launch the code viewing window (in this case, you can use either DOM or source code; it won’t affect the result) and look for the required tags (there are several, but just one is shown here):

image-20200815-150735

All these tags share the same structure: each starts with <meta property=" and ends with >. The property name follows property= in quotes, and the value is in the content attribute, also in quotes.

  • Copy the content into the Regex Builder using the button with the same name. Based on our analysis, we’ll create a regex: (?<=<meta\ property=)"([a-z:]+)"\s+content="(.*?)"(?=>)

  • Using the Text Processing action and its Regex functionality, pull the values you need from the page code and save them in a table:

image-20200815-153331
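The steps above can be approximated outside the program with a short Python sketch. The regex is the one from the example, unchanged; the page code in `page_dom` is hypothetical, and a plain list of tuples stands in for the table. This only illustrates what the expression captures and is not how the Text Processing action is implemented.

```python
import re

# The regex from the example above, unchanged.
META_RE = re.compile(r'(?<=<meta\ property=)"([a-z:]+)"\s+content="(.*?)"(?=>)')

# Hypothetical page code; a real forum page will contain different tags and values.
page_dom = '''<head>
<meta property="og:title" content="Example topic">
<meta property="og:type" content="article">
<meta name="viewport" content="width=device-width">
</head>'''

# findall returns one tuple per match: (property name, content value).
rows = META_RE.findall(page_dom)

for name, value in rows:
    print(f"{name}\t{value}")
```

The third tag is skipped because it uses name instead of property, so only the two property tags end up as rows.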

A few notes on the Text Processing screenshot above:

  • {-Page.Dom-}: this variable stores the DOM of the tab. For the source code it’s {-Page.Source-}, and for the text it’s {-Page.Text-}. You can find others in the variables window.
  • Why was the column with index zero excluded? The regular expression (?<=<meta\ property=)"([a-z:]+)"\s+content="(.*?)"(?=>) contains two capturing groups in parentheses: ([a-z:]+) and (.*?). If you test it in the Regex Builder and open the Groups tab, you’ll see three groups rather than two: the first is the text of the full match, followed by the groups you defined. Group numbering starts at zero, so the full match lands in the column with index 0, and that is the column we exclude.

image-20200815-155656
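The same group numbering can be observed in Python's `re` module, which also reserves index 0 for the full match. The sketch below uses a hypothetical tag and assumes nothing about the Regex Builder beyond what is described above.

```python
import re

META_RE = re.compile(r'(?<=<meta\ property=)"([a-z:]+)"\s+content="(.*?)"(?=>)')

# Hypothetical tag used only for this illustration.
tag = '<meta property="og:type" content="article">'

match = META_RE.search(tag)

# Index 0 is the full match; indexes 1 and 2 are the groups defined in the regex.
columns = [match.group(i) for i in range(META_RE.groups + 1)]

# Dropping column 0 leaves only the property name and its value.
table_row = columns[1:]

for index, text in enumerate(columns):
    print(index, text)
```

Note that the full match in column 0 does not include <meta property= or the closing >, because the lookbehind and lookahead check for that text without consuming it.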