Data (tab operations)

🔗 Original page — Source of this material


Description

This action is designed to retrieve data from a page.

How do I add the action to my project?

Via the context menu: Add action → Tabs → Data

image-20200812-163225

Or use Smart Search.

What is it used for?

  • Find and save the information you need from a page
  • Check if any values are present on a page
  • Parse text from a page
  • Get the page URL

How do I use the action?

image-20200812-173523

What to get

Choose the type of data you want to retrieve:

  • DOM - Document Object Model;
  • Source - source code of the page;
  • Text - visible text of the page;
  • URL - the URL from the address bar.

The difference between Source and DOM

Source is the source code of the page as received from the server. DOM is the object tree that the browser builds in your computer's memory from that source code (Source).

To put it simply, the browser works as follows:

  1. You enter a URL in the address bar and press Enter.
  2. The browser sends a request to the server.
  3. The server returns the response as the HTML source code of the page (Source).
  4. Based on the source code, the browser builds the DOM (Document Object Model). While doing so, it:
      • handles errors (adds html, body, head tags, etc. if they’re missing)
      • closes unclosed tags
      • adds a <tbody> tag to tables if there isn’t one. In the DOM, table rows (inside <table>) always sit in a <tbody>, but in HTML source the tag is optional. (Keep this in mind when building XPath and regular expressions.)
      • processes scripts on the page (which can add new elements to the page, even after it's fully loaded)
  5. Finally, the browser renders and shows you the web page content based on the DOM.

The DOM can contain information and elements that won’t be in the source code (Source), because the DOM includes content that can be inserted dynamically using JavaScript.
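
The <tbody> point above is easy to trip over when writing regular expressions. A minimal Python sketch of the effect, using two hand-written strings as stand-ins for the Source and DOM output of the same page (the markup and variable names are illustrative assumptions, not output of the action itself):

```python
import re

# Hypothetical markup as the server sends it (Source): no <tbody>.
source = "<table><tr><td>1</td></tr></table>"

# The same table as a browser would typically expose it in the DOM:
# a <tbody> wrapper has been inserted around the rows.
dom = "<table><tbody><tr><td>1</td></tr></tbody></table>"

# A pattern written while looking only at Source.
strict = re.compile(r"<table>\s*<tr>")

# A pattern that tolerates an optional <tbody> between <table> and <tr>.
tolerant = re.compile(r"<table>\s*(?:<tbody>\s*)?<tr>")

strict_hits = (bool(strict.search(source)), bool(strict.search(dom)))
tolerant_hits = (bool(tolerant.search(source)), bool(tolerant.search(dom)))

print(strict_hits)    # (True, False)
print(tolerant_hits)  # (True, True)
```

The strict pattern silently stops matching once you switch "What to get" from Source to DOM, so it helps to write expressions against the same data type the action will actually return.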

When working with requests (GET, POST, and other request types), you’re always dealing with the Source.

ProjectMaker has two tools for viewing the Source and the DOM:

image-20211128-134708

image-20211128-134743

Which tab

Select which tab to get data from:

  • Active - the currently active tab;
  • First - if there are several tabs, take the first one;
  • By name - specify the name of the tab;
  • By number - specify the tab number if there are several.

Process only the specified tags

If you need to process only one or several specific HTML tags, enable the checkbox and select the desired options.

image-20200812-170011

Parse result

If you need to parse the obtained result, specify a regular expression (Regex), set the number and indexes of the matches to take, and choose where to save the result: to a variable or to a table. You can test and refine the regular expression in the Regex Tester.
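
As a rough illustration of what "number and indexes of matches" means, here is a minimal Python sketch. The helper `pick_matches` is hypothetical, and zero-based indexing is an assumption made for the example; check the action's own settings for how it counts matches.

```python
import re

def pick_matches(text, pattern, indexes=None):
    """Return all regex matches, or only those at the given zero-based indexes."""
    found = re.findall(pattern, text)
    if indexes is None:
        return found
    # Ignore indexes that fall outside the list of matches.
    return [found[i] for i in indexes if 0 <= i < len(found)]

# Hypothetical page text used only for this example.
page_text = "Order 17 shipped, order 42 pending, order 99 cancelled"

all_numbers = pick_matches(page_text, r"\d+")
first_only = pick_matches(page_text, r"\d+", indexes=[0])
first_and_last = pick_matches(page_text, r"\d+", indexes=[0, 2])

print(all_numbers)     # ['17', '42', '99']
print(first_only)      # ['17']
print(first_and_last)  # ['17', '99']
```

A single selected match fits naturally in a variable, while several matches are better saved to a list or table.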

Note

The controls that appear when you enable the “Parse result” setting are the same as in “Text Processing – Regex” (you’ll find a more detailed description there).

image-20200812-170151

Tip

A more convenient tool for getting data from a page is Parse data.

Example usage

Let’s get all the links on a page. Choose DOM or Source as the data type, enable result parsing, and set the regular expression:

(?<=href=")http.*?(?=")

Grab all matches and put the results in a list.

image-20200812-170606

As a result, you’ll get a list of all the links found on the page.
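
To see what this expression does and does not catch, you can try it outside ProjectMaker. A minimal Python sketch, assuming a small hand-written HTML fragment (the sample markup and variable names are made up for the example):

```python
import re

# The same expression as in the example above.
HREF_PATTERN = re.compile(r'(?<=href=")http.*?(?=")')

# Hypothetical page fragment: two absolute links and one relative link.
html = (
    '<a href="https://example.com/a">A</a>'
    '<a href="/relative">B</a>'
    '<a href="http://example.org/b?x=1">C</a>'
)

links = HREF_PATTERN.findall(html)
print(links)  # ['https://example.com/a', 'http://example.org/b?x=1']
```

Note that the pattern only returns links that start with http and are wrapped in double quotes, so relative links such as /relative and single-quoted attributes are skipped.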