Text Processing


Description

This action block is for various text manipulations. For example: processing scraped text, cleaning up garbage, or even translating it to another language – all this and much more is what this "cube" can do.

How to add to your project?

Right-click and select Add Action → Data → Text Processing:

Add to project

What the action block looks like

The window is split into three sections:
1. Input string. Here you can put text, a variable, or a combination of them.
2. Actions to perform on the string, with settings.
3. Output string where your result is saved to a variable.

If you put the cursor in the input field and press Ctrl+Space, you'll see a dropdown with useful constants and project variables.


String Escape (Escaping Characters)

This action escapes spaces and these characters: * + ? | { [ ( ) ^ . #. In other words, it adds a backslash \ before each of those symbols.

This is often used for query processing. It also helps the regular expression handler see characters as literals, not as commands or metacharacters.
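As a rough illustration of the behavior described above, here is a minimal Python sketch. The function name string_escape and the character set are assumptions taken from the list in this section; this is not the action's actual implementation.

```python
# Characters this section lists as escaped, plus the space character.
ESCAPED_CHARS = set("*+?|{[()^.# ")


def string_escape(text: str) -> str:
    """Put a backslash before every character from ESCAPED_CHARS."""
    return "".join("\\" + ch if ch in ESCAPED_CHARS else ch for ch in text)


escaped = string_escape("price (usd) + tax")
```

After escaping, a regex engine reads the parentheses and the plus sign as plain characters instead of a group and a quantifier.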


Regex (Text Processing with Regular Expressions)

Regular expressions are super handy for parsing strings according to a specific pattern. This action can get not just the first matching value, but the whole group, and save the values to variables or a table.

Regex Input Field

This is where you enter your regular expression to search the text. Example: (?<=<title>).*(?=</title>).

You can use the Regular Expression Tester to help write regexes.

Error on No Match

If this option is enabled, the action will return an error if the regular expression doesn't find anything in the text.

Note: if the regex matches but returns an empty string, the action still completes successfully, even with this setting on. For example, if there's nothing inside a title tag: <title></title>, the regex (?<=<title>).*(?=</title>) matches, but just returns an empty string, and the action is considered a success. But if there's no title at all, you'll get an error.
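The difference between an empty match and no match can be sketched in Python with the same pattern. The function first_match and its error_on_no_match flag are hypothetical names that only mimic the setting described above.

```python
import re

TITLE_PATTERN = re.compile(r"(?<=<title>).*(?=</title>)")


def first_match(text: str, error_on_no_match: bool = True) -> str:
    """Return the first match; optionally fail when nothing matches."""
    match = TITLE_PATTERN.search(text)
    if match is None:
        if error_on_no_match:
            raise ValueError("regex found no match")
        return ""
    # An empty string is still a successful match.
    return match.group(0)


with_title = first_match("<title>Home</title>")
empty_title = first_match("<title></title>")
```

Here empty_title is an empty string and no error is raised, while a text without any title tag raises ValueError.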

What to take

First

Saves the first matched value to a variable.

All

Saves all matches to a list.

One Match

Saves only one match by its index (zero-based numbering).

You can also select Last or Random value.

Match Numbers

Saves to a list only the matches whose indexes you specify, separated by commas (zero-based numbering).
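The selection modes above map onto ordinary list indexing. The following Python sketch uses a made-up input string and pattern purely for illustration.

```python
import random
import re

text = "id=10 id=20 id=30 id=40"           # made-up sample input
matches = re.findall(r"\d+", text)          # All: every match, in order

first = matches[0]                          # First
one_match = matches[2]                      # One Match with index 2 (zero-based)
last = matches[-1]                          # Last
random_value = random.choice(matches)       # Random
match_numbers = [matches[i] for i in (0, 3)]  # Match Numbers "0,3"
```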

To Variables

This feature is for working with group regular expressions.

Here's a detailed example to make it clearer.

Let's say you have this text:

21.01.2003, 11:34:00.9299
11.12.2013, 01:22:55.3021
04.01.2007, 08:00:06.0032

And you want to break it down. You could use this regex: (\d{2}).(\d{2}).(\d{4}), (\d{2}):(\d{2}):(\d{2}).(\d{4})

In the Regex Tester, this gives three matches, one per line, each with seven groups.

Now let's take the day, month, and year from the second line and save them to variables.

Match number is the zero-based index of the match. Here each line produces one match, so it works like a line number: write 1 for the second line.

Then specify the group number and variable to save each result to. Group 0 is the entire matched line: 11.12.2013, 01:22:55.3021. So for the individual bits, use groups: 1 for day, 2 for month, 3 for year.
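The same selection can be reproduced in Python to check which group holds which value. This is only a sketch of the idea; the variable names are illustrative. Note that the unescaped dots in the pattern match any character, which happens to work for this text.

```python
import re

text = """21.01.2003, 11:34:00.9299
11.12.2013, 01:22:55.3021
04.01.2007, 08:00:06.0032"""

pattern = re.compile(r"(\d{2}).(\d{2}).(\d{4}), (\d{2}):(\d{2}):(\d{2}).(\d{4})")
matches = list(pattern.finditer(text))

second = matches[1]              # match number 1 = second line
whole_line = second.group(0)     # group 0 = the entire match
day = second.group(1)
month = second.group(2)
year = second.group(3)
```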

To Table

Pretty similar to the previous feature, except here you save all results to a table, not just one. And you can exclude some groups from the final result.

Here's an example as well.

Let's use the same text:

21.01.2003, 11:34:00.9299
11.12.2013, 01:22:55.3021
04.01.2007, 08:00:06.0032

And the same regex: (\d{2}).(\d{2}).(\d{4}), (\d{2}):(\d{2}):(\d{2}).(\d{4})

The Regex Tester shows the same three matches, each with seven groups.

For the final table, let's say you don't need seconds and milliseconds.

Group 0 is the whole matched line, so skip it. Also ignore groups 6 and 7 (seconds and milliseconds).
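A small Python sketch of the same idea: every match becomes a table row, and the skipped groups are left out. The name SKIPPED_GROUPS is an assumption made for this example.

```python
import re

text = """21.01.2003, 11:34:00.9299
11.12.2013, 01:22:55.3021
04.01.2007, 08:00:06.0032"""

pattern = re.compile(r"(\d{2}).(\d{2}).(\d{4}), (\d{2}):(\d{2}):(\d{2}).(\d{4})")

# Group 0 is the whole match; 6 and 7 are seconds and milliseconds.
SKIPPED_GROUPS = {0, 6, 7}
kept_groups = [g for g in range(pattern.groups + 1) if g not in SKIPPED_GROUPS]

# One row per match, one column per kept group.
table = [[m.group(g) for g in kept_groups] for m in pattern.finditer(text)]
```

Each row then holds day, month, year, hour, and minute.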

Example of Using Regex

Here's how you could extract links using regular expressions that you build in the Regex Constructor.

Goal: Get profile links for active users on the ZennoLab forum

1. Using the Get Value action, get the HTML code that contains links to online users.
2. Add the Regex action. Build your pattern in the Regex Constructor.
3. In the action properties, put the html variable as input, and save results to the urls list.
4. After running, you'll get a list of unique IDs which you can use to build user profile URLs.


Spintax (Randomizing and Varying Text)

With this tool, you can automatically create synonymized texts. It uses curly braces and vertical bars (|) to mix things up in random order.

A simple spintax looks like: {option1|option2|option3}. When run, one of the three options is randomly picked. Spintax can get much fancier with multi-level nesting — so you can easily generate thousands of variations from one template.
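To show how nested curly-brace spintax can be resolved, here is a minimal Python sketch. It handles only the {a|b|c} form, without escaping, and the function name spin and its rng parameter are assumptions for this example rather than the tool's own algorithm.

```python
import random
import re

# A group with no braces inside it, i.e. an innermost {a|b|c}.
INNERMOST = re.compile(r"\{([^{}]*)\}")


def spin(template: str, rng: random.Random) -> str:
    """Resolve {a|b|c} groups, innermost first, until none are left."""
    while True:
        template, count = INNERMOST.subn(
            lambda m: rng.choice(m.group(1).split("|")), template
        )
        if count == 0:
            return template


result = spin("{Hello|Hi} {world|{dear|kind} reader}", random.Random())
```

This template alone yields six different sentences, which is how a longer template reaches thousands of variations.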

Extended Syntax

  • {Red|White|Blue} — the result will be one value, like White.
  • [ Red| White| Blue] — the values will be shuffled: White Blue Red.
  • [+_+Red|White|Blue] — the values will be shuffled but with a separator between them: White_Red_Blue.

You can nest as much as you like:
[+{_|-}+Red|White|Blue {1|2}] = White-Blue 2-Red

Special symbols can be escaped for clarity:
[+\++Red|\[White\]|Blue] = [White]+Red+Blue


Split (Splitting Text)

Splitting is done using a specific symbol (delimiter). Afterward, you'll get an array out of your string. This is a simpler alternative to using RegExp for splitting by characters.

Delimiters

Here you enter which symbols to split the data by.

Allow Empty Values

Easier to show with an example:

Say you've got a line like: name;surname;sex;birth year. If you skip one value, like no gender: Andrey;Pavlov;;1988, then 1988 goes into the sex variable, not year, which might not be what you want.

To fix this, enable Allow empty values. Then you can have empty strings written to variables where needed.
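The effect of the option can be sketched in Python, where dropping empty parts shifts the later values to the left.

```python
line = "Andrey;Pavlov;;1988"

# Empty values allowed: the missing field keeps its position.
with_empty = line.split(";")

# Empty values dropped: 1988 moves into the third position.
without_empty = [part for part in line.split(";") if part]
```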

Example of Split

Let's break down proxy lines into parts.

Bought proxies usually look like this: login:pass@host:port. So you're splitting by both : and @, which gives four parts: login, pass, host, and port.
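In Python terms, splitting a proxy line by two delimiters could be sketched like this. The proxy string is the placeholder format from the text, not a real address.

```python
import re

proxy = "login:pass@host:port"

# Split on either ":" or "@".
login, password, host, port = re.split(r"[:@]", proxy)
```

This simple split assumes that neither the login nor the password contains : or @.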


ToChar (Converting Integers to Unicode)

Every Unicode symbol has its own numeric code. So you can convert any number to its assigned symbol. For example, the code for ♛ is 9819.
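The same conversion exists in most languages. A short Python illustration of the mapping between code and symbol:

```python
queen = chr(9819)   # number -> symbol
code = ord("♛")     # symbol -> number
```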


ToLower (To Lowercase)

This action makes letters lowercase depending on the setting you choose.

All

Changes every uppercase letter in the text to lowercase.

Was: CHANGES LETTER CASE TO LOWER
Now: changes letter case to lower

Word Start

Only the first character of every word is made lowercase.

Was: CHANGES LETTER CASE TO LOWER
Now: cHANGES lETTER cASE tO lOWER

First Character

Only the first character of the first word is made lowercase.

Was: CHANGES LETTER CASE TO LOWER
Now: cHANGES LETTER CASE TO LOWER

ToUpper (To Uppercase)

This action makes letters uppercase depending on the setting you choose.

All

Changes all lowercase letters to uppercase.

Was: text in lower case
Now: TEXT IN LOWER CASE

Word Start

Only the first letter of each word is capitalized.

Was: text in lower case
Now: Text In Lower Case

First Character

Only the first character of the first word is capitalized.

Was: text in lower case
Now: Text in lower case
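The three modes of ToLower and ToUpper can be approximated in Python as follows. The helper names word_start and first_character are made up for this sketch, and the English sample strings stand in for any text.

```python
import re


def word_start(text: str, change) -> str:
    """Apply `change` to the first character of every word."""
    return re.sub(r"\b\w", lambda m: change(m.group()), text)


def first_character(text: str, change) -> str:
    """Apply `change` to the first character of the string only."""
    return change(text[:1]) + text[1:]


upper_text = "CHANGES LETTER CASE"
lower_all = upper_text.lower()                         # All
lower_words = word_start(upper_text, str.lower)        # Word Start
lower_first = first_character(upper_text, str.lower)   # First Character

lower_text = "text in lower case"
upper_all = lower_text.upper()
upper_words = word_start(lower_text, str.upper)
upper_first = first_character(lower_text, str.upper)
```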

Trim (Remove Characters at Start/End of String)

Use this function to remove unwanted characters at the start and/or end of a given string.

It often saves the day when you need to clear away extra spaces, line breaks or tabs left after parsing.

What to trim

Pick which characters to delete — choose a preset for all whitespace, or specify your own.

Where to trim

Decide where to delete characters:

  • Start of string
  • End
  • Start and End
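For comparison, Python's built-in string methods cover the same three positions. The sample strings are invented for the example.

```python
raw = "\t  parsed value \r\n"   # made-up parsing leftover

both_sides = raw.strip()         # Start and End, all whitespace
start_only = raw.lstrip()        # Start of string
end_only = raw.rstrip()          # End

# Custom characters instead of whitespace.
custom = "--title--".strip("-")
```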

UrlDecode (Decoding)

Decodes a string previously encoded with UrlEncode (described below).

The effect is easiest to see with Cyrillic text:

Before: %D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82%2C%20%D0%BC%D0%B8%D1%80%21
After: Привет, мир! (Russian for "Hello, world!")
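The same decoding can be checked with Python's standard library, which is handy for verifying what an encoded string contains.

```python
from urllib.parse import unquote

encoded = "%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82%2C%20%D0%BC%D0%B8%D1%80%21"
decoded = unquote(encoded)   # percent-encoded UTF-8 bytes -> text
```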


UrlEncode (Encoding)

A URL only allows Latin letters, numbers, and a handful of punctuation marks. All other symbols in HTTP requests must be encoded with UrlEncode or the server won't understand your request.

Encode Only Variable Values

This option is useful when building HTTP requests because you only need to encode parameters, not the site address.

For example, suppose {-Variable.keyword-} holds the text что такое urlencode (Russian for "what is urlencode") and is used as the q parameter of a Google search URL. Only the variable's value is encoded, so after encoding, {-Variable.url-} will hold: https://www.google.com/search?q=%d1%87%d1%82%d0%be+%d1%82%d0%b0%d0%ba%d0%be%d0%b5+urlencode
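Encoding only the parameter value, and not the site address, can be sketched in Python like this. Python writes the hex digits in uppercase, while the result above uses lowercase; both forms are equivalent in a URL.

```python
from urllib.parse import quote_plus

keyword = "что такое urlencode"

# Only the parameter value is encoded; the address stays as it is.
url = "https://www.google.com/search?q=" + quote_plus(keyword)
```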


To Variable

This action just saves everything you put in the input window — variables, text, symbols, project constants — into a single variable.


To List

This action splits your text into lines using the chosen delimiter, then puts them in a list.

Delimiter

  • Enter. New line.
  • Space
  • Custom text. You can specify a single character (like ;) or several (they'll be treated as one delimiter!).
  • Custom Regex. Use a regex as your splitter.
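The four delimiter types behave roughly like the following Python calls. The sample strings are invented for illustration.

```python
import re

by_line = "one\ntwo\nthree".splitlines()       # Enter
by_space = "one two three".split(" ")          # Space
# Several characters act as ONE delimiter, not as alternatives.
by_text = "one;;two;;three".split(";;")        # Custom text
by_regex = re.split(r"\s*[,;]\s*", "one , two;three")  # Custom Regex
```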

To Table

This action splits your text into lines and columns using the delimiters you specify, then fills out a table.

Delimiters

  • Enter. New line.
  • Space
  • Custom text. Specify as for lists (several symbols treated as one!).
  • Custom Regex. Use your own regexp.
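Filling a table amounts to two splits: first into rows, then each row into columns. A minimal Python sketch with made-up data:

```python
text = "Andrey;Pavlov;1988\nIvan;Petrov;1990"   # made-up sample rows

# Row delimiter: new line. Column delimiter: ";".
table = [row.split(";") for row in text.splitlines()]
```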

Replace

This action replaces one string with another and saves the result to a variable.

Specify the string to search for, or a regex if you select Regex as the search type.

What to Replace With

What text should replace what you found in the previous step.

Search Type

  • Text. Looks for the exact text you type.
  • Regex. Here, you give a pattern and it'll replace matches.

What to Replace

First

Only replaces the first found match.

All

Replaces all matches at once.

One Match

In this case, only the specified match (or last match) is replaced.

Match Numbers

Enter which matches to replace as a comma-separated list (zero-based numbering).
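All four replacement modes can be expressed as "replace the matches at these positions". The helper replace_matches below is a hypothetical Python sketch of that idea; it inserts the replacement literally, and for the Text search type the pattern would first be passed through re.escape.

```python
import re


def replace_matches(pattern, replacement, text, indexes=None):
    """Replace only matches whose zero-based position is in `indexes`.

    With indexes=None every match is replaced.
    """
    position = -1

    def swap(match):
        nonlocal position
        position += 1
        if indexes is None or position in indexes:
            return replacement
        return match.group(0)   # leave this match untouched

    return re.sub(pattern, swap, text)


text = "a-b-c-d"
first = replace_matches("-", "+", text, {0})            # First
everything = replace_matches("-", "+", text)            # All
one_match = replace_matches("-", "+", text, {1})        # One Match, index 1
last_index = len(re.findall("-", text)) - 1
last = replace_matches("-", "+", text, {last_index})    # One Match, Last
numbers = replace_matches("-", "+", text, {0, 2})       # Match Numbers "0,2"
```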


Translation

This action translates strings from one language to another.

Translation Service

Several options are available, which helps make text high-quality and unique.

API keys for these services must be added via Program Settings.

Source Language / Target Language

Here you specify which language to translate from and to.

You can set the language to "auto".

The system will try to detect the source language automatically, but can't guarantee the result.

You should enter the language code, not its full name. Examples:

  • English: en
  • Spanish: es
  • German: de
  • Russian: ru
  • And so on.

Every service has its own language codes.

For the most accurate list, always check your chosen service's docs.


JavaScript Preparation

This action processes a string for safe use in JavaScript. It mainly escapes quotes, apostrophes, and other special symbols so your text can be used in JavaScript and IF actions.

Before: <a href="https://zennolab.com/">
After: <a href=\"https://zennolab.com/\">
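A rough Python sketch of this kind of escaping is shown below. The exact set of characters the action handles is not listed here, so the mapping (backslash, both quote types, line breaks) is an assumption, and prepare_for_javascript is a made-up name.

```python
# Assumed set of characters that need escaping inside a JavaScript string.
JS_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "'": "\\'",
    "\n": "\\n",
    "\r": "\\r",
}


def prepare_for_javascript(text: str) -> str:
    """Escape characters that would break a JavaScript string literal."""
    return "".join(JS_ESCAPES.get(ch, ch) for ch in text)


prepared = prepare_for_javascript('<a href="https://zennolab.com/">')
```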

You can test your code with the JavaScript tester.


Substring

Takes a specific section out of a string. Give it the From and To character numbers.

For example, taking the substring from character 95 to the end of a sample Russian sentence returns its last words, "до другого." ("to another.").
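In Python the same operation is a slice. The sketch below assumes zero-based positions and an end position that is not included; whether the action counts characters the same way is not stated here, so check it on a short test string. The sample sentence is invented.

```python
from typing import Optional


def substring(text: str, start: int, end: Optional[int] = None) -> str:
    """Return text[start:end]; with end=None, run to the end of the string."""
    return text[start:end]


sentence = "Translate text from one language to another."
start = sentence.index("to another.")   # position of the tail we want
tail = substring(sentence, start)       # from `start` to end of string
head = substring(sentence, 0, 9)        # first nine characters
```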


Transliteration

On rare occasions, you might still need to convert Cyrillic into Latin characters. This action does that.
