Parser component

Description

Parses text structures using regular expressions, JSON search, or the built-in hypertext analyzer. Based on the search query, the component returns the requested section or a set of sections, or the number of elements in the specified section of the structure. Complex structures can be parsed by a sequence of Parser components, each of which extracts a structure from the document and passes it to the input of the next component.

The parsed document structure is cached in a specific instance of the script handler, which speeds up processing when several consecutive Parser components work on the same large document.
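The caching idea can be illustrated with a minimal sketch. It is not the component's implementation: the `ParseCache` class and its `parse_calls` counter are hypothetical names, and the sketch assumes a JSON document and a cache that remembers only the most recent document.

```python
import json


class ParseCache:
    """Keeps the most recently parsed document so repeated queries skip re-parsing."""

    def __init__(self):
        self._source = None   # raw text of the cached document
        self._parsed = None   # parsed structure built from that text
        self.parse_calls = 0  # number of real parse operations performed

    def get(self, document: str):
        # Parse only when the incoming text differs from the cached one.
        if document != self._source:
            self._parsed = json.loads(document)
            self._source = document
            self.parse_calls += 1
        return self._parsed


cache = ParseCache()
doc = '{"chat": [{"sessId": "uvajoqnx0qcpbjoflxr"}]}'

chat = cache.get(doc)["chat"]                  # first Parser: parses the text
sess_id = cache.get(doc)["chat"][0]["sessId"]  # second Parser: reuses the cache
```

After both lookups the document has been parsed only once, which is the saving the cache is meant to provide.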

Table 1. System Characteristics

Index: 125
Short title: parse
Types of scenarios: All
Starter module: r_script_component_parse
Mode: Synchronous
Icon: 125
Branching pattern: Branching, closing

Properties

Table 2. Properties
Specification Description

Title: Document
Code: data
Visibility: no
Default: — 

Argument with the content of the document to be parsed.

Title: Algorithm
Code: algoritm
Visibility: no
Default: Regular Expressions

Parsing Algorithm.
Possible options:

  • JSON parser (json, 0) - Applies a search algorithm to a JSON structure (object or array). See the example queries below.

  • XML parser (xml, 1) - Applies a search algorithm to an XML document. See the example queries below.

  • Regular Expressions (regular, 2) - Applies the standard regular expression algorithm and searches for elements that match the search pattern. A pattern generally matches multiple elements. To output content, specify the element number and, if the pattern uses capture groups, the group number.

  • HTML parser (html, 3) - Applies a search algorithm to an HTML document using RQuery syntax. See the example queries below. Unlike XML, HTML does not require a strict document structure; in particular, closing tags may be omitted.

Title: Search Query
Code: query
Visibility: no
Default: — 

String with the search query for the selected algorithm.

Title: Function
Code: functionJSON
Visibility: yes
Default: Number of elements

Function for the JSON parser algorithm.
Possible options:

  • Content (content, 0) - Returns the detected element along with the content.

  • Restored content (restoreContent, 1) –

  • Number of elements (count, 2) - Returns the number of elements found.

  • Key List (keys, 3) - Returns a list of keys in the found object.

Title: Function
Code: functionXML
Visibility: yes
Default: Number of elements

Function for the XML parser algorithm.
Possible options:

  • Document (textContent, 0) - Returns the selected element in its entirety along with the content.

  • Content (content, 1) - Returns the content of the selected element, excluding the element itself.

  • Number of elements (count, 2) - Returns the number of elements detected.

  • AttributeValue (attributeValue, 3) - Returns the value of the specified attribute of the selected element.

  • AttributeName (attributeKey, 4) - Returns the name of an attribute by its index in the selected element.

  • AttributeCount (attributeCount, 5) - Returns the number of attributes in the selected element.

Title: Function
Code: functionREGULAR
Visibility: yes
Default: Number of groups

Function for the Regular Expressions algorithm.
Possible options:

  • Content (content, 0) - Returns the content of the specified group from the specified element. Elements are numbered from 1 (the default), and groups from 0 (the default is 1), where group 0 is the full content of the element. If the search pattern uses capture groups (parentheses), specify the index of the group you need; an invalid index causes the component to terminate with an error.

  • Number of Elements (countElements, 1) - Returns the number of detected elements matching the search pattern.

  • Number of groups (countGroups, 2) - Returns the number of captured groups in the detected element with the specified index. The element number must be specified (the default is 1). If the search pattern contains no capture groups, 0 is returned.
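The three functions can be approximated with Python's `re` module. This is a sketch under assumptions, not the component's code: `parse_regular` is a hypothetical helper, the sample document is invented, and the number of groups is taken from the pattern rather than from the individual match.

```python
import re


def parse_regular(document, query, function, element_number=1, group_number=1):
    """Emulates content / countElements / countGroups for a regex query."""
    matches = list(re.finditer(query, document))

    if function == "countElements":
        return len(matches)

    # Elements are numbered from 1, as in the property description.
    if not 1 <= element_number <= len(matches):
        raise IndexError(f"element {element_number} not found")
    match = matches[element_number - 1]

    if function == "countGroups":
        # Number of capture groups declared in the pattern (0 if none).
        return match.re.groups

    if function == "content":
        # Group 0 is the whole element; higher numbers are capture groups.
        if not 0 <= group_number <= match.re.groups:
            raise IndexError(f"group {group_number} not found")
        return match.group(group_number)

    raise ValueError(f"unknown function: {function}")


sample = "id=17; id=42; id=99"
pattern = r"id=(\d+)"

total = parse_regular(sample, pattern, "countElements")
second_value = parse_regular(sample, pattern, "content", element_number=2)
second_full = parse_regular(sample, pattern, "content", element_number=2, group_number=0)
group_total = parse_regular(sample, pattern, "countGroups")
```

Raising an exception for an out-of-range index mirrors the documented behavior of terminating with an error rather than returning an empty value.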

Title: Function
Code: functionHTML
Visibility: yes
Default: Number of elements

Function for the HTML parser algorithm.
Possible options:

  • Document (textContent, 0) - Returns the selected element in its entirety along with the content.

  • Content (content, 1) - Returns the content of the selected element, excluding the element itself.

  • Number of elements (count, 2) - Returns the number of elements detected.

  • AttributeValue (attributeValue, 3) - Returns the value of the specified attribute of the selected element.

  • AttributeName (attributeKey, 4) - Returns the name of an attribute by its index in the selected element.

  • AttributeCount (attributeCount, 5) - Returns the number of attributes in the selected element.

Title: Element Number
Code: elementNumber
Visibility: yes
Default: — 

Parameter for the Regular Expressions algorithm. Specifies the index of the element in the list of elements detected by the pattern in the source document.

Title: Group number
Code: groupNumber
Visibility: yes
Default: — 

Parameter for the Regular Expressions algorithm.
Specifies the group number in the list of captured groups of the selected element.
Relevant only when the search query contains capture groups (marked with parentheses according to regular expression syntax).

For example, in the document "mommy was washing the frame and lena was sitting", the pattern (.)a([^ ])* will detect 7 elements, each with 3 groups available: 0, 1, 2. For element 4, group 0 is "rama", group 1 is "r", and group 2 is "mu".
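How groups are numbered within each element can be checked with a short sketch. It assumes Python's `re` module, which may differ from the engine the component uses, and a hypothetical sample string chosen for this illustration. In Python, a capture group inside a repetition keeps only its last iteration, and a group that did not take part in the match is reported as `None`.

```python
import re

# Hypothetical sample text; the pattern has the same shape as in the example above.
document = "mama myla ramu"
pattern = r"(.)a([^ ])*"

matches = list(re.finditer(pattern, document))

# For every detected element collect group 0 (whole element), group 1 and group 2.
groups = [(m.group(0), m.group(1), m.group(2)) for m in matches]

for number, (whole, first, second) in enumerate(groups, start=1):
    print(f"element {number}: group 0={whole!r}, group 1={first!r}, group 2={second!r}")
```

With this engine the second element has no value for group 2, so a query for that group needs to handle the missing value.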

Title: Attribute number
Code: attributeNumber
Visibility: yes
Default: — 

Parameter for the XML and HTML algorithms. Specifies the index of the attribute whose name the AttributeName function returns.

Title: Attribute Name
Code: attributeName
Visibility: yes
Default: — 

Parameter for the XML and HTML algorithms. Specifies the name of the attribute whose value the AttributeValue function returns.

Title: Result to variable
Code: resultVariable
Visibility: no
Default: — 

Variable to save the result of the operation.

Title: Additional queries
Code: operations
Visibility: no
Default: — 

A list of additional queries that are executed sequentially, each with a variable that receives its result.
Each row of the table holds a search query (analogous to the Search Query field) and its result variable.

If the field is filled in, the main query is executed first, followed by all additional queries with the same parameters.
The result of each query is assigned to its corresponding variable in the table.
If any query fails:

  • The component terminates on the failure branch.

  • The error text is assigned to the error variable, prefixed with the operation number, e.g. "Operation 3. Error text". The main query has index 0.

  • Variables of the queries executed before the failure are populated; variables of the subsequent queries are not.
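The failure handling described above can be sketched as follows. The function `run_queries`, the `lookup` callback, and the variable names are hypothetical and exist only for illustration; the real component runs the selected parsing algorithm instead of a dictionary lookup.

```python
def run_queries(main_query, operations, execute):
    """Runs the main query (operation 0) and then each additional query in order.

    operations: list of (query, variable_name) pairs.
    execute:    callable that returns a result for a query or raises on failure.
    """
    variables = {}
    queries = [(main_query, "result")] + list(operations)
    for number, (query, variable) in enumerate(queries):
        try:
            variables[variable] = execute(query)
        except Exception as exc:
            # Stop at the first failure and report the operation number.
            variables["error"] = f"Operation {number}. {exc}"
            return "failure", variables
    return "success", variables


data = {"a": 1, "b": 2}


def lookup(query):
    if query not in data:
        raise ValueError(f"nothing found for {query}")
    return data[query]


branch, variables = run_queries(
    "a", [("b", "second"), ("zzz", "third"), ("a", "fourth")], lookup
)
```

Here the second additional query fails, so the component leaves through the failure branch with `result` and `second` populated and `fourth` never assigned.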

Title: Error to variable
Code: errorVariable
Visibility: no
Default: — 

Variable to save the error.

Title: Transition
Code: transfer
Visibility: no
Default: — 

The component to which control is passed on successful completion of the operation.

Title: Transition, Error
Code: transferError
Visibility: no
Default: — 

Component to which control is passed when an error occurs.

Examples

Algorithm JSON

Overview

  • When parsing an array of objects, you can specify a filter in parentheses, (key:value); only objects that contain the specified key-value pair are used for further parsing. A filter can contain only one key-value pair. For example, in "msgs"("msgSender": "ab"), "msgs" holds an array of objects, from which the objects with the property "msgSender" equal to "ab" are selected.

  • Numbering of array elements starts at 0.

Example of a JSON document:
{
  "result": "ok",
  "errormsg": "",
  "chat": [
    {
      "sessId": "uvajoqnx0qcpbjoflxr",
      "msgs": [
        {
          "msgId": 8255,
          "msgDt": 1491292390,
          "msgData": {
            "type": "text",
            "data": "Good afternoon. Select the question you are interested in."
          },
          "msgSender": "op"
        },
        {
          "msgId": 8256,
          "msgDt": 1491292391,
          "msgData": {
            "type": "buttons",
            "data": "Contact usr 6"
          },
          "msgSender": "ab"
        },
        {
          "msgId": 8257,
          "msgDt": 1491292392,
          "msgData": {
            "type": "buttons",
            "data": "Contact usr 8"
          },
          "msgSender": "ab"
        }
      ]
    }
  ]
}
Table 3. Examples of JSON queries

Search Query: chat/0/sessId
Function: Content
Result: uvajoqnx0qcpbjoflxr

Search Query: chat/0/sessId
Function: Number of elements
Result: 1

Search Query: chat/0/msgs(msgSender:ab)/1/msgId
Function: Content
Result: 8257

Search Query: chat/0/msgs(msgSender:ab)/1/msgId
Function: Number of elements
Result: 1

Search Query: chat/0/msgs(msgSender:ab)
Function: Content
Result:
[
  {
    "msgId": 8256,
    "msgDt": 1491292391,
    "msgData": {
      "type": "buttons",
      "data": "Contact usr 6"
    },
    "msgSender": "ab"
  },
  {
    "msgId": 8257,
    "msgDt": 1491292392,
    "msgData": {
      "type": "buttons",
      "data": "Contact usr 8"
    },
    "msgSender": "ab"
  }
]

Search Query: chat/0/msgs(msgSender:ab)
Function: Number of elements
Result: 2

Search Query: chat/0
Function: Key List
Result: ["sessId","msgs"]
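The path syntax used in the JSON queries above can be approximated in a few lines of Python. This is a sketch, not the component's algorithm: the `select` function and the `SEGMENT` pattern are hypothetical, the sample is a shortened version of the example document, and filter values are assumed to be compared as strings and to contain no slashes or parentheses.

```python
import json
import re

# One path segment: a key or index, optionally followed by a (key:value) filter.
SEGMENT = re.compile(r"^(?P<key>[^()]+)(?:\((?P<fkey>[^:]+):(?P<fval>[^)]*)\))?$")


def select(node, query):
    """Walks a slash-separated path through parsed JSON and returns the target node."""
    for segment in query.strip("/").split("/"):
        parts = SEGMENT.match(segment)
        if parts is None:
            raise ValueError(f"cannot parse segment: {segment}")
        key = parts["key"].strip().strip('"')
        # Arrays are addressed by zero-based index, objects by key.
        node = node[int(key)] if isinstance(node, list) else node[key]
        if parts["fkey"] is not None:
            fkey = parts["fkey"].strip().strip('"')
            fval = parts["fval"].strip().strip('"')
            # Keep only the objects that carry the requested key-value pair.
            node = [item for item in node if str(item.get(fkey)) == fval]
    return node


doc = json.loads("""
{"chat": [{"sessId": "uvajoqnx0qcpbjoflxr",
           "msgs": [{"msgId": 8255, "msgSender": "op"},
                    {"msgId": 8256, "msgSender": "ab"},
                    {"msgId": 8257, "msgSender": "ab"}]}]}
""")

sess_id = select(doc, "chat/0/sessId")
msg_id = select(doc, "chat/0/msgs(msgSender:ab)/1/msgId")
filtered_count = len(select(doc, "chat/0/msgs(msgSender:ab)"))
key_list = list(select(doc, "chat/0"))
```

Index 1 in the second query refers to the filtered list, which is why it selects message 8257 rather than 8256.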

Algorithm XML

Example of an XML document:
<configuration version="16">
    <folder id="x.rootdomain.ru" label="x.rootdomain.ru" path="c:\rtx_mg3\Media\domains\x.rootdomain.ru" type="readwrite" rescanIntervalS="10" ignorePerms="false" autoNormalize="true">
        <device id="RXYYDPI-SVKNBMF-YADG7WD"></device>
        <minDiskFreePct>1</minDiskFreePct>
        <versioning></versioning>
        <copiers>0</copiers>
        <pullers>0</pullers>
        <hashers>0</hashers>
        <order>random</order>
    </folder>
    <folder id="rootdomain.ru" label="rootdomain.ru" path="c:\rtx_mg3\Media\domains\rootdomain.ru" type="readwrite" rescanIntervalS="10" ignorePerms="false" autoNormalize="true">
        <device id="RXYYDPI-SVKNBMF-YADG7WD"></device>
        <minDiskFreePct>1</minDiskFreePct>
        <versioning></versioning>
        <copiers>0</copiers>
        <pullers>0</pullers>
        <hashers>0</hashers>
        <order>random</order>
    </folder>
    <folder id="Common" label="Common" path="c:\rtx_mg3\Media\common" type="readwrite" rescanIntervalS="10" ignorePerms="false" autoNormalize="true">
        <device id="RXYYDPI-SVKNBMF-YADG7WD"></device>
        <minDiskFreePct>1</minDiskFreePct>
        <versioning></versioning>
        <copiers>0</copiers>
        <pullers>0</pullers>
        <hashers>0</hashers>
        <order>random</order>
    </folder>
    <device id="RXYYDPI-SVKNBMF-YADG7WD" name="Pavel" compression="metadata" introducer="false">
        <address>dynamic</address>
    </device>
    <gui enabled="true" tls="false" debugging="false">
        <address>127.0.0.1:8384</address>
        <apikey>jeopL9MghPvTmweKDeGcoXhwRtrdaVDP</apikey>
        <theme></theme>
    </gui>
    <options>
        <globalAnnounceEnabled>false</globalAnnounceEnabled>
        <localAnnounceEnabled>false</localAnnounceEnabled>
        <localAnnouncePort>21027</localAnnouncePort>
        <localAnnounceMCAddr></localAnnounceMCAddr>
        <maxSendKbps>0</maxSendKbps>
        <maxRecvKbps>0</maxRecvKbps>
        <reconnectionIntervalS>60</reconnectionIntervalS>
        <relaysEnabled>false</relaysEnabled>
        <relayReconnectIntervalM>10</relayReconnectIntervalM>
        <startBrowser>false</startBrowser>
        <natEnabled>false</natEnabled>
        <natLeaseMinutes>60</natLeaseMinutes>
        <natRenewalMinutes>30</natRenewalMinutes>
        <natTimeoutSeconds>10</natTimeoutSeconds>
    </options>
</configuration>
Table 4. Examples of XML queries

Search Query: "configuration"/0/"folder"
Function: Document
Result:
<folder id="Common" label="Common" path="c:\rtx_mg3\Media\common" type="readwrite" rescanIntervalS="10" ignorePerms="false" autoNormalize="true">
    <device id="RXYYDPI-SVKNBMF-YADG7WD"/>
    <minDiskFreePct>1</minDiskFreePct>
    <versioning/>
    <copiers>0</copiers>
    <pullers>0</pullers>
    <hashers>0</hashers>
    <order>random</order>
</folder>
<folder id="rootdomain.ru" label="rootdomain.ru" path="c:\rtx_mg3\Media\domains\rootdomain.ru" type="readwrite" rescanIntervalS="10" ignorePerms="false" autoNormalize="true">
    <device id="RXYYDPI-SVKNBMF-YADG7WD"/>
    <minDiskFreePct>1</minDiskFreePct>
    <versioning/>
    <copiers>0</copiers>
    <pullers>0</pullers>
    <hashers>0</hashers>
    <order>random</order>
</folder>
<folder id="x.rootdomain.ru" label="x.rootdomain.ru" path="c:\rtx_mg3\Media\domains\x.rootdomain.ru" type="readwrite" rescanIntervalS="10" ignorePerms="false" autoNormalize="true">
    <device id="RXYYDPI-SVKNBMF-YADG7WD"/>
    <minDiskFreePct>1</minDiskFreePct>
    <versioning/>
    <copiers>0</copiers>
    <pullers>0</pullers>
    <hashers>0</hashers>
    <order>random</order>
</folder>

Search Query: "configuration"/0/"folder"("id":"x.rootdomain.ru")/0/"device"/0/
Function: Document
Result: <device id="RXYYDPI-SVKNBMF-YADG7WD"/>

Search Query: "configuration"/0/"folder"("id":"x.rootdomain.ru")
Function: Document
Result:
<folder id="x.rootdomain.ru" label="x.rootdomain.ru" path="c:\rtx_mg3\Media\domains\x.rootdomain.ru" type="readwrite" rescanIntervalS="10" ignorePerms="false" autoNormalize="true">
    <device id="RXYYDPI-SVKNBMF-YADG7WD"/>
    <minDiskFreePct>1</minDiskFreePct>
    <versioning/>
    <copiers>0</copiers>
    <pullers>0</pullers>
    <hashers>0</hashers>
    <order>random</order>
</folder>

Search Query: "configuration"/0/"folder"("id":"x.rootdomain.ru")
Function: Content
Result:
<device id="RXYYDPI-SVKNBMF-YADG7WD"/>
<minDiskFreePct>1</minDiskFreePct>
<versioning/>
<copiers>0</copiers>
<pullers>0</pullers>
<hashers>0</hashers>
<order>random</order>

Search Query: "configuration"/0/"folder"("id":"x.rootdomain.ru")
Function: AttributeValue
Attribute Name: "type"
Result: readwrite

Search Query: "configuration"/0/"folder"("id":"x.rootdomain.ru")
Function: AttributeName
Attribute Number: 1
Result: id

Search Query: "configuration"/0/"folder"("id":"x.rootdomain.ru")
Function: AttributeCount
Result: 7
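The XML functions can be compared with their rough equivalents in Python's standard `xml.etree.ElementTree` module. This is a sketch under assumptions: the shortened document and the "Backup" folder are hypothetical, the component's own query syntax is replaced by ElementTree's path predicates, and the attribute number is treated as 1-based because the example above returns "id" for number 1.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical document in the spirit of the example above.
xml_text = (
    '<configuration version="16">'
    '<folder id="Common" type="readwrite"><copiers>0</copiers><order>random</order></folder>'
    '<folder id="Backup" type="readonly"><copiers>1</copiers></folder>'
    '</configuration>'
)

root = ET.fromstring(xml_text)

# Filter by attribute, then take the first element of the resulting list.
folder = root.findall("folder[@id='Common']")[0]

# Document: the element itself together with its content.
document = ET.tostring(folder, encoding="unicode").strip()

# Content: everything inside the element, without the element's own tags.
content = (folder.text or "") + "".join(
    ET.tostring(child, encoding="unicode") for child in folder
)

# Number of elements matched by the path.
element_count = len(root.findall("folder"))

# AttributeValue, AttributeName (attribute number 1), AttributeCount.
attribute_value = folder.get("type")
attribute_name = list(folder.attrib)[1 - 1]
attribute_count = len(folder.attrib)
```

The split between `document` and `content` corresponds to the difference between the Document and Content functions: the first keeps the enclosing tag, the second drops it.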

Algorithm HTML

Overview

  • Tags are searched by name in double quotes, e.g. "br".

  • Tags in a sequence are separated by the | delimiter, for example "html"|"head"|"title".

  • To select one of several identical tags, add its index (starting from zero) after the tag, separated by the delimiter, e.g. "html"|"br"|1|"title".

  • By default, the tag with index 0 is selected before moving on to the next tag.

  • The query "html"|0|"head"|0|"title"|0 is equivalent to the query "html"|"head"|"title"; that is, a query by tag name always returns a list.

  • A query can consist of indexes only, e.g. "0|0|1".

  • Parentheses are used to refer to tag attributes.

  • The query "html"|"head"|("charset") returns the list of parent "head" tags that contain the "charset" attribute.

  • The query "html"|"head"|("type"="test/css") returns the list of parent "head" tags in which the "type" attribute equals "test/css".
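The rules above can be made concrete with a small tokenizer for the query string. This is only a sketch of the syntax as described here, not the RQuery implementation: `tokenize` and `with_default_indexes` are hypothetical helpers, the sketch assumes that names and values never contain the | character, and it assumes the implicit index 0 is added only when a tag name is followed by another tag name or ends the query.

```python
def tokenize(query):
    """Splits a query into steps: ("tag", name), ("index", n) or ("attr", name, value)."""
    steps = []
    for token in query.split("|"):
        token = token.strip()
        if token.isdigit():
            steps.append(("index", int(token)))
        elif token.startswith("(") and token.endswith(")"):
            # ("name") tests for presence, ("name"="value") tests for equality.
            name, _, value = token[1:-1].partition("=")
            steps.append(
                ("attr", name.strip().strip('"'), value.strip().strip('"') if value else None)
            )
        else:
            steps.append(("tag", token.strip('"')))
    return steps


def with_default_indexes(steps):
    """Inserts the implicit index 0 after a tag that has no explicit index."""
    expanded = []
    for position, step in enumerate(steps):
        expanded.append(step)
        following = steps[position + 1] if position + 1 < len(steps) else None
        if step[0] == "tag" and (following is None or following[0] == "tag"):
            expanded.append(("index", 0))
    return expanded


short_form = with_default_indexes(tokenize('"html"|"head"|"title"'))
long_form = tokenize('"html"|0|"head"|0|"title"|0')
attribute_step = tokenize('("id"="x1")')
presence_step = tokenize('"BODY"|"p"|("v")')
```

Under these assumptions the short and the long form of the title query expand to the same list of steps, which matches the equivalence stated in the overview.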

Example of an HTML document:
<HTML>
    <HEAD>
         <META a="1" b="2"/>
    </HEAD>
    <BODY>
        <p id="x1" class="abc">abc abc
        <a href="http://asdf.ru">link</a> <br>
        </p>
        <br><br>
        <p id="x2" class="abc" v="123">zxcv zxcv</p>
    </BODY>
</HTML>
Table 5. Examples of HTML queries

Search Query: ("id"="x1")
Function: Document
Result: <p id="x1" class="abc">abc abc <a href="http://asdf.ru">link</a> <br></p>

Search Query: ("id"="x1")
Function: Content
Result: abc abc <a href="http://asdf.ru">link</a> <br>

Search Query: ("id"="x1")
Function: Number of elements
Result: 1

Search Query: ("id"="x1")
Function: AttributeValue
Attribute Name: id
Result: x1

Search Query: "META"
Function: AttributeCount
Result: 2

Search Query: "p"
Function: Document
Result: <p id="x1" class="abc">abc abc <a href="http://asdf.ru">link</a> <br></p>

Search Query: "p"|0
Function: Document
Result: <p id="x1" class="abc">abc abc <a href="http://asdf.ru">link</a> <br></p>

Search Query: "p"|1
Function: Document
Result: <p id="x2" class="abc" v="123">zxcv zxcv</p>

Search Query: "BODY"|"p"|("v")
Function: Content
Result: zxcv zxcv
