XPath

XPath is a flexible and powerful query language for selecting elements in XML and (X)HTML documents by navigating their DOM tree; it is also used in XSLT transformations. It is a W3C standard.

What is XPath for in ZennoPoster?

With XPath, you can build a more universal and robust data search algorithm that is less sensitive to website layout changes than regular expressions. This query language can considerably simplify parser logic and speed up development.

Testing queries as you create them

For example, to get the names of events on the http://w3.org website, you can use the following expression:

//*[@id="w3c_home_upcoming_events"]/ul/li//a
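The same expression can also be tried outside ZennoPoster. The sketch below uses Python's standard `xml.etree.ElementTree` module, which implements only a subset of XPath and evaluates paths relative to an element, hence the leading `.`. The HTML fragment and the event names are invented for illustration and are not taken from the real w3.org page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment that mimics the structure the expression expects.
HTML = """
<html><body>
  <div id="w3c_home_upcoming_events">
    <ul>
      <li><p><a href="/e/1">Event One</a></p></li>
      <li><a href="/e/2">Event Two</a></li>
    </ul>
  </div>
  <div id="other"><ul><li><a href="/x">Not an event</a></li></ul></div>
</body></html>
"""

root = ET.fromstring(HTML)

# Any element with the given id, then ul/li children, then <a> at any depth.
links = root.findall('.//*[@id="w3c_home_upcoming_events"]/ul/li//a')
event_names = [a.text for a in links]
print(event_names)
```

The link inside the second `<div>` is skipped because the `@id` predicate restricts the search to the events block.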

Basic syntax

Paths

Expression      Description
.               current context
.//             recursive descent (zero or more levels down from the current context)
/html/body      absolute path
a               relative path
//*             all elements in the document
li//a           links that are descendants of li at any depth
//a|//button    links and buttons (combines two node sets)
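To see how relative paths and recursive descent differ, here is a small sketch with Python's standard `xml.etree.ElementTree`. It is an assumption-laden illustration: the document and the `hrefs` helper are hypothetical, and ElementTree supports neither absolute paths on an element nor the `|` union operator, so only the relative forms are shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one link directly in <body>, two nested inside list items.
DOC = ET.fromstring(
    "<html><body>"
    "<a href='/top'>top</a>"
    "<ul><li><b><a href='/deep'>deep</a></b></li><li><a href='/flat'>flat</a></li></ul>"
    "</body></html>"
)
body = DOC.find("body")  # child step from the root element <html>

def hrefs(context, path):
    """Evaluate path against context and return the href of each match."""
    return [a.get("href") for a in context.findall(path)]

direct = hrefs(body, "a")           # relative path: <a> children of body only
anywhere = hrefs(body, ".//a")      # recursive descent from the current context
under_li = hrefs(body, ".//li//a")  # <a> at any depth below an <li>

print(direct, anywhere, under_li)
```

Note that `/deep` is matched by `li//a` even though it is wrapped in `<b>`, because `//` looks at every level below `li`.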

Relationships

Expression                 Description
a/i/parent::p              immediate parent <p>
p/ancestor::*              all ancestors
p/following-sibling::*     all following siblings
p/preceding-sibling::*     all preceding siblings
p/following::*             all elements after the context node, except its descendants
p/preceding::*             all elements before the context node, except its ancestors
p/descendant-or-self::*    the context node and all its descendants
p/ancestor-or-self::*      the context node and all its ancestors
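The following sketch shows which nodes some of these axes select. Python's standard `xml.etree.ElementTree` does not implement named axes (only `..` for the parent), so the helper functions below are hand-written emulations, not an XPath engine; their names and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the <p> with id="ctx" plays the role of the context node.
root = ET.fromstring(
    "<div><h1>t</h1><p id='ctx'><i>x</i></p><span>s</span><em>e</em></div>"
)

# Elements do not store a parent reference, so build a child -> parent map first.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """Emulates ancestor::* (nearest ancestor first)."""
    out = []
    while node in parent_of:
        node = parent_of[node]
        out.append(node)
    return out

def siblings_after(node):
    """Emulates following-sibling::* in document order."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    children = list(parent)
    return children[children.index(node) + 1:]

def siblings_before(node):
    """Emulates preceding-sibling::* in document order."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    children = list(parent)
    return children[:children.index(node)]

ctx = root.find("p[@id='ctx']")
italic = ctx.find("i")

parent_tag = root.find(".//i/..").tag                     # ".." is the parent step
ancestor_tags = [e.tag for e in ancestors(italic)]        # like i/ancestor::*
following_tags = [e.tag for e in siblings_after(ctx)]     # like p/following-sibling::*
preceding_tags = [e.tag for e in siblings_before(ctx)]    # like p/preceding-sibling::*
print(parent_tag, ancestor_tags, following_tags, preceding_tags)
```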

Getting nodes

Expression        Description
/div/text()       get text nodes
/div/text()[1]    get the first text node

Element position

Expression                  Description
a[1]                        first link
a[last()]                   last link
a[2]                        second link
a[position() <= 3]          first 3 links
ul[li[1]="OK"]              list (UL) whose first item is "OK"
tr[position() mod 2 = 1]    odd rows
tr[position() mod 2 = 0]    even rows
p/text()[2]                 second text node
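Positions in XPath are 1-based, which is easy to get wrong when coming from a 0-based language. A minimal sketch with Python's standard `xml.etree.ElementTree` follows; it supports integer positions and `last()` directly, while `position()` comparisons are outside its subset and are emulated here with a 1-based `enumerate`. The table content and the `cells` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical table with five rows: "row 1" .. "row 5".
table = ET.fromstring(
    "<table>" + "".join(f"<tr><td>row {n}</td></tr>" for n in range(1, 6)) + "</table>"
)

def cells(rows):
    """Return the text of the first cell of each row."""
    return [tr.find("td").text for tr in rows]

first = cells(table.findall("tr[1]"))       # first row, not the second
last = cells(table.findall("tr[last()]"))
second = cells(table.findall("tr[2]"))

# Emulation of position() predicates: number the rows starting at 1.
rows = table.findall("tr")
first_three = cells(tr for pos, tr in enumerate(rows, start=1) if pos <= 3)
odd = cells(tr for pos, tr in enumerate(rows, start=1) if pos % 2 == 1)
even = cells(tr for pos, tr in enumerate(rows, start=1) if pos % 2 == 0)

print(first, last, second, first_three, odd, even)
```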

Attributes and filters

Square brackets [] denote a predicate that filters the selected elements.

Expression             Description
input[@type="text"]    <input> tag with type attribute equal to text
input[@class='OK']     <input> tag with class attribute equal to OK
p[not(@*)]             paragraphs with no attributes
*[@style]              all elements with a style attribute
a[. = "OK"]            links with value "OK"
a/@id                  link IDs
a/@*                   all link attributes
a[@id and @rel]        links that have both id and rel attributes
a[@id][@rel]           same as above, written as two chained predicates
a[i or b]              links containing an <i> or <b> element
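A few of these filters can be checked with Python's standard `xml.etree.ElementTree`. This is a sketch under stated limits: ElementTree has no `and`/`or` inside predicates and no attribute steps such as `a/@id`, so the chained form `a[@id][@rel]` is used and `.get("id")` stands in for reading the attribute. The form markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup with two inputs and three links.
form = ET.fromstring(
    "<form>"
    "<input type='text' name='q'/>"
    "<input type='submit' class='OK'/>"
    "<a id='home' rel='nofollow'>OK</a>"
    "<a id='about'><i>About</i></a>"
    "<a>plain</a>"
    "</form>"
)

text_inputs = [e.get("name") for e in form.findall("input[@type='text']")]
ok_class = [e.get("type") for e in form.findall("input[@class='OK']")]
has_id_and_rel = [e.get("id") for e in form.findall("a[@id][@rel]")]
value_ok = [e.get("id") for e in form.findall("a[.='OK']")]   # complete text equals OK
contains_i = [e.get("id") for e in form.findall("a[i]")]      # has an <i> child
link_ids = [e.get("id") for e in form.findall("a[@id]")]      # counterpart of a/@id

print(text_inputs, ok_class, has_id_and_rel, value_ok, contains_i, link_ids)
```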

Functions

Core XPath functions: http://www.w3.org/TR/xpath/#corelib

Function                       Description                                                        Example
name()                         Returns the element name                                           [name()='a']
string(val)                    Converts val to a string, e.g. the value of an attribute           string(a[1]/@id)
substring(val, start, length)  Returns length characters of val from position start (1-based)     substring(@id, 1, 6)
substring-before(val, to)      Returns the part of val before the first occurrence of to          substring-before('12-May-1998', '-') = '12'
substring-after(val, from)     Returns the part of val after the first occurrence of from         substring-after('12-May-1998', '-') = 'May-1998'
string-length()                Returns the number of characters in a string                       [string-length(text()) > 5]
count()                        Returns the number of nodes in a node set
concat()                       Takes two or more strings and returns their concatenation
normalize-space()              Trims the string and collapses inner whitespace runs to one space  [normalize-space(text())='SEARCH']
starts-with()                  Tests whether a string starts with the given prefix                [starts-with(text(), 'SEARCH')]
contains()                     Tests whether a string contains the given substring                [contains(name(), 'SEARCH')]
translate(val, from, to)       Replaces each character of val found in from with its match in to  translate('bar', 'abc', 'ABC') = 'BAr'
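The string functions have a few details that are easy to miss: the third argument of `substring` is a length rather than an end position, positions are 1-based, and `translate` deletes characters that have no counterpart. The plain-Python emulation below is only a sketch of those semantics for integer arguments and ordinary strings; it is not an XPath engine, and the function names are hypothetical.

```python
import re

def substring(val: str, start: int, length: int) -> str:
    """substring(val, start, length) for integer arguments; positions are 1-based."""
    begin = start - 1
    end = begin + length
    return val[max(begin, 0):max(end, 0)]

def substring_before(val: str, sep: str) -> str:
    """Part of val before the first occurrence of sep, or '' if sep is absent."""
    if sep == "":
        return ""
    head, found, _ = val.partition(sep)
    return head if found else ""

def substring_after(val: str, sep: str) -> str:
    """Part of val after the first occurrence of sep, or '' if sep is absent."""
    if sep == "":
        return val
    _, found, tail = val.partition(sep)
    return tail if found else ""

def normalize_space(val: str) -> str:
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return " ".join(part for part in re.split(r"[ \t\r\n]+", val) if part)

def translate(val: str, src: str, dst: str) -> str:
    """Map each character found in src to the character at the same index in dst."""
    table = {}
    for i, ch in enumerate(src):
        if ch not in table:  # the first occurrence in src wins
            table[ch] = dst[i] if i < len(dst) else None  # None means "delete"
    out = []
    for ch in val:
        mapped = table.get(ch, ch)
        if mapped is not None:
            out.append(mapped)
    return "".join(out)

print(substring("12345", 2, 3))                 # 234
print(substring_before("12-May-1998", "-"))     # 12
print(substring_after("12-May-1998", "-"))      # May-1998
print(translate("bar", "abc", "ABC"))           # BAr
```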

Grouping

Expression                               Description
(table/tbody/tr)[last()]                 the last <tr> among the rows of all tables (a single row)
(//h1|//h2)[contains(text(), 'Text')]    heading 1 or 2 containing 'Text'
a[//tr/@data-id=@data-id]                all links whose data-id attribute matches the data-id attribute of a table row
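The effect of the parentheses is easiest to see by comparing the grouped and ungrouped forms. Python's standard `xml.etree.ElementTree` does not support parenthesised expressions, so in this sketch the grouped form is emulated by taking the last item of the combined result list; the two tables and their row ids are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two tables of two and three rows.
doc = ET.fromstring(
    "<body>"
    "<table><tbody><tr id='a1'/><tr id='a2'/></tbody></table>"
    "<table><tbody><tr id='b1'/><tr id='b2'/><tr id='b3'/></tbody></table>"
    "</body>"
)

# Ungrouped: table/tbody/tr[last()] keeps the last <tr> of each <tbody>.
per_table = [tr.get("id") for tr in doc.findall("table/tbody/tr[last()]")]

# Grouped: (table/tbody/tr)[last()] applies last() to the whole node set,
# emulated here by indexing the combined list of rows.
all_rows = doc.findall("table/tbody/tr")
grouped = all_rows[-1].get("id")

print(per_table, grouped)
```

Without grouping, the predicate is evaluated separately for each parent, which is why one row per table comes back.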

https://ru.wikipedia.org/wiki/XPath
https://www.w3schools.com/xml/xpath_syntax.asp