# DiDOM
[README in Russian](README-RU.md)
DiDOM is a simple and fast HTML parser.
## Contents
- [Installation](#installation)
- [Quick start](#quick-start)
- [Creating new document](#creating-new-document)
- [Search for elements](#search-for-elements)
    - [Verify if element exists](#verify-if-element-exists)
- [Search in element](#search-in-element)
- [Supported selectors](#supported-selectors)
- [Output](#output)
- [Working with elements](#working-with-elements)
    - [Creating a new element](#creating-a-new-element)
    - [Getting the name of an element](#getting-the-name-of-an-element)
    - [Getting parent element](#getting-parent-element)
    - [Getting sibling elements](#getting-sibling-elements)
    - [Getting the child elements](#getting-the-child-elements)
    - [Getting document](#getting-document)
    - [Working with element attributes](#working-with-element-attributes)
    - [Comparing elements](#comparing-elements)
    - [Adding a child element](#adding-a-child-element)
    - [Replacing element](#replacing-element)
    - [Removing element](#removing-element)
- [Working with cache](#working-with-cache)
- [Miscellaneous](#miscellaneous)
- [Comparison with other parsers](#comparison-with-other-parsers)
## Installation
To install DiDOM, run the command:

    composer require imangazaliev/didom

## Quick start
```php
use DiDom\Document;
$document = new Document('http://www.news.com/', true);
$posts = $document->find('.post');
foreach ($posts as $post) {
    echo $post->text(), "\n";
}
```
## Creating new document
DiDOM can load HTML in several ways:
##### With constructor
```php
// the first parameter is a string with HTML
$document = new Document($html);
// file path
$document = new Document('page.html', true);
// or URL
$document = new Document('http://www.example.com/', true);
```
The second parameter specifies whether the first argument is a file path or URL to load rather than a markup string. The default is `false`.
Signature:
```php
__construct($string = null, $isFile = false, $encoding = 'UTF-8', $type = Document::TYPE_HTML)
```
- `$string` - an HTML or XML string, or a file path.
- `$isFile` - indicates that the first parameter is a path to a file.
- `$encoding` - the document encoding.
- `$type` - the document type (HTML - `Document::TYPE_HTML`, XML - `Document::TYPE_XML`).
##### With separate methods
```php
$document = new Document();
$document->loadHtml($html);
$document->loadHtmlFile('page.html');
$document->loadHtmlFile('http://www.example.com/');
```
There are two methods available for loading XML: `loadXml` and `loadXmlFile`.
The `loadHtml`, `loadHtmlFile`, `loadXml`, and `loadXmlFile` methods accept additional [libxml options](http://php.net/manual/en/libxml.constants.php):
```php
$document->loadHtml($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$document->loadHtmlFile($url, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$document->loadXml($xml, LIBXML_PARSEHUGE);
$document->loadXmlFile($url, LIBXML_PARSEHUGE);
```
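To see what the two HTML flags change, the following sketch uses PHP's built-in `DOMDocument` directly instead of DiDOM, so that it runs without the library installed. That DiDOM hands the options to libxml in a comparable way is an assumption here, not something this README states.

```php
<?php
$html = '<p>Hello</p>';

// Default parsing: libxml adds a doctype and wraps the fragment in <html><body>.
$default = new DOMDocument();
$default->loadHTML($html);
$defaultOutput = $default->saveHTML();

// With the flags: no implied <html>/<body> elements and no default doctype.
$bare = new DOMDocument();
$bare->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$bareOutput = $bare->saveHTML();

echo $defaultOutput, "\n", $bareOutput;
```

The flags are useful when you parse a fragment and want to serialize it back without the wrapper markup.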
## Search for elements
DiDOM accepts a CSS selector or an XPath expression for searching. Pass the expression as the first parameter and specify its type in the second one (the default type is `Query::TYPE_CSS`):
##### With method `find()`:
```php
use DiDom\Document;
use DiDom\Query;
// ...
// CSS selector
$posts = $document->find('.post');
// XPath
$posts = $document->find("//div[contains(@class, 'post')]", Query::TYPE_XPATH);
```
If elements matching the expression are found, the method returns an array of `DiDom\Element` instances; otherwise it returns an empty array. To get an array of `DOMElement` objects instead, pass `false` as the third parameter.
##### With magic method `__invoke()`:
```php
$posts = $document('.post');
```
**Warning:** using this method is undesirable because it may be removed in the future.
##### With method `xpath()`:
```php
$posts = $document->xpath("//*[contains(concat(' ', normalize-space(@class), ' '), ' post ')]");
```
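The longer class test in the `xpath()` example differs from the plain `contains(@class, 'post')` used earlier: the first matches whole class tokens, the second matches any substring. A minimal sketch of the difference, using PHP's built-in `DOMXPath` rather than DiDOM so it runs standalone (the sample markup is made up for illustration):

```php
<?php
$html = '<div class="post">A</div>'
      . '<div class="post featured">B</div>'
      . '<div class="postscript">C</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Substring test: also matches class="postscript".
$loose = $xpath->query("//div[contains(@class, 'post')]");

// Whole-token test: pads the class list with spaces and looks for " post ".
$strict = $xpath->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' post ')]");

echo $loose->length, "\n";  // 3
echo $strict->length, "\n"; // 2
```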
You can also search inside an element:
```php
echo $document->find('nav')[0]->first('ul.menu')->xpath('//li')[0]->text();
```
### Verify if element exists
To check whether an element exists, use the `has()` method:
```php
if ($document->has('.post')) {
    // code
}
```
If you need to check whether an element exists and then get it, you could write:
```php
if ($document->has('.post')) {
    $elements = $document->find('.post');
    // code
}
```
but it is faster to do it like this:
```php
if (count($elements = $document->find('.post')) > 0) {
    // code
}
```
because the first version runs two queries.
## Search in element
The methods `find()`, `first()`, `xpath()`, `has()`, and `count()` are also available on `Element`.
Example:
```php
echo $document->find('nav')[0]->first('ul.menu')->xpath('//li')[0]->text();
```
#### Method `findInDocument()`
If you change, replace, or remove an element that was found inside another element, the document will not be changed. This happens because the `find()` method of the `Element` class (and, accordingly, the `first()` and `xpath()` methods) creates a new document to search in.
To search for elements in the source document, you must use the methods `findInDocument()` and `firstInDocument()`:
```php
// nothing will happen
$document->first('head')->first('title')->remove();
// but this will do
$document->first('head')->firstInDocument('title')->remove();
```
**Warning:** the `findInDocument()` and `firstInDocument()` methods work only for elements that belong to a document and for elements created via `new Element(...)`. If an element does not belong to a document, a `LogicException` is thrown.
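The effect of searching in a copy can be reproduced with PHP's built-in DOM classes. The sketch below is not DiDOM's internal code; it only assumes that a search inside an element operates on a detached copy of that element, as described above.

```php
<?php
$source = new DOMDocument();
$source->loadHTML('<html><head><title>Page</title></head><body></body></html>');
$head = $source->getElementsByTagName('head')->item(0);

// Copy <head> into a fresh document, as a search on a detached copy would.
$copy = new DOMDocument();
$copy->appendChild($copy->importNode($head, true));

// Remove <title> from the copy only.
$title = $copy->getElementsByTagName('title')->item(0);
$title->parentNode->removeChild($title);

// The copy lost its title; the source document still has it.
$inCopy = $copy->getElementsByTagName('title')->length;
$inSource = $source->getElementsByTagName('title')->length;

echo $inCopy, "\n";   // 0
echo $inSource, "\n"; // 1
```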
## Supported selectors
DiDOM supports search by:
- tag
- class, ID, name and value of an attribute
- pseudo-classes:
    - first-, last-, nth-child
    - empty and not-empty
    - contains
    - has
```php
// all links
$document->find('a');
// any element with id = "foo" and "bar" class
$document->find('#foo.bar');
// any element with attribute "name"
$document->find('[name]');
// the same as
$document->find('*[name]');
// input field with the name "foo"
$document->find('input[name=foo]');
$document->find('input[name=\'bar\']');
$document->find('input[name="baz"]');
// any element that has an attribute starting with "data-" and the value "foo"
$document->find('*[^data-=foo]');
// all links starting with https
$document->find('a[href^=https]');
// all images with the extension png
$document->find('img[src$=png]');
// all links containing the string "example.com"
$document->find('a[href*=example.com]');
// text of the links with "foo" class
$document->find('a.foo::text');
// href and title of all the links with "bar" class
$document->find('a.bar::attr(href|title)');
```
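Because CSS selectors are converted to XPath before the search runs, it can help to see what the attribute operators correspond to. The XPath expressions below are hand-written equivalents evaluated with PHP's built-in `DOMXPath`; they are an illustration and not necessarily the exact expressions DiDOM generates.

```php
<?php
$html = '<a href="https://example.com/a">A</a>'
      . '<a href="http://example.org/b">B</a>'
      . '<img src="logo.png"><img src="photo.jpg">';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// a[href^=https]  ->  prefix match
$secure = $xpath->query("//a[starts-with(@href, 'https')]");

// a[href*=example.com]  ->  substring match
$example = $xpath->query("//a[contains(@href, 'example.com')]");

// img[src$=png]  ->  suffix match; XPath 1.0 has no ends-with(), so compare the tail
$png = $xpath->query("//img[substring(@src, string-length(@src) - 2) = 'png']");

echo $secure->length, ' ', $example->length, ' ', $png->length, "\n"; // 1 1 1
```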
## Output
### Getting HTML
##### With method `html()`:
```php
$posts = $document->find('.post');
echo $posts[0]->html();
```
##### Casting to string:
```php
$html = (string) $posts[0];
```
##### Formatting HTML output
```php
$html = $document->format()->html();
```
An element does not have a `format()` method, so to output formatted HTML of an element, first convert it to a document:
```php
$html = $element->toDocument()->format()->html();
```
#### Inner HTML
```php
$innerHtml = $element->innerHtml();
```
A document does not have an `innerHtml()` method, so to get the inner HTML of a document, first convert it to an element:
```php
$innerHtml = $document->toElement()->innerHtml();
```
### Getting XML
```php
echo $document->xml();
echo $document->first('book')->xml();
```
### Getting content
```php
$posts = $document->find('.post');
echo $posts[0]->text();
```
## Creating a new element
### Creating an instance of the class
```php
use DiDom\Element;
$element = new Element('span', 'Hello');
// Outputs "<span>Hello</span>"
echo $element->html();
```
The first parameter is the name of the element, the second is its value (optional), and the third is an array of element attributes (optional).
An example of creating an element with attributes:
```php
$attributes = ['name' => 'description', 'placeholder' => 'Enter description of item'];
$element = new Element('textarea', 'Text', $attributes);
```
An element can be created from an instance of the class `DOMElement`:
```php
use DiDom\Element;
use DOMElement;
$domElement = new DOMElement('span', 'Hello');
$element = new Element($domElement);
```
### Using the method `createElement`
```php
$document = new Document($html);
$element = $document->createElement('span', 'Hello');
```
## Getting the name of an element
```php
$element->tag;
```
## Getting parent element
```php
$document = new Document($html);
$input = $document->find('input[name=email]')[0];
var_dump($input->parent());
```
## Getting sibling elements
```php
$document = new Document($html);
$item = $document->find('ul.menu > li')[1];
var_dump($item->previousSibling());
var_dump($item->nextSibling());
```
## Getting the child elements
```php
$html = '<div>Foo<span>Bar</span><!--Baz--></div>';
$document = new Document($html);
$div = $document->first('div');
// element node (DOMElement)
// string(3) "Bar"
var_dump($div->child(1)->text());
// text node (DOMText)
// string(3) "Foo"
var_dump($div->firstChild()->text());
// comment node (DOMComment)
// string(3) "Baz"
var_dump($div->lastChild()->text());
// array(3) { ... }
var_dump($div->children());
```
## Getting document
```php
$document = new Document($html);
$element = $document->find('input[name=email]')[0];
$document2 = $element->getDocument();
// bool(true)
var_dump($document->is($document2));
```
## Working with element attributes
#### Creating/updating an attribute
##### With method `setAttribute`:
```php
$element->setAttribute('name', 'username');
```
##### With method `attr`:
```php
$element->attr('name', 'username');
```
##### With magic method `__set`:
```php
$element->name = 'username';
```
#### Getting value of an attribute
##### With method `getAttribute`:
```php
$username = $element->getAttribute('value');
```
##### With method `attr`:
```php
$username = $element->attr('value');
```
##### With magic method `__get`:
```php
$username = $element->name;
```
Returns `null` if attribute is not found.
#### Verify if attribute exists
##### With method `hasAttribute`:
```php
if ($element->hasAttribute('name')) {
    // code
}
```
##### With magic method `__isset`:
```php
if (isset($element->name)) {
    // code
}
```
#### Removing attribute:
##### With method `removeAttribute`:
```php
$element->removeAttribute('name');
```
##### With magic method `__unset`:
```php
unset($element->name);
```
## Comparing elements
```php
$element = new Element('span', 'hello');
$element2 = new Element('span', 'hello');
// bool(true)
var_dump($element->is($element));
// bool(false)
var_dump($element->is($element2));
```
## Adding a child element
```php
$list = new Element('ul');
$item = new Element('li', 'Item 1');
$items = [
    new Element('li', 'Item 2'),
    new Element('li', 'Item 3'),
];
$list->appendChild($item);
$list->appendChild($items);
```
## Replacing element
```php
$element = new Element('span', 'hello');
$document->find('.post')[0]->replace($element);
```
**Warning:** you can replace only those elements that were found directly in the document:
```php
// nothing will happen
$document->first('head')->first('title')->replace($title);
// but this will do
$document->first('head title')->replace($title);
```
More about this in the section [Search in element](#search-in-element).
## Removing element
```php
$document->find('.post')[0]->remove();
```
**Warning:** you can remove only those elements that were found directly in the document:
```php
// nothing will happen
$document->first('head')->first('title')->remove();
// but this will do
$document->first('head title')->remove();
```
More about this in the section [Search in element](#search-in-element).
## Working with cache
The cache is an array of XPath expressions that were converted from CSS selectors.
#### Getting from cache
```php
use DiDom\Query;
// ...
$xpath = Query::compile('h2');
$compiled = Query::getCompiled();
// array('h2' => '//h2')
var_dump($compiled);
```
#### Cache setting
```php
Query::setCompiled(['h2' => '//h2']);
```
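The idea behind such a cache is plain memoization: convert each selector once and reuse the result. The sketch below is a toy stand-in and not DiDOM's code; `compileTag()` is a hypothetical helper that only handles bare tag names.

```php
<?php
// Hypothetical stand-in for a CSS-to-XPath compiler; handles bare tag names only.
function compileTag(string $selector, array &$cache): string
{
    if (!isset($cache[$selector])) {
        // The conversion runs only the first time a selector is seen.
        $cache[$selector] = '//' . $selector;
    }

    return $cache[$selector];
}

$cache = [];
compileTag('h2', $cache);
compileTag('h2', $cache); // served from the cache
compileTag('a', $cache);

var_dump($cache); // 'h2' => '//h2', 'a' => '//a'
```

Pre-filling the cache, as `Query::setCompiled()` does, skips the conversion step for selectors you already know.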
## Miscellaneous
#### `preserveWhiteSpace`
By default, whitespace preserving is disabled.
You can enable the `preserveWhiteSpace` option before loading the document:
```php
$document = new Document();
$document->preserveWhiteSpace();
$document->loadXml($xml);
```
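What whitespace preservation changes is easiest to see by counting child nodes. This sketch uses the `preserveWhiteSpace` property of PHP's built-in `DOMDocument` and sets it explicitly in both cases; that DiDOM's `preserveWhiteSpace()` toggles the same underlying behaviour is an assumption.

```php
<?php
$xml = "<root>\n  <item/>\n</root>";

// Whitespace preserved: the line breaks and indentation become text nodes.
$keep = new DOMDocument();
$keep->preserveWhiteSpace = true;
$keep->loadXML($xml);

// Whitespace dropped: only the <item/> element remains under <root>.
$strip = new DOMDocument();
$strip->preserveWhiteSpace = false;
$strip->loadXML($xml);

$kept = $keep->documentElement->childNodes->length;
$stripped = $strip->documentElement->childNodes->length;

echo $kept, "\n";     // 3
echo $stripped, "\n"; // 1
```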
#### `count`
The `count()` method counts the elements that match the selector:
```php
// prints the number of links in the document
echo $document->count('a');
```
```php
// prints the number of items in the list
echo $document->first('ul')->count('li');
```
#### `matches`
Returns `true` if the node matches the selector:
```php
$element->matches('div#content');
// strict match
// returns true if the element is a div with id equal to "content" and nothing else
// if the element has any other attributes the method returns false
$element->matches('div#content', true);
```
#### `isElementNode`
Checks whether a node is an element (DOMElement):
```php
$element->isElementNode();
```
#### `isTextNode`
Checks whether an element is a text node (DOMText):
```php
$element->isTextNode();
```
#### `isCommentNode`
Checks whether the element is a comment (DOMComment):
```php
$element->isCommentNode();
```
## Comparison with other parsers
[Comparison with other parsers](https://github.com/Imangazaliev/DiDOM/wiki/Comparison-with-other-parsers-(1.0))

composer.lock
{
"_readme": [
"This file locks the dependencies of your project to a known state",
"Read more about it at https://getcomposer.org/doc/01-basic-usage.md#composer-lock-the-lock-file",
"This file is @generated automatically"
],
"hash": "7d413af5e53b1204b6c430dbd32ca728",
"content-hash": "fa5a5b325a0458a9fe05c67fb6b0e719",
"packages": [],
"packages-dev": [
{
"name": "doctrine/instantiator",
"version": "1.0.5",
"source": {
"type": "git",
"url": "https://github.com/doctrine/instantiator.git",
"reference": "8e884e78f9f0eb1329e445619e04456e64d8051d"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/doctrine/instantiator/zipball/8e884e78f9f0eb1329e445619e04456e64d8051d",
"reference": "8e884e78f9f0eb1329e445619e04456e64d8051d",
"shasum": ""
},
"require": {
"php": ">=5.3,<8.0-DEV"
},
"require-dev": {
"athletic/athletic": "~0.1.8",
"ext-pdo": "*",
"ext-phar": "*",
"phpunit/phpunit": "~4.0",
"squizlabs/php_codesniffer": "~2.0"
},
"type": "library",
"extra": {
"branch-alias": {
"dev-master": "1.0.x-dev"
}
},
"autoload": {
"psr-4": {
"Doctrine\\Instantiator\\": "src/Doctrine/Instantiator/"
}
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"MIT"
],
"authors": [
{
"name": "Marco Pivetta",
"email": "ocramius@gmail.com",
"homepage": "http://ocramius.github.com/"
}
],
"description": "A small, lightweight utility to instantiate objects in PHP without invoking their constructors",
"homepage": "https://github.com/doctrine/instantiator",
"keywords": [
"constructor",
"instantiate"
],
"time": "2015-06-14 21:17:01"
},
{
"name": "phpdocumentor/reflection-docblock",
"version": "2.0.5",
"source": {
"type": "git",
"url": "https://github.com/phpDocumentor/ReflectionDocBlock.git",
"reference": "e6a969a640b00d8daa3c66518b0405fb41ae0c4b"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/phpDocumentor/ReflectionDocBlock/zipball/e6a969a640b00d8daa3c66518b0405fb41ae0c4b",
"reference": "e6a969a640b00d8daa3c66518b0405fb41ae0c4b",
"shasum": ""
},
"require": {
"php": ">=5.3.3"
},
"require-dev": {
"phpunit/phpunit": "~4.0"
},
"suggest": {
"dflydev/markdown": "~1.0",
"erusev/parsedown": "~1.0"
},
"type": "library",
"extra": {
"branch-alias": {
"dev-master": "2.0.x-dev"
}
},
"autoload": {
"psr-0": {
"phpDocumentor": [
"src/"
]
}
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"MIT"
],
"authors": [
{
"name": "Mike van Riel",
"email": "mike.vanriel@naenius.com"
}
],
"time": "2016-01-25 08:17:30"
},
{
"name": "phpspec/prophecy",
"version": "1.8.1",
"source": {
"type": "git",
"url": "https://github.com/phpspec/prophecy.git",
"reference": "1927e75f4ed19131ec9bcc3b002e07fb1173ee76"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/phpspec/prophecy/zipball/1927e75f4ed19131ec9bcc3b002e07fb1173ee76",
"reference": "1927e75f4ed19131ec9bcc3b002e07fb1173ee76",
"shasum": ""
},
"require": {
"doctrine/instantiator": "^1.0.2",
"php": "^5.3|^7.0",
"phpdocumentor/reflection-docblock": "^2.0|^3.0.2|^4.0",
"sebastian/comparator": "^1.1|^2.0|^3.0",
"sebastian/recursion-context": "^1.0|^2.0|^3.0"
},
"require-dev": {
"phpspec/phpspec": "^2.5|^3.2",
"phpunit/phpunit": "^4.8.35 || ^5.7 || ^6.5 || ^7.1"
},
"type": "library",
"extra": {
"branch-alias": {
"dev-master": "1.8.x-dev"
}
},
"autoload": {
"psr-4": {
"Prophecy\\": "src/Prophecy"
}
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"MIT"
],
"authors": [
{
"name": "Konstantin Kudryashov",
"email": "ever.zet@gmail.com",
"homepage": "http://everzet.com"
},
{
"name": "Marcello Duarte",
"email": "marcello.duarte@gmail.com"
}
],
"description": "Highly opinionated mocking framework for PHP 5.3+",
"homepage": "https://github.com/phpspec/prophecy",
"keywords": [
"Double",
"Dummy",
"fake",
"mock",
"spy",
"stub"
],
"time": "2019-06-13 12:50:23"
},
{
"name": "phpunit/php-code-coverage",
"version": "2.2.4",
"source": {
"type": "git",
"url": "https://github.com/sebastianbergmann/php-code-coverage.git",
"reference": "eabf68b476ac7d0f73793aada060f1c1a9bf8979"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/sebastianbergmann/php-code-coverage/zipball/eabf68b476ac7d0f73793aada060f1c1a9bf8979",
"reference": "eabf68b476ac7d0f73793aada060f1c1a9bf8979",
"shasum": ""
},
"require": {
"php": ">=5.3.3",
"phpunit/php-file-iterator": "~1.3",
"phpunit/php-text-template": "~1.2",
"phpunit/php-token-stream": "~1.3",
"sebastian/environment": "^1.3.2",
"sebastian/version": "~1.0"
},
"require-dev": {
"ext-xdebug": ">=2.1.4",
"phpunit/phpunit": "~4"
},
"suggest": {
"ext-dom": "*",
"ext-xdebug": ">=2.2.1",
"ext-xmlwriter": "*"
},
"type": "library",
"extra": {
"branch-alias": {
"dev-master": "2.2.x-dev"
}
},
"autoload": {
"classmap": [
"src/"
]
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"BSD-3-Clause"
],
"authors": [
{
"name": "Sebastian Bergmann",
"email": "sb@sebastian-bergmann.de",
"role": "lead"
}
],
"description": "Library that provides collection, processing, and rendering functionality for PHP code coverage information.",
"homepage": "https://github.com/sebastianbergmann/php-code-coverage",
"keywords": [
"coverage",
"testing",
"xunit"
],
"time": "2015-10-06 15:47:00"
},
{
"name": "phpunit/php-file-iterator",
"version": "1.4.5",
"source": {
"type": "git",
"url": "https://github.com/sebastianbergmann/php-file-iterator.git",
"reference": "730b01bc3e867237eaac355e06a36b85dd93a8b4"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/sebastianbergmann/php-file-iterator/zipball/730b01bc3e867237eaac355e06a36b85dd93a8b4",
"reference": "730b01bc3e867237eaac355e06a36b85dd93a8b4",
"shasum": ""
},
"require": {
"php": ">=5.3.3"
},
"type": "library",
"extra": {
"branch-alias": {
"dev-master": "1.4.x-dev"
}
},
"autoload": {
"classmap": [
"src/"
]
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"BSD-3-Clause"
],
"authors": [
{
"name": "Sebastian Bergmann",
"email": "sb@sebastian-bergmann.de",
"role": "lead"
}
],
"description": "FilterIterator implementation that filters files based on a list of suffixes.",
"homepage": "https://github.com/sebastianbergmann/php-file-iterator/",
"keywords": [
"filesystem",
"iterator"
],
"time": "2017-11-27 13:52:08"
},
{
"name": "phpunit/php-text-template",
"version": "1.2.1",
"source": {
"type": "git",
"url": "https://github.com/sebastianbergmann/php-text-template.git",
"reference": "31f8b717e51d9a2afca6c9f046f5d69fc27c8686"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/sebastianbergmann/php-text-template/zipball/31f8b717e51d9a2afca6c9f046f5d69fc27c8686",
"reference": "31f8b717e51d9a2afca6c9f046f5d69fc27c8686",
"shasum": ""
},
"require": {
"php": ">=5.3.3"
},
"type": "library",
"autoload": {
"classmap": [
"src/"
]
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"BSD-3-Clause"
],
"authors": [
{
"name": "Sebastian Bergmann",
"email": "sebastian@phpunit.de",
"role": "lead"
}
],
"description": "Simple template engine.",
"homepage": "https://github.com/sebastianbergmann/php-text-template/",
"keywords": [
"template"
],
"time": "2015-06-21 13:50:34"
},
{
"name": "phpunit/php-timer",
"version": "1.0.9",
"source": {
"type": "git",
"url": "https://github.com/sebastianbergmann/php-timer.git",
"reference": "3dcf38ca72b158baf0bc245e9184d3fdffa9c46f"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/sebastianbergmann/php-timer/zipball/3dcf38ca72b158baf0bc245e9184d3fdffa9c46f",
"reference": "3dcf38ca72b158baf0bc245e9184d3fdffa9c46f",
"shasum": ""
},
"require": {
"php": "^5.3.3 || ^7.0"
},
"require-dev": {
"phpunit/phpunit": "^4.8.35 || ^5.7 || ^6.0"
},
"type": "library",
"extra": {
"branch-alias": {
"dev-master": "1.0-dev"
}
},
"autoload": {
"classmap": [
"src/"
]
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"BSD-3-Clause"
],
"authors": [
{
"name": "Sebastian Bergmann",
"email": "sb@sebastian-bergmann.de",
"role": "lead"
}
],
"description": "Utility class for timing",
"homepage": "https://github.com/sebastianbergmann/php-timer/",
"keywords": [
"timer"
],
"time": "2017-02-26 11:10:40"
},
{
"name": "phpunit/php-token-stream",
"version": "1.4.12",
"source": {
"type": "git",
"url": "https://github.com/sebastianbergmann/php-token-stream.git",
"reference": "1ce90ba27c42e4e44e6d8458241466380b51fa16"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/sebastianbergmann/php-token-stream/zipball/1ce90ba27c42e4e44e6d8458241466380b51fa16",
"reference": "1ce90ba27c42e4e44e6d8458241466380b51fa16",
"shasum": ""
},
"require": {
"ext-tokenizer": "*",
"php": ">=5.3.3"
},
"require-dev": {
"phpunit/phpunit": "~4.2"
},
"type": "library",
"extra": {
"branch-alias": {
"dev-master": "1.4-dev"
}
},
"autoload": {
"classmap": [
"src/"
]
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"BSD-3-Clause"
],
"authors": [
{
"name": "Sebastian Bergmann",
"email": "sebastian@phpunit.de"
}
],
"description": "Wrapper around PHP's tokenizer extension.",
"homepage": "https://github.com/sebastianbergmann/php-token-stream/",
"keywords": [
"tokenizer"
],
"time": "2017-12-04 08:55:13"
},
{
"name": "phpunit/phpunit",
"version": "4.8.36",
"source": {
"type": "git",
"url": "https://github.com/sebastianbergmann/phpunit.git",
"reference": "46023de9a91eec7dfb06cc56cb4e260017298517"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/sebastianbergmann/phpunit/zipball/46023de9a91eec7dfb06cc56cb4e260017298517",
"reference": "46023de9a91eec7dfb06cc56cb4e260017298517",
"shasum": ""
},
"require": {
"ext-dom": "*",
"ext-json": "*",
"ext-pcre": "*",
"ext-reflection": "*",
"ext-spl": "*",
"php": ">=5.3.3",
"phpspec/prophecy": "^1.3.1",
"phpunit/php-code-coverage": "~2.1",
"phpunit/php-file-iterator": "~1.4",
"phpunit/php-text-template": "~1.2",
"phpunit/php-timer": "^1.0.6",
"phpunit/phpunit-mock-objects": "~2.3",
"sebastian/comparator": "~1.2.2",
"sebastian/diff": "~1.2",
"sebastian/environment": "~1.3",
"sebastian/exporter": "~1.2",
"sebastian/global-state": "~1.0",
"sebastian/version": "~1.0",
"symfony/yaml": "~2.1|~3.0"
},
"suggest": {
"phpunit/php-invoker": "~1.1"
},
"bin": [
"phpunit"
],
"type": "library",
"extra": {
"branch-alias": {
"dev-master": "4.8.x-dev"
}
},
"autoload": {
"classmap": [
"src/"
]
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"BSD-3-Clause"
],
"authors": [
{
"name": "Sebastian Bergmann",
"email": "sebastian@phpunit.de",
"role": "lead"
}
],
"description": "The PHP Unit Testing framework.",
"homepage": "https://phpunit.de/",
"keywords": [
"phpunit",
"testing",
"xunit"
],
"time": "2017-06-21 08:07:12"
},
{
"name": "phpunit/phpunit-mock-objects",
"version": "2.3.8",
"source": {
"type": "git",
"url": "https://github.com/sebastianbergmann/phpunit-mock-objects.git",
"reference": "ac8e7a3db35738d56ee9a76e78a4e03d97628983"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/sebastianbergmann/phpunit-mock-objects/zipball/ac8e7a3db35738d56ee9a76e78a4e03d97628983",
"reference": "ac8e7a3db35738d56ee9a76e78a4e03d97628983",
"shasum": ""
},
"require": {
"doctrine/instantiator": "^1.0.2",
"php": ">=5.3.3",
"phpunit/php-text-template": "~1.2",
"sebastian/exporter": "~1.2"
},
"require-dev": {
"phpunit/phpunit": "~4.4"
},
"suggest": {
"ext-soap": "*"
},
"type": "library",
"extra": {
"branch-alias": {
"dev-master": "2.3.x-dev"
}
},
"autoload": {
"classmap": [
"src/"
]
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"BSD-3-Clause"
],
"authors": [
{
"name": "Sebastian Bergmann",
"email": "sb@sebastian-bergmann.de",
"role": "lead"
}
],
"description": "Mock Object library for PHPUnit",
"homepage": "https://github.com/sebastianbergmann/phpunit-mock-objects/",
"keywords": [
"mock",
"xunit"
],
"abandoned": true,
"time": "2015-10-02 06:51:40"
},
{
"name": "sebastian/comparator",
"version": "1.2.4",
"source": {
"type": "git",
"url": "https://github.com/sebastianbergmann/comparator.git",
"reference": "2b7424b55f5047b47ac6e5ccb20b2aea4011d9be"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/sebastianbergmann/comparator/zipball/2b7424b55f5047b47ac6e5ccb20b2aea4011d9be",
"reference": "2b7424b55f5047b47ac6e5ccb20b2aea4011d9be",
"shasum": ""
},
"require": {
"php": ">=5.3.3",
"sebastian/diff": "~1.2",
"sebastian/exporter": "~1.2 || ~2.0"
},
"require-dev": {
"phpunit/phpunit": "~4.4"
},
"type": "library",
"extra": {
"branch-alias": {
"dev-master": "1.2.x-dev"
}
},
"autoload": {
"classmap": [
"src/"
]
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"BSD-3-Clause"
],
"authors": [
{
"name": "Jeff Welch",
"email": "whatthejeff@gmail.com"
},
{
"name": "Volker Dusch",
"email": "github@wallbash.com"
},
{
"name": "Bernhard Schussek",
"email": "bschussek@2bepublished.at"
},
{
"name": "Sebastian Bergmann",
"email": "sebastian@phpunit.de"
}
],
"description": "Provides the functionality to compare PHP values for equality",
"homepage": "http://www.github.com/sebastianbergmann/comparator",
"keywords": [
"comparator",
"compare",
"equality"
],
"time": "2017-01-29 09:50:25"
},
{
"name": "sebastian/diff",
"version": "1.4.3",
"source": {
"type": "git",
"url": "https://github.com/sebastianbergmann/diff.git",
"reference": "7f066a26a962dbe58ddea9f72a4e82874a3975a4"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/sebastianbergmann/diff/zipball/7f066a26a962dbe58ddea9f72a4e82874a3975a4",
"reference": "7f066a26a962dbe58ddea9f72a4e82874a3975a4",
"shasum": ""
},
"require": {
"php": "^5.3.3 || ^7.0"
},
"require-dev": {
"phpunit/phpunit": "^4.8.35 || ^5.7 || ^6.0"
},
"type": "library",
"extra": {
"branch-alias": {
"dev-master": "1.4-dev"
}
},
"autoload": {
"classmap": [
"src/"
]
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"BSD-3-Clause"
],
"authors": [
{
"name": "Kore Nordmann",
"email": "mail@kore-nordmann.de"
},
{
"name": "Sebastian Bergmann",
"email": "sebastian@phpunit.de"
}
],
"description": "Diff implementation",
"homepage": "https://github.com/sebastianbergmann/diff",
"keywords": [
"diff"
],
"time": "2017-05-22 07:24:03"
},
{
"name": "sebastian/environment",
"version": "1.3.8",
"source": {
"type": "git",
"url": "https://github.com/sebastianbergmann/environment.git",
"reference": "be2c607e43ce4c89ecd60e75c6a85c126e754aea"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/sebastianbergmann/environment/zipball/be2c607e43ce4c89ecd60e75c6a85c126e754aea",
"reference": "be2c607e43ce4c89ecd60e75c6a85c126e754aea",
"shasum": ""
},
"require": {
"php": "^5.3.3 || ^7.0"
},
"require-dev": {
"phpunit/phpunit": "^4.8 || ^5.0"
},
"type": "library",
"extra": {
"branch-alias": {
"dev-master": "1.3.x-dev"
}
},
"autoload": {
"classmap": [
"src/"
]
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"BSD-3-Clause"
],
"authors": [
{
"name": "Sebastian Bergmann",
"email": "sebastian@phpunit.de"
}
],
"description": "Provides functionality to handle HHVM/PHP environments",
"homepage": "http://www.github.com/sebastianbergmann/environment",
"keywords": [
"Xdebug",
"environment",
"hhvm"
],
"time": "2016-08-18 05:49:44"
},
{
"name": "sebastian/exporter",
"version": "1.2.2",
"source": {
"type": "git",
"url": "https://github.com/sebastianbergmann/exporter.git",
"reference": "42c4c2eec485ee3e159ec9884f95b431287edde4"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/sebastianbergmann/exporter/zipball/42c4c2eec485ee3e159ec9884f95b431287edde4",
"reference": "42c4c2eec485ee3e159ec9884f95b431287edde4",
"shasum": ""
},
"require": {
"php": ">=5.3.3",
"sebastian/recursion-context": "~1.0"
},
"require-dev": {
"ext-mbstring": "*",
"phpunit/phpunit": "~4.4"
},
"type": "library",
"extra": {
"branch-alias": {
"dev-master": "1.3.x-dev"
}
},
"autoload": {
"classmap": [
"src/"
]
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"BSD-3-Clause"
],
"authors": [
{
"name": "Jeff Welch",
"email": "whatthejeff@gmail.com"
},
{
"name": "Volker Dusch",
"email": "github@wallbash.com"
},
{
"name": "Bernhard Schussek",
"email": "bschussek@2bepublished.at"
},
{
"name": "Sebastian Bergmann",
"email": "sebastian@phpunit.de"
},
{
"name": "Adam Harvey",
"email": "aharvey@php.net"
}
],
"description": "Provides the functionality to export PHP variables for visualization",
"homepage": "http://www.github.com/sebastianbergmann/exporter",
"keywords": [
"export",
"exporter"
],
"time": "2016-06-17 09:04:28"
},
{
"name": "sebastian/global-state",
"version": "1.1.1",
"source": {
"type": "git",
"url": "https://github.com/sebastianbergmann/global-state.git",
"reference": "bc37d50fea7d017d3d340f230811c9f1d7280af4"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/sebastianbergmann/global-state/zipball/bc37d50fea7d017d3d340f230811c9f1d7280af4",
"reference": "bc37d50fea7d017d3d340f230811c9f1d7280af4",
"shasum": ""
},
"require": {
"php": ">=5.3.3"
},
"require-dev": {
"phpunit/phpunit": "~4.2"
},
"suggest": {
"ext-uopz": "*"
},
"type": "library",
"extra": {
"branch-alias": {
"dev-master": "1.0-dev"
}
},
"autoload": {
"classmap": [
"src/"
]
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"BSD-3-Clause"
],
"authors": [
{
"name": "Sebastian Bergmann",
"email": "sebastian@phpunit.de"
}
],
"description": "Snapshotting of global state",
"homepage": "http://www.github.com/sebastianbergmann/global-state",
"keywords": [
"global state"
],
"time": "2015-10-12 03:26:01"
},
{
"name": "sebastian/recursion-context",
"version": "1.0.5",
"source": {
"type": "git",
"url": "https://github.com/sebastianbergmann/recursion-context.git",
"reference": "b19cc3298482a335a95f3016d2f8a6950f0fbcd7"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/sebastianbergmann/recursion-context/zipball/b19cc3298482a335a95f3016d2f8a6950f0fbcd7",
"reference": "b19cc3298482a335a95f3016d2f8a6950f0fbcd7",
"shasum": ""
},
"require": {
"php": ">=5.3.3"
},
"require-dev": {
"phpunit/phpunit": "~4.4"
},
"type": "library",
"extra": {
"branch-alias": {
"dev-master": "1.0.x-dev"
}
},
"autoload": {
"classmap": [
"src/"
]
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"BSD-3-Clause"
],
"authors": [
{
"name": "Jeff Welch",
"email": "whatthejeff@gmail.com"
},
{
"name": "Sebastian Bergmann",
"email": "sebastian@phpunit.de"
},
{
"name": "Adam Harvey",
"email": "aharvey@php.net"
}
],
"description": "Provides functionality to recursively process PHP variables",
"homepage": "http://www.github.com/sebastianbergmann/recursion-context",
"time": "2016-10-03 07:41:43"
},
{
"name": "sebastian/version",
"version": "1.0.6",
"source": {
"type": "git",
"url": "https://github.com/sebastianbergmann/version.git",
"reference": "58b3a85e7999757d6ad81c787a1fbf5ff6c628c6"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/sebastianbergmann/version/zipball/58b3a85e7999757d6ad81c787a1fbf5ff6c628c6",
"reference": "58b3a85e7999757d6ad81c787a1fbf5ff6c628c6",
"shasum": ""
},
"type": "library",
"autoload": {
"classmap": [
"src/"
]
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"BSD-3-Clause"
],
"authors": [
{
"name": "Sebastian Bergmann",
"email": "sebastian@phpunit.de",
"role": "lead"
}
],
"description": "Library that helps with managing the version number of Git-hosted PHP projects",
"homepage": "https://github.com/sebastianbergmann/version",
"time": "2015-06-21 13:59:46"
},
{
"name": "symfony/polyfill-ctype",
"version": "v1.12.0",
"source": {
"type": "git",
"url": "https://github.com/symfony/polyfill-ctype.git",
"reference": "550ebaac289296ce228a706d0867afc34687e3f4"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/symfony/polyfill-ctype/zipball/550ebaac289296ce228a706d0867afc34687e3f4",
"reference": "550ebaac289296ce228a706d0867afc34687e3f4",
"shasum": ""
},
"require": {
"php": ">=5.3.3"
},
"suggest": {
"ext-ctype": "For best performance"
},
"type": "library",
"extra": {
"branch-alias": {
"dev-master": "1.12-dev"
}
},
"autoload": {
"psr-4": {
"Symfony\\Polyfill\\Ctype\\": ""
},
"files": [
"bootstrap.php"
]
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"MIT"
],
"authors": [
{
"name": "Gert de Pagter",
"email": "BackEndTea@gmail.com"
},
{
"name": "Symfony Community",
"homepage": "https://symfony.com/contributors"
}
],
"description": "Symfony polyfill for ctype functions",
"homepage": "https://symfony.com",
"keywords": [
"compatibility",
"ctype",
"polyfill",
"portable"
],
"time": "2019-08-06 08:03:45"
},
{
"name": "symfony/yaml",
"version": "v2.8.50",
"source": {
"type": "git",
"url": "https://github.com/symfony/yaml.git",
"reference": "02c1859112aa779d9ab394ae4f3381911d84052b"
},
"dist": {
"type": "zip",
"url": "https://api.github.com/repos/symfony/yaml/zipball/02c1859112aa779d9ab394ae4f3381911d84052b",
"reference": "02c1859112aa779d9ab394ae4f3381911d84052b",
"shasum": ""
},
"require": {
"php": ">=5.3.9",
"symfony/polyfill-ctype": "~1.8"
},
"type": "library",
"extra": {
"branch-alias": {
"dev-master": "2.8-dev"
}
},
"autoload": {
"psr-4": {
"Symfony\\Component\\Yaml\\": ""
},
"exclude-from-classmap": [
"/Tests/"
]
},
"notification-url": "https://packagist.org/downloads/",
"license": [
"MIT"
],
"authors": [
{
"name": "Fabien Potencier",
"email": "fabien@symfony.com"
},
{
"name": "Symfony Community",
"homepage": "https://symfony.com/contributors"
}
],
"description": "Symfony Yaml Component",
"homepage": "https://symfony.com",
"time": "2018-11-11 11:18:13"
}
],
"aliases": [],
"minimum-stability": "stable",
"stability-flags": [],
"prefer-stable": false,
"prefer-lowest": false,
"platform": {
"php": ">=5.4",
"ext-dom": "*",
"ext-iconv": "*"
},
"platform-dev": [],
"platform-overrides": {
"php": "5.4"
}
}
README-RU.md 0000644 00000063164 15050224513 0006356 0 ustar 00 # DiDOM
[English version](README.md)
DiDOM - simple and fast HTML parser.
## Contents
- [Installation](#installation)
- [Quick start](#quick-start)
- [Creating new document](#creating-new-document)
- [Search for elements](#search-for-elements)
- [Verify if element exists](#verify-if-element-exists)
- [Counting elements](#counting-elements)
- [Search in element](#search-in-element)
- [Supported selectors](#supported-selectors)
- [Changing content](#changing-content)
- [Output](#output)
- [Working with elements](#working-with-elements)
- [Creating a new element](#creating-a-new-element)
- [Getting the name of an element](#getting-the-name-of-an-element)
- [Getting parent element](#getting-parent-element)
- [Getting sibling elements](#getting-sibling-elements)
- [Getting the child elements](#getting-the-child-elements)
- [Getting document](#getting-document)
- [Working with element attributes](#working-with-element-attributes)
- [Comparing elements](#comparing-elements)
- [Adding a child element](#adding-a-child-element)
- [Replacing element](#replacing-element)
- [Removing element](#removing-element)
- [Working with cache](#working-with-cache)
- [Miscellaneous](#miscellaneous)
- [Comparison with other parsers](#comparison-with-other-parsers)
## Installation
To install DiDOM run the command:
composer require imangazaliev/didom
## Quick start
```php
use DiDom\Document;
$document = new Document('http://www.news.com/', true);
$posts = $document->find('.post');
foreach($posts as $post) {
echo $post->text(), "\n";
}
```
## Creating new document
DiDom can load HTML in several ways:
##### With constructor
```php
// the first parameter is a string with HTML
$document = new Document($html);
// file path
$document = new Document('page.html', true);
// or URL
$document = new Document('http://www.example.com/', true);
// a document can also be created from a DOMDocument
$domDocument = new DOMDocument();
$document = new Document($domDocument);
```
Signature:
```php
__construct($string = null, $isFile = false, $encoding = 'UTF-8', $type = Document::TYPE_HTML)
```
`$isFile` - indicates that the first parameter is a path to a file. Default: `false`.
`$encoding` - the document encoding. Default: UTF-8.
`$type` - the document type (HTML - `Document::TYPE_HTML`, XML - `Document::TYPE_XML`). Default: `Document::TYPE_HTML`.
##### With separate methods
```php
$document = new Document();
$document->loadHtml($html);
$document->loadHtmlFile('page.html');
$document->loadHtmlFile('http://www.example.com/');
```
To load XML, use the corresponding methods `loadXml` and `loadXmlFile`.
When loading a document with these methods, you can pass additional [options](http://php.net/manual/ru/libxml.constants.php) to the parser:
```php
$document->loadHtml($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$document->loadHtmlFile($url, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$document->loadXml($xml, LIBXML_PARSEHUGE);
$document->loadXmlFile($url, LIBXML_PARSEHUGE);
```
## Search for elements
A CSS selector or an XPath expression can be used as the search expression. Pass the expression itself as the first parameter and its type as the second (default: `Query::TYPE_CSS`):
##### With method `find()`:
```php
use DiDom\Document;
use DiDom\Query;
...
// CSS selector
$posts = $document->find('.post');
// equivalent to
$posts = $document->find('.post', Query::TYPE_CSS);
// XPath expression
$posts = $document->find("//div[contains(@class, 'post')]", Query::TYPE_XPATH);
```
The method returns an array of elements (instances of `DiDom\Element`), or an empty array if no element matches the expression.
If needed, you can get an array of raw nodes without conversion to Element or text (`DOMElement`/`DOMText`/`DOMComment`/`DOMAttr`, depending on the expression) by passing `false` as the third parameter.
##### With method `first()`:
Returns the first matching element, or `null` if nothing is found.
Accepts the same parameters as `find()`.
##### With magic method `__invoke()`:
```php
$posts = $document('.post');
```
Accepts the same parameters as `find()`.
**Warning:** using this method is discouraged because it may be removed in the future.
##### With method `xpath()`:
```php
$posts = $document->xpath("//*[contains(concat(' ', normalize-space(@class), ' '), ' post ')]");
```
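The two XPath expressions used in this section are not interchangeable: `contains(@class, 'post')` is a substring test, while the `concat(' ', normalize-space(@class), ' ')` form matches a whole class token. The sketch below uses only PHP's built-in DOM extension (not DiDom) and made-up markup to show the difference:

```php
// Native DOM only; the markup is invented for illustration.
$dom = new DOMDocument();
$dom->loadHTML(
    '<div class="post">A</div>'
    . '<div class="featured post">B</div>'
    . '<div class="postscript">C</div>'
);
$xpath = new DOMXPath($dom);

// Substring test: also matches class="postscript".
$naive = $xpath->query("//div[contains(@class, 'post')]");

// Token test: pads the normalized class list with spaces and looks for " post ".
$exact = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' post ')]");

echo $naive->length, "\n"; // 3
echo $exact->length, "\n"; // 2
```

When a class name can be a prefix of another one, the token form is usually the safer choice.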
## Verify if element exists
To check whether an element exists, use the `has()` method:
```php
if ($document->has('.post')) {
    // code
}
```
If you need to check that an element exists and then get it, you can do this:
```php
if ($document->has('.post')) {
    $elements = $document->find('.post');
    // code
}
```
but this is faster:
```php
$elements = $document->find('.post');
if (count($elements) > 0) {
    // code
}
```
because the first variant executes two queries.
## Counting elements
The `count()` method returns the number of child elements that match the selector:
```php
// prints the number of links in the document
echo $document->count('a');
```
```php
// prints the number of items in the list
echo $document->first('ul')->count('> li');
```
## Search in element
The methods `find()`, `first()`, `xpath()`, `has()` and `count()` are also available on an element.
Example:
```php
echo $document->find('nav')[0]->first('ul.menu')->xpath('//li')[0]->text();
```
#### Method `findInDocument()`
Changing, replacing or removing an element that was found in another element does not change the document. This is because the `find()` method of the `Element` class (and therefore `first()` and `xpath()`) creates a new document and performs the search in it.
To search for elements in the source document, use the methods `findInDocument()` and `firstInDocument()`:
```php
// has no effect on the document
$document->first('head')->first('title')->remove();
// but this works
$document->first('head')->firstInDocument('title')->remove();
```
**Warning:** the methods `findInDocument()` and `firstInDocument()` work only for elements that belong to a document or were created via `new Element(...)`. If the element does not belong to any document, a `LogicException` is thrown.
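Why a change made to a copy never reaches the source document can be shown with PHP's built-in DOM classes alone. This is only an illustration of the general effect, assuming the element search works on an imported copy as described above; it is not DiDom's actual code:

```php
// Native DOM only: a node imported into a second document is a copy.
$original = new DOMDocument();
$original->loadHTML('<html><head><title>Foo</title></head><body></body></html>');
$head = $original->getElementsByTagName('head')->item(0);

// Build a separate document that holds a deep copy of <head>.
$copy = new DOMDocument();
$copy->appendChild($copy->importNode($head, true));

// Remove <title> from the copy.
$title = $copy->getElementsByTagName('title')->item(0);
$title->parentNode->removeChild($title);

echo $copy->getElementsByTagName('title')->length, "\n";     // 0
echo $original->getElementsByTagName('title')->length, "\n"; // 1, the source is untouched
```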
## Supported selectors
DiDom supports search by:
- tag
- class, ID, name and value of an attribute
- pseudo-classes:
    - first-, last-, nth-child
    - empty and not-empty
    - contains
    - has
```php
// all links
$document->find('a');
// any element with id = "foo" and class "bar"
$document->find('#foo.bar');
// any element that has the attribute "name"
$document->find('[name]');
// equivalent to
$document->find('*[name]');
// input with the name "foo"
$document->find('input[name=foo]');
$document->find('input[name=\'foo\']');
$document->find('input[name="foo"]');
// input with the name "foo" and the value "bar"
$document->find('input[name="foo"][value="bar"]');
// input whose name is NOT equal to "foo"
$document->find('input[name!="foo"]');
// any element that has an attribute
// starting with "data-" and equal to "foo"
$document->find('*[^data-=foo]');
// all links whose address starts with https
$document->find('a[href^=https]');
// all images with the png extension
$document->find('img[src$=png]');
// all links whose address contains the string "example.com"
$document->find('a[href*=example.com]');
// all links whose data-foo attribute contains bar as a space-separated value
$document->find('a[data-foo~=bar]');
// text of all links with class "foo" (array of strings)
$document->find('a.foo::text');
// equivalent to
$document->find('a.foo::text()');
// address and title of all links with class "bar"
$document->find('a.bar::attr(href|title)');
// all links that are direct children of the current element
$element->find('> a');
```
## Changing content
### Changing HTML
```php
$element->setInnerHtml('Foo');
```
### Changing the value
```php
$element->setValue('Foo');
```
## Output
### Getting HTML
##### With method `html()`:
```php
// HTML code of the document
echo $document->html();
// HTML code of the element
echo $document->first('.post')->html();
```
##### Type casting:
```php
// HTML code of the document
$html = (string) $document;
// HTML code of the element
$html = (string) $document->first('.post');
```
**Warning:** using this approach is discouraged because it may be removed in the future.
##### Formatting HTML output
```php
echo $document->format()->html();
```
An element does not have the `format()` method, so to get the formatted HTML of an element, first convert it to a document:
```php
$html = $element->toDocument()->format()->html();
```
#### Inner HTML
```php
$innerHtml = $element->innerHtml();
```
A document does not have the `innerHtml()` method, so to get the inner HTML of a document, first convert it to an element:
```php
$innerHtml = $document->toElement()->innerHtml();
```
### Getting XML
```php
// XML code of the document
echo $document->xml();
// XML code of the element
echo $document->first('book')->xml();
```
### Getting content
Returns the text content of the node and its descendants:
```php
echo $element->text();
```
## Creating a new element
### Creating an instance of the class
```php
use DiDom\Element;
$element = new Element('span', 'Hello');
// prints "<span>Hello</span>"
echo $element->html();
```
The first parameter is the element name, the second is its value (optional), and the third is the element attributes (optional).
An example of creating an element with attributes:
```php
$attributes = ['name' => 'description', 'placeholder' => 'Enter description of item'];
$element = new Element('textarea', 'Text', $attributes);
```
An element can also be created from a `DOMElement` instance:
```php
use DiDom\Element;
use DOMElement;
$domElement = new DOMElement('span', 'Hello');
$element = new Element($domElement);
```
#### Changing an element created from `DOMElement`
Instances of `DOMElement` created via the constructor (`new DOMElement(...)`) are read-only, so elements (instances of `DiDom\Element`) created from such objects are read-only as well.
Example:
```php
$element = new Element('span', 'Hello');
// adds the "id" attribute with the value "greeting"
$element->attr('id', 'greeting');
$domElement = new DOMElement('span', 'Hello');
$element = new Element($domElement);
// throws an exception:
// DOMException with message 'No Modification Allowed Error'
$element->attr('id', 'greeting');
```
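The read-only restriction comes from PHP's DOM extension rather than from DiDom. The sketch below reproduces it with native classes only and shows one way around it, assuming it is acceptable to create the node through an owner `DOMDocument` instead of the `DOMElement` constructor:

```php
// A DOMElement built with "new" has no owner document and is read-only.
$readOnly = new DOMElement('span', 'Hello');
$error = null;
try {
    $readOnly->setAttribute('id', 'greeting');
} catch (DOMException $exception) {
    $error = $exception->getMessage();
}
echo $error === null ? 'no error' : $error, "\n";

// A node created by a document belongs to that document and is writable.
$dom = new DOMDocument();
$writable = $dom->createElement('span', 'Hello');
$writable->setAttribute('id', 'greeting');
echo $dom->saveHTML($writable), "\n";
```

Wrapping `$writable` in `new Element(...)` should then give a modifiable element.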
### With method `Document::createElement()`
```php
$document = new Document($html);
$element = $document->createElement('span', 'Hello');
```
### With CSS selector
The first parameter is a selector, the second is a value, and the third is an array of attributes.
Element attributes can be specified in the selector or passed separately in the third parameter.
If an attribute name in the array matches an attribute name from the selector, the value from the selector is used.
```php
$document = new Document($html);
$element = $document->createElementBySelector('div.block', 'Foo', [
    'id' => '#content',
    'class' => '.container',
]);
```
You can also use the static method `createBySelector` of the `Element` class:
```php
$element = Element::createBySelector('div.block', 'Foo', [
    'id' => '#content',
    'class' => '.container',
]);
```
## Getting the name of an element
```php
$element->tag;
```
## Getting parent element
```php
$element->parent();
```
You can also get the closest parent element that matches a selector:
```php
$element->closest('.foo');
```
Returns the parent element that has the class `foo`. If no matching element is found, the method returns `null`.
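The lookup described above amounts to walking up the parent chain until a match is found. A minimal sketch with native DOM classes follows; `closestWithClass` is a hypothetical helper that only handles class names and is not how DiDom implements `closest()`:

```php
/**
 * Hypothetical helper (not part of DiDom): returns the nearest ancestor
 * whose class list contains $class, or null when there is none.
 */
function closestWithClass(DOMNode $node, $class)
{
    for ($parent = $node->parentNode; $parent instanceof DOMElement; $parent = $parent->parentNode) {
        $classes = preg_split('/\s+/', trim($parent->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
        if (in_array($class, $classes, true)) {
            return $parent;
        }
    }

    return null;
}

$dom = new DOMDocument();
$dom->loadHTML('<div class="foo"><p><span>Hi</span></p></div>');
$span = $dom->getElementsByTagName('span')->item(0);

$match = closestWithClass($span, 'foo');
echo $match !== null ? $match->tagName : 'null', "\n";                     // div
echo closestWithClass($span, 'missing') === null ? 'null' : 'found', "\n"; // null
```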
## Getting sibling elements
The first argument is a CSS selector, the second is the node type (`DOMElement`, `DOMText` or `DOMComment`).
If both arguments are omitted, nodes of any type are searched.
If the selector is specified and the node type is not, the `DOMElement` type is used.
**Warning:** a selector can only be used with the `DOMElement` type.
```php
// previous sibling
$item->previousSibling();
// previous sibling that matches the selector
$item->previousSibling('span');
// previous sibling of type DOMElement
$item->previousSibling(null, 'DOMElement');
// previous sibling of type DOMComment
$item->previousSibling(null, 'DOMComment');
```
```php
// all previous siblings
$item->previousSiblings();
// all previous siblings that match the selector
$item->previousSiblings('span');
// all previous siblings of type DOMElement
$item->previousSiblings(null, 'DOMElement');
// all previous siblings of type DOMComment
$item->previousSiblings(null, 'DOMComment');
```
```php
// next sibling
$item->nextSibling();
// next sibling that matches the selector
$item->nextSibling('span');
// next sibling of type DOMElement
$item->nextSibling(null, 'DOMElement');
// next sibling of type DOMComment
$item->nextSibling(null, 'DOMComment');
```
```php
// all following siblings
$item->nextSiblings();
// all following siblings that match the selector
$item->nextSiblings('span');
// all following siblings of type DOMElement
$item->nextSiblings(null, 'DOMElement');
// all following siblings of type DOMComment
$item->nextSiblings(null, 'DOMComment');
```
## Getting the child elements
```php
$html = '<div>Foo<span>Bar</span><!--Baz--></div>';
$document = new Document($html);
$div = $document->first('div');
// element node (DOMElement)
// string(3) "Bar"
var_dump($div->child(1)->text());
// text node (DOMText)
// string(3) "Foo"
var_dump($div->firstChild()->text());
// comment node (DOMComment)
// string(3) "Baz"
var_dump($div->lastChild()->text());
// array(3) { ... }
var_dump($div->children());
```
## Getting document
```php
$document = new Document($html);
$element = $document->first('input[name=email]');
$document2 = $element->getDocument();
// bool(true)
var_dump($document->is($document2));
```
## Working with element attributes
#### Creating/updating an attribute
##### With method `setAttribute`:
```php
$element->setAttribute('name', 'username');
```
##### With method `attr`:
```php
$element->attr('name', 'username');
```
##### With magic method `__set`:
```php
$element->name = 'username';
```
#### Getting value of an attribute
##### With method `getAttribute`:
```php
$username = $element->getAttribute('value');
```
##### With method `attr`:
```php
$username = $element->attr('value');
```
##### With magic method `__get`:
```php
$username = $element->name;
```
Returns `null` if the attribute is not found.
#### Verify if attribute exists
##### With method `hasAttribute`:
```php
if ($element->hasAttribute('name')) {
    // code
}
```
##### With magic method `__isset`:
```php
if (isset($element->name)) {
    // code
}
```
#### Removing attribute:
##### With method `removeAttribute`:
```php
$element->removeAttribute('name');
```
##### With magic method `__unset`:
```php
unset($element->name);
```
#### Retrieving all attributes:
```php
var_dump($element->attributes());
```
#### Retrieving specific attributes:
```php
var_dump($element->attributes(['name', 'type']));
```
#### Removing all attributes:
```php
$element->removeAllAttributes();
```
#### Removing all attributes except the specified ones:
```php
$element->removeAllAttributes(['name', 'type']);
```
## Comparing elements
```php
$element = new Element('span', 'hello');
$element2 = new Element('span', 'hello');
// bool(true)
var_dump($element->is($element));
// bool(false)
var_dump($element->is($element2));
```
## Adding a child element
```php
$list = new Element('ul');
$item = new Element('li', 'Item 1');
$list->appendChild($item);
$items = [
    new Element('li', 'Item 2'),
    new Element('li', 'Item 3'),
];
$list->appendChild($items);
```
## Replacing element
```php
$title = new Element('title', 'foo');
$document->first('title')->replace($title);
```
**Warning:** only elements that were found directly in the document can be replaced:
```php
// has no effect on the document
$document->first('head')->first('title')->replace($title);
// but this works
$document->first('head title')->replace($title);
```
More about this in section [Search in element](#search-in-element).
## Removing element
```php
$document->first('title')->remove();
```
**Warning:** only elements that were found directly in the document can be removed:
```php
// has no effect on the document
$document->first('head')->first('title')->remove();
// but this works
$document->first('head title')->remove();
```
More about this in section [Search in element](#search-in-element).
## Working with cache
The cache is an array of XPath expressions compiled from CSS selectors.
#### Getting the cache
```php
use DiDom\Query;
...
$xpath = Query::compile('h2');
$compiled = Query::getCompiled();
// array('h2' => '//h2')
var_dump($compiled);
```
#### Setting the cache
```php
Query::setCompiled(['h2' => '//h2']);
```
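The idea behind such a cache is plain memoization: convert a selector once and reuse the result. The sketch below is a hypothetical stand-alone illustration; `SelectorCache` and the toy converter are assumptions made for this example and are not DiDom's `Query` implementation:

```php
/**
 * Hypothetical sketch, not DiDom's code: memoizes the result of a
 * selector-to-XPath conversion in a static array keyed by selector.
 */
final class SelectorCache
{
    private static $compiled = [];
    public static $misses = 0;

    public static function compile($selector, callable $convert)
    {
        if (!array_key_exists($selector, self::$compiled)) {
            self::$misses++;
            self::$compiled[$selector] = $convert($selector);
        }

        return self::$compiled[$selector];
    }

    public static function getCompiled()
    {
        return self::$compiled;
    }
}

// Toy converter that only understands bare tag names.
$convert = function ($selector) {
    return '//' . $selector;
};

SelectorCache::compile('h2', $convert);
SelectorCache::compile('h2', $convert); // served from the cache

var_dump(SelectorCache::getCompiled()); // ['h2' => '//h2']
echo SelectorCache::$misses, "\n";      // 1
```

Exporting and re-importing such an array between requests is what getter/setter pairs like the ones above make possible.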
## Miscellaneous
#### `preserveWhiteSpace`
By default, whitespace between tags is not preserved.
Enable the `preserveWhiteSpace` option before loading the document:
```php
$document = new Document();
$document->preserveWhiteSpace();
$document->loadXml($xml);
```
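The effect of this flag can be observed with the underlying `DOMDocument` class directly. A small sketch with native DOM only and made-up XML; the exact node counts depend on libxml's whitespace handling, so treat them as what is typically seen rather than a guarantee:

```php
$xml = "<root>\n  <a/>\n  <b/>\n</root>";

// Whitespace-only text nodes between the tags are kept.
$kept = new DOMDocument();
$kept->preserveWhiteSpace = true;
$kept->loadXML($xml);

// The flag has to be set before loading for the blanks to be dropped.
$stripped = new DOMDocument();
$stripped->preserveWhiteSpace = false;
$stripped->loadXML($xml);

echo $kept->documentElement->childNodes->length, "\n";     // 5: text, a, text, b, text
echo $stripped->documentElement->childNodes->length, "\n"; // 2: a, b
```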
#### `matches`
Returns `true` if the element matches the selector:
```php
// returns true if the element is a div with the ID "content"
$element->matches('div#content');
// strict match
// returns true if the element is a div with the ID "content" and nothing else
// if the element has any other attributes, the method returns false
$element->matches('div#content', true);
```
#### `isElementNode`
Checks whether the element is a node of type DOMElement:
```php
$element->isElementNode();
```
#### `isTextNode`
Checks whether the element is a text node (DOMText):
```php
$element->isTextNode();
```
#### `isCommentNode`
Checks whether the element is a comment (DOMComment):
```php
$element->isCommentNode();
```
## Comparison with other parsers
[Comparison with other parsers](https://github.com/Imangazaliev/DiDOM/wiki/Сравнение-с-другими-парсерами-(1.6.3))
CHANGELOG.md 0000644 00000010455 15050224513 0006357 0 ustar 00 ### 1.13
- Add `Element::outerHtml()` method
- Add `Element::prependChild()` method
- Add `Element::insertBefore()` and `Element::insertAfter()` methods
- Add `Element::style()` method for more convenient inline styles manipulation
- Add `Element::classes()` method for more convenient class manipulation
### 1.12
- Many fixes and improvements
### 1.11.1
- Fix bug with unregistered PHP functions in XPath in `Document::has()` and `Document::count()` methods
### 1.11
- Add `Element::isElementNode()` method
- Add ability to retrieve only specific attributes in `Element::attributes()` method
- Add `Element::removeAllAttributes()` method
- Add ability to specify selector and node type in `Element::previousSibling()` and `Element::nextSibling()` methods
- Add `Element::previousSiblings()` and `Element::nextSiblings()` methods
- Many minor fixes and improvements
### 1.10.6
- Fix bug with XML document loading
### 1.10.5
- Fix issue #85
### 1.10.4
- Use `mb_convert_encoding()` in the Encoder if it is available
### 1.10.3
- Add `Element::removeChild()` and `Element::removeChildren()` methods
- Fix bug in `Element::matches()` method
- `Element::matches()` method now returns false if node is not `DOMElement`
- Add `Element::hasChildren()` method
### 1.10.2
- Fix bug in setInnerHtml: can't rewrite existing content
- Throw `InvalidSelectorException` instead of `InvalidArgumentException` when selector is empty
### 1.10.1
- Fix attributes `ends-with` XPath
- Method `Element::matches()` now can check children nodes
### 1.10
- Fix HTML saving mechanism
- Throw `InvalidSelectorException` instead of `RuntimeException` in Query class
### 1.9.1
- Add ability to search in owner document using current node as context
- Bugs fixed
### 1.9.0
- Methods `Document::appendChild()` and `Element::appendChild()` now return appended node(s)
- Add ability to search elements in context
### 1.8.8
- Bugs fixed
### 1.8.7
- Add `Element::getLineNo()` method
### 1.8.6
- Fix issue #55
### 1.8.5
- Add support of `DOMComment`
### 1.8.4
- Add ability to create an element by selector
- Add closest method
### 1.8.3
- Add method `Element::isTextNode()`
- Many minor fixes
### 1.8.2
- Add ability to check that element matches selector
- Add ability to count nodes by selector
- Many minor fixes
### 1.8.1
- Small fix
### 1.8
- Bug fixes
- Add support of ~ selector
- Add ability to direct search by CSS selector
- Add setInnerHtml method
- Add attributes method
### 1.7.4
- Add support of text nodes
### 1.7.3
- Bug fix
### 1.7.2
- Fixed behavior of nth-child pseudo class
- Add nth-of-type pseudo class
### 1.7.1
- Add pseudo class has and more attribute options
### 1.7.0
- Bug fixes
- Add methods `previousSibling`, `nextSibling`, `child`, `firstChild`, `lastChild`, `children`, `getDocument` to the Element
- Changed behavior of parent method. Now it returns parent node instead of owner document
### 1.6.8
- Bug fix
### 1.6.5
- Added ability to get an element attribute by CSS selector
### 1.6.4
- Added handling of `DOMText` and `DOMAttr` in `Document::find()`
### 1.6.3
- Added ability to get inner HTML
### 1.6.2
- Added the ability to pass options when loading HTML or XML
### 1.6.1
- Added the ability to pass an array of nodes to appendChild
- Added the ability to pass options when converting to HTML or XML
- Added the ability to add child elements to the element
### 1.6
- Added support for XML
- Added the ability to search element by part of attribute name or value
- Added support for pseudo-class "contains"
- Added the ability to clone a node
### 1.5.1
- Added ability to remove and replace nodes
- Added ability to specify encoding when converting the element into the document
### 1.5
- Fixed problem with incorrect encoding
- Added ability to set the value of the element
- Added ability to specify encoding when creating document
### 1.4
- Added the ability to specify the return type element (`DiDom\Element` or `DOMElement`)
### 1.3.2
- Bug fixed
### 1.3.1
- Bugs fixed
- Added the ability to pass element attributes in the constructor
### 1.3
- Bugs fixed
### 1.2
- Bugs fixed
- Added the ability to compare Element\Document
- Added the ability to format HTML code of the document when outputting
### 1.1
- Added cache control
- Converter from CSS to XPath replaced by a faster one
### 1.0
- First release src/DiDom/Document.php 0000644 00000052222 15050224513 0010616 0 ustar 00 'http://php.net/xpath'
];
/**
* @param string|DOMDocument|null $string An HTML or XML string, a file path, or a DOMDocument instance
* @param bool $isFile Indicates that the first parameter is a path to a file
* @param string $encoding The document encoding
* @param string $type The document type
*
* @throws InvalidArgumentException if parameter 3 is not a string
*/
public function __construct($string = null, $isFile = false, $encoding = 'UTF-8', $type = Document::TYPE_HTML)
{
if ($string instanceof DOMDocument) {
$this->document = $string;
return;
}
if ( ! is_string($encoding)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 3 to be string, %s given', __METHOD__, gettype($encoding)));
}
$this->encoding = $encoding;
$this->document = new DOMDocument('1.0', $encoding);
$this->preserveWhiteSpace(false);
if ($string !== null) {
$this->load($string, $isFile, $type);
}
}
/**
* Creates a new document.
*
* @param string|null $string An HTML or XML string or a file path
* @param bool $isFile Indicates that the first parameter is a path to a file
* @param string $encoding The document encoding
* @param string $type The document type
*
* @return Document
*/
public static function create($string = null, $isFile = false, $encoding = 'UTF-8', $type = Document::TYPE_HTML)
{
return new Document($string, $isFile, $encoding, $type);
}
/**
* Creates a new element node.
*
* @param string $name The tag name of the element
* @param string|null $value The value of the element
* @param array $attributes The attributes of the element
*
* @return Element created element
*/
public function createElement($name, $value = null, array $attributes = [])
{
$node = $this->document->createElement($name);
return new Element($node, $value, $attributes);
}
/**
* Creates a new element node by CSS selector.
*
* @param string $selector
* @param string|null $value
* @param array $attributes
*
* @return Element
*
* @throws InvalidSelectorException
*/
public function createElementBySelector($selector, $value = null, array $attributes = [])
{
$segments = Query::getSegments($selector);
$name = array_key_exists('tag', $segments) ? $segments['tag'] : 'div';
if (array_key_exists('attributes', $segments)) {
$attributes = array_merge($attributes, $segments['attributes']);
}
if (array_key_exists('id', $segments)) {
$attributes['id'] = $segments['id'];
}
if (array_key_exists('classes', $segments)) {
$attributes['class'] = implode(' ', $segments['classes']);
}
return $this->createElement($name, $value, $attributes);
}
/**
* @param string $content
*
* @return Element
*/
public function createTextNode($content)
{
return new Element(new DOMText($content));
}
/**
* @param string $data
*
* @return Element
*/
public function createComment($data)
{
return new Element(new DOMComment($data));
}
/**
* @param string $data
*
* @return Element
*/
public function createCdataSection($data)
{
return new Element(new DOMCdataSection($data));
}
/**
* @return DocumentFragment
*/
public function createDocumentFragment()
{
return new DocumentFragment($this->document->createDocumentFragment());
}
/**
* Adds a new child at the end of the children.
*
* @param Element|DOMNode|array $nodes The appended child
*
* @return Element|Element[]
*
* @throws InvalidArgumentException if one of elements of parameter 1 is not an instance of DOMNode or Element
*/
public function appendChild($nodes)
{
$returnArray = true;
if ( ! is_array($nodes)) {
$nodes = [$nodes];
$returnArray = false;
}
$result = [];
foreach ($nodes as $node) {
if ($node instanceof Element) {
$node = $node->getNode();
}
if ( ! $node instanceof DOMNode) {
throw new InvalidArgumentException(sprintf('Argument 1 passed to %s must be an instance of %s\Element or DOMNode, %s given', __METHOD__, __NAMESPACE__, (is_object($node) ? get_class($node) : gettype($node))));
}
Errors::disable();
$cloned = $node->cloneNode(true);
$newNode = $this->document->importNode($cloned, true);
$result[] = $this->document->appendChild($newNode);
Errors::restore();
}
$result = array_map(function (DOMNode $node) {
return new Element($node);
}, $result);
return $returnArray ? $result : $result[0];
}
/**
* Set preserveWhiteSpace property.
*
* @param bool $value
*
* @return Document
*
* @throws InvalidArgumentException if parameter 1 is not a boolean
*/
public function preserveWhiteSpace($value = true)
{
if ( ! is_bool($value)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 1 to be boolean, %s given', __METHOD__, gettype($value)));
}
$this->document->preserveWhiteSpace = $value;
return $this;
}
/**
* Load HTML or XML.
*
* @param string $string An HTML or XML string or a file path
* @param bool $isFile Indicates that the first parameter is a file path
* @param string $type The type of a document
* @param int|null $options libxml option constants
*
* @return Document
*
* @throws InvalidArgumentException if parameter 1 is not a string
* @throws InvalidArgumentException if parameter 3 is not a string
* @throws InvalidArgumentException if parameter 4 is not an integer or null
* @throws RuntimeException if the document type is invalid (not Document::TYPE_HTML or Document::TYPE_XML)
*/
public function load($string, $isFile = false, $type = Document::TYPE_HTML, $options = null)
{
if ( ! is_string($string)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 1 to be string, %s given', __METHOD__, (is_object($string) ? get_class($string) : gettype($string))));
}
if ( ! is_string($type)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 3 to be string, %s given', __METHOD__, (is_object($type) ? get_class($type) : gettype($type))));
}
if ( ! in_array(strtolower($type), [Document::TYPE_HTML, Document::TYPE_XML], true)) {
throw new RuntimeException(sprintf('Document type must be "xml" or "html", %s given', $type));
}
if ($options === null) {
// LIBXML_HTML_NODEFDTD - prevents a default doctype being added when one is not found
$options = LIBXML_HTML_NODEFDTD;
}
if ( ! is_int($options)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 4 to be integer, %s given', __METHOD__, (is_object($options) ? get_class($options) : gettype($options))));
}
$string = trim($string);
if ($isFile) {
$string = $this->loadFile($string);
}
if (strtolower($type) === Document::TYPE_HTML) {
$string = Encoder::convertToHtmlEntities($string, $this->encoding);
}
$this->type = strtolower($type);
Errors::disable();
if ($this->type === Document::TYPE_HTML) {
$this->document->loadHtml($string, $options);
} else {
$this->document->loadXml($string, $options);
}
Errors::restore();
return $this;
}
/**
* Load HTML from a string.
*
* @param string $html The HTML string
* @param int|null $options Additional parameters
*
* @return Document
*
* @throws InvalidArgumentException if parameter 1 is not a string
*/
public function loadHtml($html, $options = null)
{
return $this->load($html, false, Document::TYPE_HTML, $options);
}
/**
* Load HTML from a file.
*
* @param string $filename The path to the HTML file
* @param int|null $options Additional parameters
*
* @return Document
*
* @throws InvalidArgumentException if parameter 1 is not a string
* @throws RuntimeException if the file doesn't exist
* @throws RuntimeException if you are unable to load the file
*/
public function loadHtmlFile($filename, $options = null)
{
return $this->load($filename, true, Document::TYPE_HTML, $options);
}
/**
* Load XML from a string.
*
* @param string $xml The XML string
* @param int|null $options Additional parameters
*
* @return Document
*
* @throws InvalidArgumentException if parameter 1 is not a string
*/
public function loadXml($xml, $options = null)
{
return $this->load($xml, false, Document::TYPE_XML, $options);
}
/**
* Load XML from a file.
*
* @param string $filename The path to the XML file
* @param int|null $options Additional parameters
*
* @return Document
*
* @throws InvalidArgumentException if the file path is not a string
* @throws RuntimeException if the file doesn't exist
* @throws RuntimeException if you are unable to load the file
*/
public function loadXmlFile($filename, $options = null)
{
return $this->load($filename, true, Document::TYPE_XML, $options);
}
/**
* Reads entire file into a string.
*
* @param string $filename The path to the file
*
* @return string
*
* @throws InvalidArgumentException if parameter 1 is not a string
* @throws RuntimeException if an error occurred
*/
protected function loadFile($filename)
{
if ( ! is_string($filename)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 1 to be string, %s given', __METHOD__, gettype($filename)));
}
try {
$content = file_get_contents($filename);
} catch (Exception $exception) {
throw new RuntimeException(sprintf('Could not load file %s', $filename));
}
if ($content === false) {
throw new RuntimeException(sprintf('Could not load file %s', $filename));
}
return $content;
}
/**
* Checks the existence of the node.
*
* @param string $expression XPath expression or CSS selector
* @param string $type The type of the expression
*
* @return bool
*/
public function has($expression, $type = Query::TYPE_CSS)
{
$expression = Query::compile($expression, $type);
$expression = sprintf('count(%s) > 0', $expression);
return $this->createXpath()->evaluate($expression);
}
/**
* Searches for a node in the DOM tree for a given XPath expression or CSS selector.
*
* @param string $expression XPath expression or a CSS selector
* @param string $type The type of the expression
* @param bool $wrapNode Returns array of Element if true, otherwise array of DOMElement
* @param DOMElement|null $contextNode The node in which the search will be performed
*
* @return Element[]|DOMElement[]
*
* @throws InvalidSelectorException if the selector is invalid
* @throws InvalidArgumentException if context node is not DOMElement
*/
public function find($expression, $type = Query::TYPE_CSS, $wrapNode = true, $contextNode = null)
{
$expression = Query::compile($expression, $type);
if ($contextNode !== null) {
if ($contextNode instanceof Element) {
$contextNode = $contextNode->getNode();
}
if ( ! $contextNode instanceof DOMElement) {
throw new InvalidArgumentException(sprintf('Argument 4 passed to %s must be an instance of %s\Element or DOMElement, %s given', __METHOD__, __NAMESPACE__, (is_object($contextNode) ? get_class($contextNode) : gettype($contextNode))));
}
if ($type === Query::TYPE_CSS) {
$expression = '.' . $expression;
}
}
$nodeList = $this->createXpath()->query($expression, $contextNode);
$result = [];
if ($wrapNode) {
foreach ($nodeList as $node) {
$result[] = $this->wrapNode($node);
}
} else {
foreach ($nodeList as $node) {
$result[] = $node;
}
}
return $result;
}
/**
* Searches for a node in the DOM tree and returns first element or null.
*
* @param string $expression XPath expression or a CSS selector
* @param string $type The type of the expression
* @param bool $wrapNode Returns Element if true, otherwise DOMElement
* @param DOMElement|null $contextNode The node in which the search will be performed
*
* @return Element|DOMElement|null
*
* @throws InvalidSelectorException if the selector is invalid
*/
public function first($expression, $type = Query::TYPE_CSS, $wrapNode = true, $contextNode = null)
{
$expression = Query::compile($expression, $type);
if ($contextNode !== null && $type === Query::TYPE_CSS) {
$expression = '.' . $expression;
}
$expression = sprintf('(%s)[1]', $expression);
$nodes = $this->find($expression, Query::TYPE_XPATH, false, $contextNode);
if (count($nodes) === 0) {
return null;
}
return $wrapNode ? $this->wrapNode($nodes[0]) : $nodes[0];
}
/**
* @param DOMElement|DOMComment|DOMCdataSection|DOMText|DOMAttr $node
*
* @return Element|string
*
* @throws InvalidArgumentException if parameter 1 is not an instance of DOMElement, DOMText, DOMComment, DOMCdataSection or DOMAttr
*/
protected function wrapNode($node)
{
switch (get_class($node)) {
case 'DOMElement':
case 'DOMComment':
case 'DOMCdataSection':
return new Element($node);
case 'DOMText':
return $node->data;
case 'DOMAttr':
return $node->value;
}
throw new InvalidArgumentException(sprintf('Unknown node type "%s"', get_class($node)));
}
/**
* Searches for a node in the DOM tree for a given XPath expression.
*
* @param string $expression XPath expression
* @param bool $wrapNode Returns array of Element if true, otherwise array of DOMElement
* @param DOMElement|null $contextNode The node in which the search will be performed
*
* @return Element[]|DOMElement[]
*/
public function xpath($expression, $wrapNode = true, $contextNode = null)
{
return $this->find($expression, Query::TYPE_XPATH, $wrapNode, $contextNode);
}
/**
* Counts nodes for a given XPath expression or CSS selector.
*
* @param string $expression XPath expression or CSS selector
* @param string $type The type of the expression
*
* @return int
*
* @throws InvalidSelectorException
*/
public function count($expression, $type = Query::TYPE_CSS)
{
$expression = Query::compile($expression, $type);
$expression = sprintf('count(%s)', $expression);
return (int) $this->createXpath()->evaluate($expression);
}
/**
* @return DOMXPath
*/
public function createXpath()
{
$xpath = new DOMXPath($this->document);
foreach ($this->namespaces as $prefix => $namespace) {
$xpath->registerNamespace($prefix, $namespace);
}
$xpath->registerPhpFunctions();
return $xpath;
}
/**
* Register a namespace.
*
* @param string $prefix
* @param string $namespace
*/
public function registerNamespace($prefix, $namespace)
{
if ( ! is_string($prefix)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 1 to be string, %s given', __METHOD__, (is_object($prefix) ? get_class($prefix) : gettype($prefix))));
}
if ( ! is_string($namespace)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 2 to be string, %s given', __METHOD__, (is_object($namespace) ? get_class($namespace) : gettype($namespace))));
}
$this->namespaces[$prefix] = $namespace;
}
/**
* Dumps the internal document into a string using HTML formatting.
*
* @return string The document html
*/
public function html()
{
return trim($this->document->saveHTML($this->document));
}
/**
* Dumps the internal document into a string using XML formatting.
*
* @param int $options Additional options
*
* @return string The document xml
*/
public function xml($options = 0)
{
return trim($this->document->saveXML($this->document, $options));
}
/**
* Nicely formats output with indentation and extra space.
*
* @param bool $format Formats output if true
*
* @return Document
*/
public function format($format = true)
{
if ( ! is_bool($format)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 1 to be boolean, %s given', __METHOD__, gettype($format)));
}
$this->document->formatOutput = $format;
return $this;
}
/**
* Get the text content of this node and its descendants.
*
* @return string
*/
public function text()
{
return $this->getElement()->textContent;
}
/**
* Indicates if two documents are the same document.
*
* @param Document|DOMDocument $document The compared document
*
* @return bool
*
* @throws InvalidArgumentException if parameter 1 is not an instance of DOMDocument or Document
*/
public function is($document)
{
if ($document instanceof Document) {
$element = $document->getElement();
} else {
if ( ! $document instanceof DOMDocument) {
throw new InvalidArgumentException(sprintf('Argument 1 passed to %s must be an instance of %s or DOMDocument, %s given', __METHOD__, __CLASS__, (is_object($document) ? get_class($document) : gettype($document))));
}
$element = $document->documentElement;
}
if ($element === null) {
return false;
}
return $this->getElement()->isSameNode($element);
}
/**
* Returns the type of the document (XML or HTML).
*
* @return string
*/
public function getType()
{
return $this->type;
}
/**
* Returns the encoding of the document.
*
* @return string
*/
public function getEncoding()
{
return $this->encoding;
}
/**
* @return DOMDocument
*/
public function getDocument()
{
return $this->document;
}
/**
* @return DOMElement
*/
public function getElement()
{
return $this->document->documentElement;
}
/**
* @return Element
*/
public function toElement()
{
if ($this->document->documentElement === null) {
throw new RuntimeException('Cannot convert empty document to Element');
}
return new Element($this->document->documentElement);
}
/**
* Convert the document to its string representation.
*
* @return string
*/
public function __toString()
{
return $this->type === Document::TYPE_HTML ? $this->html() : $this->xml();
}
/**
* Searches for a node in the DOM tree for a given XPath expression or CSS selector.
*
* @param string $expression XPath expression or a CSS selector
* @param string $type The type of the expression
* @param bool $wrapNode Returns array of Element if true, otherwise array of DOMElement
* @param DOMElement|null $contextNode The node in which the search will be performed
*
* @return Element[]|DOMElement[]
*
* @throws InvalidSelectorException
*
* @deprecated No longer recommended, use Document::find() instead.
*/
public function __invoke($expression, $type = Query::TYPE_CSS, $wrapNode = true, $contextNode = null)
{
return $this->find($expression, $type, $wrapNode, $contextNode);
}
}
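
The `has()`, `count()` and `first()` methods above all work by wrapping a compiled XPath expression: `count(...) > 0`, `count(...)` and `(...)[1]` respectively. The following is a minimal sketch of that wrapping using only PHP's DOM extension. It assumes hand-written XPath in place of `Query::compile()`, and the helper names `xpathHas` and `xpathFirst` as well as the sample markup are hypothetical, not part of DiDOM.

```php
<?php

// Hypothetical helpers: they mirror the expression wrapping used above,
// but operate on a plain DOMDocument rather than DiDom\Document.

function xpathHas(DOMDocument $document, $expression)
{
    $xpath = new DOMXPath($document);

    // evaluate() returns a typed result; "count(...) > 0" yields a boolean
    return $xpath->evaluate(sprintf('count(%s) > 0', $expression));
}

function xpathFirst(DOMDocument $document, $expression)
{
    $xpath = new DOMXPath($document);

    // "(...)[1]" keeps only the first matching node in document order
    $nodes = $xpath->query(sprintf('(%s)[1]', $expression));

    return $nodes->length > 0 ? $nodes->item(0) : null;
}

$document = new DOMDocument();
$document->loadXML('<list><item>one</item><item>two</item></list>');

var_dump(xpathHas($document, '//item'));    // bool(true)
var_dump(xpathHas($document, '//missing')); // bool(false)

echo xpathFirst($document, '//item')->textContent, "\n"; // one
```

Letting XPath do the counting and the `[1]` selection avoids building a PHP array of every match when only a yes/no answer or a single node is needed.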
src/DiDom/DocumentFragment.php 0000644 00000001456 15050224513 0012305 0 ustar 00 setNode($documentFragment);
}
/**
* Append raw XML data.
*
* @param string $data
*/
public function appendXml($data)
{
$this->node->appendXML($data);
}
}
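
`appendXml()` above delegates to the native `DOMDocumentFragment::appendXML()`. As a rough illustration of what that native call does, here is a sketch that uses the DOM extension directly; the `<root/>` document and the appended markup are made-up sample data.

```php
<?php

// Sample document with an empty root element (hypothetical data).
$document = new DOMDocument();
$document->loadXML('<root/>');

// A fragment is a lightweight container that belongs to the document
// but is not part of its tree until it is appended somewhere.
$fragment = $document->createDocumentFragment();

// appendXML() parses raw XML and adds the resulting nodes to the fragment.
$fragment->appendXML('<a>1</a><b>2</b>');

// Appending the fragment moves its children into the target element.
$document->documentElement->appendChild($fragment);

echo $document->saveXML($document->documentElement), "\n";
// <root><a>1</a><b>2</b></root>
```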
src/DiDom/Errors.php 0000644 00000001535 15050224513 0010315 0 ustar 00 createElement($tagName);
$this->setNode($node);
} else {
$this->setNode($tagName);
}
if ($value !== null) {
$this->setValue($value);
}
foreach ($attributes as $attrName => $attrValue) {
$this->setAttribute($attrName, $attrValue);
}
}
/**
* Creates a new element.
*
* @param DOMNode|string $name The tag name of an element
* @param string|null $value The value of an element
* @param array $attributes The attributes of an element
*
* @return Element
*/
public static function create($name, $value = null, array $attributes = [])
{
return new Element($name, $value, $attributes);
}
/**
* Creates a new element node by CSS selector.
*
* @param string $selector
* @param string|null $value
* @param array $attributes
*
* @return Element
*
* @throws InvalidSelectorException
*/
public static function createBySelector($selector, $value = null, array $attributes = [])
{
return Document::create()->createElementBySelector($selector, $value, $attributes);
}
/**
* Checks that the node matches selector.
*
* @param string $selector CSS selector
* @param bool $strict
*
* @return bool
*
* @throws InvalidSelectorException if the selector is invalid
* @throws InvalidArgumentException if the selector is not a string
* @throws RuntimeException if the tag name is not specified in strict mode
*/
public function matches($selector, $strict = false)
{
if ( ! is_string($selector)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 1 to be string, %s given', __METHOD__, gettype($selector)));
}
if ( ! $this->node instanceof DOMElement) {
return false;
}
if ($selector === '*') {
return true;
}
if ( ! $strict) {
$innerHtml = $this->html();
$html = "<root>$innerHtml</root>";
$selector = 'root > ' . trim($selector);
$document = new Document();
$document->loadHtml($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
return $document->has($selector);
}
$segments = Query::getSegments($selector);
if ( ! array_key_exists('tag', $segments)) {
throw new RuntimeException(sprintf('Tag name must be specified in %s', $selector));
}
if ($segments['tag'] !== $this->tag && $segments['tag'] !== '*') {
return false;
}
$segments['id'] = array_key_exists('id', $segments) ? $segments['id'] : null;
if ($segments['id'] !== $this->getAttribute('id')) {
return false;
}
$classes = $this->hasAttribute('class') ? explode(' ', trim($this->getAttribute('class'))) : [];
$segments['classes'] = array_key_exists('classes', $segments) ? $segments['classes'] : [];
$diff1 = array_diff($segments['classes'], $classes);
$diff2 = array_diff($classes, $segments['classes']);
if (count($diff1) > 0 || count($diff2) > 0) {
return false;
}
$attributes = $this->attributes();
unset($attributes['id'], $attributes['class']);
$segments['attributes'] = array_key_exists('attributes', $segments) ? $segments['attributes'] : [];
$diff1 = array_diff_assoc($segments['attributes'], $attributes);
$diff2 = array_diff_assoc($attributes, $segments['attributes']);
// if the attributes are not equal
if (count($diff1) > 0 || count($diff2) > 0) {
return false;
}
return true;
}
/**
* Determine if an attribute exists on the element.
*
* @param string $name The name of an attribute
*
* @return bool
*/
public function hasAttribute($name)
{
return $this->node->hasAttribute($name);
}
/**
* Set an attribute on the element.
*
* @param string $name The name of an attribute
* @param string|null $value The value of an attribute (numeric values are cast to string)
*
* @return Element
*/
public function setAttribute($name, $value)
{
if (is_numeric($value)) {
$value = (string) $value;
}
if ( ! is_string($value) && $value !== null) {
throw new InvalidArgumentException(sprintf('%s expects parameter 2 to be string or null, %s given', __METHOD__, (is_object($value) ? get_class($value) : gettype($value))));
}
$this->node->setAttribute($name, $value);
return $this;
}
/**
* Access to the element's attributes.
*
* @param string $name The name of an attribute
* @param string|null $default The value returned if the attribute doesn't exist
*
* @return string|null The value of an attribute or null if attribute doesn't exist
*/
public function getAttribute($name, $default = null)
{
if ($this->hasAttribute($name)) {
return $this->node->getAttribute($name);
}
return $default;
}
/**
* Unset an attribute on the element.
*
* @param string $name The name of an attribute
*
* @return Element
*/
public function removeAttribute($name)
{
$this->node->removeAttribute($name);
return $this;
}
/**
* Unset all attributes of the element.
*
* @param string[] $exclusions
*
* @return Element
*/
public function removeAllAttributes(array $exclusions = [])
{
if ( ! $this->node instanceof DOMElement) {
return $this;
}
foreach ($this->attributes() as $name => $value) {
if (in_array($name, $exclusions, true)) {
continue;
}
$this->node->removeAttribute($name);
}
return $this;
}
/**
* Alias for getAttribute and setAttribute methods.
*
* @param string $name The name of an attribute
* @param string|null $value The value to set; if null, the current value of the attribute is returned instead
*
* @return string|null|Element
*/
public function attr($name, $value = null)
{
if ($value === null) {
return $this->getAttribute($name);
}
return $this->setAttribute($name, $value);
}
/**
* Returns the node attributes or null, if it is not DOMElement.
*
* @param string[] $names
*
* @return array|null
*/
public function attributes(array $names = null)
{
if ( ! $this->node instanceof DOMElement) {
return null;
}
if ($names === null) {
$result = [];
foreach ($this->node->attributes as $name => $attribute) {
$result[$name] = $attribute->value;
}
return $result;
}
$result = [];
foreach ($this->node->attributes as $name => $attribute) {
if (in_array($name, $names, true)) {
$result[$name] = $attribute->value;
}
}
return $result;
}
/**
* @return ClassAttribute
*
* @throws LogicException if the node is not an instance of DOMElement
*/
public function classes()
{
if ($this->classAttribute !== null) {
return $this->classAttribute;
}
if ( ! $this->isElementNode()) {
throw new LogicException('Class attribute is available only for element nodes');
}
$this->classAttribute = new ClassAttribute($this);
return $this->classAttribute;
}
/**
* @return StyleAttribute
*
* @throws LogicException if the node is not an instance of DOMElement
*/
public function style()
{
if ($this->styleAttribute !== null) {
return $this->styleAttribute;
}
if ( ! $this->isElementNode()) {
throw new LogicException('Style attribute is available only for element nodes');
}
$this->styleAttribute = new StyleAttribute($this);
return $this->styleAttribute;
}
/**
* Dynamically set an attribute on the element.
*
* @param string $name The name of an attribute
* @param string $value The value of an attribute
*
* @return Element
*/
public function __set($name, $value)
{
return $this->setAttribute($name, $value);
}
/**
* Dynamically access the element's attributes.
*
* @param string $name The name of an attribute
*
* @return string|null
*/
public function __get($name)
{
if ($name === 'tag') {
return $this->node->tagName;
}
return $this->getAttribute($name);
}
/**
* Determine if an attribute exists on the element.
*
* @param string $name The attribute name
*
* @return bool
*/
public function __isset($name)
{
return $this->hasAttribute($name);
}
/**
* Unset an attribute on the element.
*
* @param string $name The name of an attribute
*/
public function __unset($name)
{
$this->removeAttribute($name);
}
}
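
In strict mode, `matches()` above treats class lists and attribute maps as equal only when `array_diff()` (or `array_diff_assoc()`) is empty in both directions. A small standalone sketch of that two-way comparison follows; the helper names `sameSet` and `sameAssoc` and the sample values are hypothetical.

```php
<?php

// Hypothetical helper: true when both lists contain exactly the same values
// (order and duplicates are ignored, as with array_diff() in general).
function sameSet(array $expected, array $actual)
{
    return count(array_diff($expected, $actual)) === 0
        && count(array_diff($actual, $expected)) === 0;
}

// Hypothetical helper: the same idea for name => value pairs.
function sameAssoc(array $expected, array $actual)
{
    return count(array_diff_assoc($expected, $actual)) === 0
        && count(array_diff_assoc($actual, $expected)) === 0;
}

$classes = explode(' ', trim(' post featured '));

var_dump(sameSet(['featured', 'post'], $classes)); // bool(true)
var_dump(sameSet(['post'], $classes));             // bool(false)

var_dump(sameAssoc(['href' => '#'], ['href' => '#', 'title' => 'x'])); // bool(false)
```

A one-directional diff would only prove that one side is a subset of the other, which is why both directions are checked.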
src/DiDom/Exceptions/InvalidSelectorException.php 0000644 00000000160 15050224513 0016121 0 ustar 00 $codes[$characterIndex]) {
$entities .= chr($codes[$characterIndex++]);
continue;
}
if (0xF0 <= $codes[$characterIndex]) {
$code = (($codes[$characterIndex++] - 0xF0) << 18) + (($codes[$characterIndex++] - 0x80) << 12) + (($codes[$characterIndex++] - 0x80) << 6) + $codes[$characterIndex++] - 0x80;
} elseif (0xE0 <= $codes[$characterIndex]) {
$code = (($codes[$characterIndex++] - 0xE0) << 12) + (($codes[$characterIndex++] - 0x80) << 6) + $codes[$characterIndex++] - 0x80;
} else {
$code = (($codes[$characterIndex++] - 0xC0) << 6) + $codes[$characterIndex++] - 0x80;
}
$entities .= '&#' . $code . ';';
}
return $entities;
}
}
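
The surviving part of the loop above decodes multi-byte UTF-8 sequences into code points by subtracting the marker bits from each byte and shifting by six bits per continuation byte. Below is a minimal standalone sketch of the same decoding that emits decimal numeric character references. It assumes the input is valid UTF-8, and the function name `utf8ToNumericEntities` is hypothetical.

```php
<?php

// Hypothetical standalone function; assumes $string is valid UTF-8.
function utf8ToNumericEntities($string)
{
    // unpack('C*') returns a 1-based array of byte values
    $codes = unpack('C*', $string);
    $index = 1;
    $result = '';

    while (isset($codes[$index])) {
        $byte = $codes[$index];

        if ($byte < 0x80) {
            // ASCII bytes are copied through unchanged
            $result .= chr($byte);
            $index++;
            continue;
        }

        // the lead byte determines the sequence length and its payload bits
        if ($byte >= 0xF0) {
            $length = 4;
            $code = $byte - 0xF0;
        } elseif ($byte >= 0xE0) {
            $length = 3;
            $code = $byte - 0xE0;
        } else {
            $length = 2;
            $code = $byte - 0xC0;
        }

        // each continuation byte contributes its low six bits
        for ($offset = 1; $offset < $length; $offset++) {
            $code = ($code << 6) + ($codes[$index + $offset] - 0x80);
        }

        $result .= '&#' . $code . ';';
        $index += $length;
    }

    return $result;
}

echo utf8ToNumericEntities("caf\xC3\xA9"), "\n";  // caf&#233;
echo utf8ToNumericEntities("\xE2\x82\xAC 5"), "\n"; // &#8364; 5
```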
src/DiDom/StyleAttribute.php 0000644 00000020560 15050224513 0012024 0 ustar 00 isElementNode()) {
throw new InvalidArgumentException(sprintf('The element must contain DOMElement node'));
}
$this->element = $element;
$this->parseStyleAttribute();
}
/**
* Parses style attribute of the element.
*/
protected function parseStyleAttribute()
{
if ( ! $this->element->hasAttribute('style')) {
// possible if style attribute has been removed
if ($this->styleString !== '') {
$this->styleString = '';
$this->properties = [];
}
return;
}
// if style attribute is not changed
if ($this->element->getAttribute('style') === $this->styleString) {
return;
}
// save style attribute as is (without trimming)
$this->styleString = $this->element->getAttribute('style');
$styleString = trim($this->styleString, ' ;');
if ($styleString === '') {
$this->properties = [];
return;
}
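$properties = explode(';', $styleString);
// rebuild from scratch so properties dropped from the attribute do not linger
$this->properties = [];
foreach ($properties as $property) {
// skip malformed declarations that have no "name: value" separator
if (strpos($property, ':') === false) {
continue;
}
list($name, $value) = explode(':', $property, 2);
$name = trim($name);
$value = trim($value);
$this->properties[$name] = $value;
}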
}
/**
* Updates style attribute of the element.
*/
protected function updateStyleAttribute()
{
$this->styleString = $this->buildStyleString();
$this->element->setAttribute('style', $this->styleString);
}
/**
* @return string
*/
protected function buildStyleString()
{
$properties = [];
foreach ($this->properties as $propertyName => $value) {
$properties[] = $propertyName . ': ' . $value;
}
return implode('; ', $properties);
}
/**
* @param string $name
* @param string $value
*
* @return StyleAttribute
*
* @throws InvalidArgumentException if property name is not a string
* @throws InvalidArgumentException if property value is not a string
*/
public function setProperty($name, $value)
{
if ( ! is_string($name)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 1 to be string, %s given', __METHOD__, (is_object($name) ? get_class($name) : gettype($name))));
}
if ( ! is_string($value)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 2 to be string, %s given', __METHOD__, (is_object($value) ? get_class($value) : gettype($value))));
}
$this->parseStyleAttribute();
$this->properties[$name] = $value;
$this->updateStyleAttribute();
return $this;
}
/**
* @param array $properties
*
* @return StyleAttribute
*
* @throws InvalidArgumentException if property name is not a string
* @throws InvalidArgumentException if property value is not a string
*/
public function setMultipleProperties(array $properties)
{
$this->parseStyleAttribute();
foreach ($properties as $propertyName => $value) {
if ( ! is_string($propertyName)) {
throw new InvalidArgumentException(sprintf('Property name must be a string, %s given', (is_object($propertyName) ? get_class($propertyName) : gettype($propertyName))));
}
if ( ! is_string($value)) {
throw new InvalidArgumentException(sprintf('Property value must be a string, %s given', (is_object($value) ? get_class($value) : gettype($value))));
}
$this->properties[$propertyName] = $value;
}
$this->updateStyleAttribute();
return $this;
}
/**
* @param string $name
* @param mixed $default
*
* @return mixed
*/
public function getProperty($name, $default = null)
{
if ( ! is_string($name)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 1 to be string, %s given', __METHOD__, (is_object($name) ? get_class($name) : gettype($name))));
}
$this->parseStyleAttribute();
if ( ! array_key_exists($name, $this->properties)) {
return $default;
}
return $this->properties[$name];
}
/**
* @param array $propertyNames
*
* @return array The requested properties that exist, keyed by property name
*
* @throws InvalidArgumentException if property name is not a string
*/
public function getMultipleProperties(array $propertyNames)
{
$this->parseStyleAttribute();
$result = [];
foreach ($propertyNames as $propertyName) {
if ( ! is_string($propertyName)) {
throw new InvalidArgumentException(sprintf('Property name must be a string, %s given', (is_object($propertyName) ? get_class($propertyName) : gettype($propertyName))));
}
if (array_key_exists($propertyName, $this->properties)) {
$result[$propertyName] = $this->properties[$propertyName];
}
}
return $result;
}
/**
* @return array
*/
public function getAllProperties()
{
$this->parseStyleAttribute();
return $this->properties;
}
/**
* @param string $name
*
* @return bool
*/
public function hasProperty($name)
{
if ( ! is_string($name)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 1 to be string, %s given', __METHOD__, (is_object($name) ? get_class($name) : gettype($name))));
}
$this->parseStyleAttribute();
return array_key_exists($name, $this->properties);
}
/**
* @param string $name
*
* @return StyleAttribute
*
* @throws InvalidArgumentException if property name is not a string
*/
public function removeProperty($name)
{
if ( ! is_string($name)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 1 to be string, %s given', __METHOD__, (is_object($name) ? get_class($name) : gettype($name))));
}
$this->parseStyleAttribute();
unset($this->properties[$name]);
$this->updateStyleAttribute();
return $this;
}
/**
* @param array $propertyNames
*
* @return StyleAttribute
*
* @throws InvalidArgumentException if property name is not a string
*/
public function removeMultipleProperties(array $propertyNames)
{
$this->parseStyleAttribute();
foreach ($propertyNames as $propertyName) {
if ( ! is_string($propertyName)) {
throw new InvalidArgumentException(sprintf('Property name must be a string, %s given', (is_object($propertyName) ? get_class($propertyName) : gettype($propertyName))));
}
unset($this->properties[$propertyName]);
}
$this->updateStyleAttribute();
return $this;
}
/**
* @param string[] $exclusions
*
* @return StyleAttribute
*/
public function removeAllProperties(array $exclusions = [])
{
$this->parseStyleAttribute();
$preservedProperties = [];
foreach ($exclusions as $propertyName) {
if ( ! is_string($propertyName)) {
throw new InvalidArgumentException(sprintf('Property name must be a string, %s given', (is_object($propertyName) ? get_class($propertyName) : gettype($propertyName))));
}
if ( ! array_key_exists($propertyName, $this->properties)) {
continue;
}
$preservedProperties[$propertyName] = $this->properties[$propertyName];
}
$this->properties = $preservedProperties;
$this->updateStyleAttribute();
return $this;
}
/**
* @return Element
*/
public function getElement()
{
return $this->element;
}
}
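
`StyleAttribute` above keeps the `style` attribute and a `name => value` array in sync by parsing the string on read and rebuilding it on write. A minimal standalone sketch of that round trip follows. The function names `parseStyle` and `buildStyle` and the sample style string are hypothetical, and unlike a real CSS parser the sketch does not handle semicolons inside values.

```php
<?php

// Hypothetical helpers mirroring the parse/build round trip above.

function parseStyle($styleString)
{
    $properties = [];
    $styleString = trim($styleString, ' ;');

    if ($styleString === '') {
        return $properties;
    }

    foreach (explode(';', $styleString) as $declaration) {
        // skip fragments without a "name: value" separator
        if (strpos($declaration, ':') === false) {
            continue;
        }

        // a limit of 2 keeps colons inside the value, e.g. in url(http://...)
        list($name, $value) = explode(':', $declaration, 2);

        $properties[trim($name)] = trim($value);
    }

    return $properties;
}

function buildStyle(array $properties)
{
    $declarations = [];

    foreach ($properties as $name => $value) {
        $declarations[] = $name . ': ' . $value;
    }

    return implode('; ', $declarations);
}

$properties = parseStyle('color:red;  background: url(http://example.com/a.png) ;');
$properties['width'] = '10px';

echo buildStyle($properties), "\n";
// color: red; background: url(http://example.com/a.png); width: 10px
```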
src/DiDom/Node.php 0000644 00000102736 15050224513 0007733 0 ustar 00 node->ownerDocument === null) {
throw new LogicException('Can not prepend a child to element without the owner document');
}
$returnArray = true;
if ( ! is_array($nodes)) {
$nodes = [$nodes];
$returnArray = false;
}
$nodes = array_reverse($nodes);
$result = [];
$referenceNode = $this->node->firstChild;
foreach ($nodes as $node) {
$result[] = $this->insertBefore($node, $referenceNode);
$referenceNode = $this->node->firstChild;
}
return $returnArray ? $result : $result[0];
}
/**
* Adds a new child at the end of the children.
*
* @param Node|DOMNode|array $nodes The appended child
*
* @return Element|Element[]
*
* @throws LogicException if the current node has no owner document
* @throws InvalidArgumentException if the provided argument is not an instance of DOMNode or Element
*/
public function appendChild($nodes)
{
if ($this->node->ownerDocument === null) {
throw new LogicException('Can not append a child to element without the owner document');
}
$returnArray = true;
if ( ! is_array($nodes)) {
$nodes = [$nodes];
$returnArray = false;
}
$result = [];
Errors::disable();
foreach ($nodes as $node) {
if ($node instanceof Node) {
$node = $node->getNode();
}
if ( ! $node instanceof DOMNode) {
throw new InvalidArgumentException(sprintf('Argument 1 passed to %s must be an instance of %s or DOMNode, %s given', __METHOD__, __CLASS__, (is_object($node) ? get_class($node) : gettype($node))));
}
$clonedNode = $node->cloneNode(true);
$newNode = $this->node->ownerDocument->importNode($clonedNode, true);
$result[] = $this->node->appendChild($newNode);
}
Errors::restore();
$result = array_map(function (DOMNode $node) {
return new Element($node);
}, $result);
return $returnArray ? $result : $result[0];
}
/**
* Adds a new child before a reference node.
*
* @param Node|DOMNode $node The new node
* @param Element|DOMNode|null $referenceNode The reference node
*
* @return Element
*
* @throws LogicException if the current node has no owner document
* @throws InvalidArgumentException if $node is not an instance of DOMNode or Element
* @throws InvalidArgumentException if $referenceNode is not an instance of DOMNode or Element
*/
public function insertBefore($node, $referenceNode = null)
{
if ($this->node->ownerDocument === null) {
throw new LogicException('Can not insert a child to an element without the owner document');
}
if ($node instanceof Node) {
$node = $node->getNode();
}
if ( ! $node instanceof DOMNode) {
throw new InvalidArgumentException(sprintf('Argument 1 passed to %s must be an instance of %s or DOMNode, %s given', __METHOD__, __CLASS__, (is_object($node) ? get_class($node) : gettype($node))));
}
if ($referenceNode !== null) {
if ($referenceNode instanceof Element) {
$referenceNode = $referenceNode->getNode();
}
if ( ! $referenceNode instanceof DOMNode) {
throw new InvalidArgumentException(sprintf('Argument 2 passed to %s must be an instance of %s or DOMNode, %s given', __METHOD__, __CLASS__, (is_object($referenceNode) ? get_class($referenceNode) : gettype($referenceNode))));
}
}
Errors::disable();
$clonedNode = $node->cloneNode(true);
$newNode = $this->node->ownerDocument->importNode($clonedNode, true);
$insertedNode = $this->node->insertBefore($newNode, $referenceNode);
Errors::restore();
return new Element($insertedNode);
}
/**
* Adds a new child after a reference node.
*
* @param Node|DOMNode $node The new node
* @param Element|DOMNode|null $referenceNode The reference node
*
* @return Element
*
* @throws LogicException if the current node has no owner document
* @throws InvalidArgumentException if $node is not an instance of DOMNode or Element
* @throws InvalidArgumentException if $referenceNode is not an instance of DOMNode or Element
*/
public function insertAfter($node, $referenceNode = null)
{
if ($referenceNode === null) {
return $this->insertBefore($node);
}
if ($referenceNode instanceof Node) {
$referenceNode = $referenceNode->getNode();
}
if ( ! $referenceNode instanceof DOMNode) {
throw new InvalidArgumentException(sprintf('Argument 2 passed to %s must be an instance of %s or DOMNode, %s given', __METHOD__, __CLASS__, (is_object($referenceNode) ? get_class($referenceNode) : gettype($referenceNode))));
}
return $this->insertBefore($node, $referenceNode->nextSibling);
}
/**
* Adds a new sibling before the current node.
*
* @param Node|DOMNode $node The new node
*
* @return Element
*
* @throws LogicException if the current node has no owner document
* @throws LogicException if the current node has no parent
* @throws InvalidArgumentException if $node is not an instance of DOMNode or Node
*/
public function insertSiblingBefore($node)
{
if ($this->node->ownerDocument === null) {
throw new LogicException('Can not insert a child to an element without the owner document');
}
if ($this->parent() === null) {
throw new LogicException('Can not insert a child to an element without the parent element');
}
if ($node instanceof Node) {
$node = $node->getNode();
}
if ( ! $node instanceof DOMNode) {
throw new InvalidArgumentException(sprintf('Argument 1 passed to %s must be an instance of %s or DOMNode, %s given', __METHOD__, __CLASS__, (is_object($node) ? get_class($node) : gettype($node))));
}
Errors::disable();
$clonedNode = $node->cloneNode(true);
$newNode = $this->node->ownerDocument->importNode($clonedNode, true);
$insertedNode = $this->parent()->getNode()->insertBefore($newNode, $this->node);
Errors::restore();
return new Element($insertedNode);
}
/**
* Adds a new sibling after the current node.
*
* @param Node|DOMNode $node The new node
*
* @return Element
*
* @throws LogicException if the current node has no owner document
* @throws LogicException if the current node has no parent
* @throws InvalidArgumentException if $node is not an instance of DOMNode or Node
*/
public function insertSiblingAfter($node)
{
if ($this->node->ownerDocument === null) {
throw new LogicException('Can not insert a child to an element without the owner document');
}
if ($this->parent() === null) {
throw new LogicException('Can not insert a child to an element without the parent element');
}
$nextSibling = $this->nextSibling();
// if the current node is the last child
if ($nextSibling === null) {
return $this->parent()->appendChild($node);
}
return $nextSibling->insertSiblingBefore($node);
}
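// Usage sketch for insertSiblingBefore() and insertSiblingAfter().
// Assumptions: the markup and variable names below are illustrative only.
//
//     $document = new Document('<ul><li class="b">B</li></ul>');
//     $item = $document->first('li.b');
//
//     $item->insertSiblingBefore(new Element('li', 'A'));
//     $item->insertSiblingAfter(new Element('li', 'C'));
//
//     // The list now holds three <li> items in the order A, B, C.
//     // insertSiblingBefore() clones and imports the node it is given,
//     // so the Element passed in is not moved out of its own document.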
/**
* Checks the existence of the node.
*
* @param string $expression XPath expression or CSS selector
* @param string $type The type of the expression
*
* @return bool
*/
public function has($expression, $type = Query::TYPE_CSS)
{
return $this->toDocument()->has($expression, $type);
}
/**
* Searches for a node in the DOM tree for a given XPath expression or CSS selector.
*
* @param string $expression XPath expression or CSS selector
* @param string $type The type of the expression
* @param bool $wrapElement Returns array of Element if true, otherwise array of DOMElement
*
* @return Element[]|DOMElement[]
*
* @throws InvalidSelectorException
*/
public function find($expression, $type = Query::TYPE_CSS, $wrapElement = true)
{
return $this->toDocument()->find($expression, $type, $wrapElement);
}
/**
* Searches for a node in the owner document using current node as context.
*
* @param string $expression XPath expression or CSS selector
* @param string $type The type of the expression
* @param bool $wrapNode Returns array of Element if true, otherwise array of DOMElement
*
* @return Element[]|DOMElement[]
*
* @throws LogicException if the current node has no owner document
* @throws InvalidSelectorException
*/
public function findInDocument($expression, $type = Query::TYPE_CSS, $wrapNode = true)
{
$ownerDocument = $this->getDocument();
if ($ownerDocument === null) {
throw new LogicException('Can not search in context without the owner document');
}
return $ownerDocument->find($expression, $type, $wrapNode, $this->node);
}
/**
* Searches for a node in the DOM tree and returns first element or null.
*
* @param string $expression XPath expression or CSS selector
* @param string $type The type of the expression
* @param bool $wrapNode Returns Element if true, otherwise DOMElement
*
* @return Element|DOMElement|null
*
* @throws InvalidSelectorException
*/
public function first($expression, $type = Query::TYPE_CSS, $wrapNode = true)
{
return $this->toDocument()->first($expression, $type, $wrapNode);
}
/**
* Searches for a node in the owner document using current node as context and returns first element or null.
*
* @param string $expression XPath expression or CSS selector
* @param string $type The type of the expression
* @param bool $wrapNode Returns Element if true, otherwise DOMElement
*
* @return Element|DOMElement|null
*
* @throws LogicException if the current node has no owner document
* @throws InvalidSelectorException
*/
public function firstInDocument($expression, $type = Query::TYPE_CSS, $wrapNode = true)
{
$ownerDocument = $this->getDocument();
if ($ownerDocument === null) {
throw new LogicException('Can not search in context without the owner document');
}
return $ownerDocument->first($expression, $type, $wrapNode, $this->node);
}
/**
* Searches for a node in the DOM tree for a given XPath expression.
*
* @param string $expression XPath expression
* @param bool $wrapNode Returns array of Element if true, otherwise array of DOMElement
*
* @return Element[]|DOMElement[]
*
* @throws InvalidSelectorException
*/
public function xpath($expression, $wrapNode = true)
{
return $this->find($expression, Query::TYPE_XPATH, $wrapNode);
}
/**
* Counts nodes for a given XPath expression or CSS selector.
*
* @param string $expression XPath expression or CSS selector
* @param string $type The type of the expression
*
* @return int
*
* @throws InvalidSelectorException
*/
public function count($expression, $type = Query::TYPE_CSS)
{
return $this->toDocument()->count($expression, $type);
}
/**
* Dumps the node into a string using HTML formatting (including child nodes).
*
* @return string
*/
public function html()
{
return $this->toDocument()->html();
}
/**
* Dumps the node into a string using HTML formatting (without child nodes).
*
* @return string
*/
public function outerHtml()
{
$document = new DOMDocument();
$importedNode = $document->importNode($this->node);
return $document->saveHTML($importedNode);
}
/**
* Dumps the node descendants into a string using HTML formatting.
*
* @param string $delimiter
*
* @return string
*/
public function innerHtml($delimiter = '')
{
$innerHtml = [];
foreach ($this->node->childNodes as $childNode) {
$innerHtml[] = $childNode->ownerDocument->saveHTML($childNode);
}
return implode($delimiter, $innerHtml);
}
/**
* Dumps the node descendants into a string using XML formatting.
*
* @param string $delimiter
*
* @return string
*/
public function innerXml($delimiter = '')
{
$innerXml = [];
foreach ($this->node->childNodes as $childNode) {
$innerXml[] = $childNode->ownerDocument->saveXML($childNode);
}
return implode($delimiter, $innerXml);
}
/**
* Sets inner HTML.
*
* @param string $html
*
* @return static
*
* @throws InvalidArgumentException if passed argument is not a string
* @throws InvalidSelectorException
*/
public function setInnerHtml($html)
{
if ( ! is_string($html)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 1 to be string, %s given', __METHOD__, (is_object($html) ? get_class($html) : gettype($html))));
}
$this->removeChildren();
if ($html !== '') {
Errors::disable();
$html = "<htmlfragment>$html</htmlfragment>";
$document = new Document($html);
$fragment = $document->first('htmlfragment')->getNode();
foreach ($fragment->childNodes as $node) {
$newNode = $this->node->ownerDocument->importNode($node, true);
$this->node->appendChild($newNode);
}
Errors::restore();
}
return $this;
}
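// Usage sketch for setInnerHtml().
// Assumptions: the markup and variable names below are illustrative only.
//
//     $document = new Document('<div id="box"><p>old</p></div>');
//     $box = $document->first('#box');
//
//     $box->setInnerHtml('<span>new</span> text');
//
//     // The old <p> is gone; the children of #box are now the <span>
//     // element followed by a text node.
//     // Passing an empty string only removes the existing children.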
/**
* Dumps the node into a string using XML formatting.
*
* @param int $options Additional options
*
* @return string The node XML
*/
public function xml($options = 0)
{
return $this->toDocument()->xml($options);
}
/**
* Get the text content of this node and its descendants.
*
* @return string The node value
*/
public function text()
{
return $this->node->textContent;
}
/**
* Set the value of this node.
*
* @param string|null $value The new value of the node (numeric values are cast to string)
*
* @return static
*
* @throws InvalidArgumentException if parameter 1 is not a string, a numeric value or null
*/
public function setValue($value)
{
if (is_numeric($value)) {
$value = (string) $value;
}
if ( ! is_string($value) && $value !== null) {
throw new InvalidArgumentException(sprintf('%s expects parameter 1 to be string, %s given', __METHOD__, (is_object($value) ? get_class($value) : gettype($value))));
}
$this->node->nodeValue = $value;
return $this;
}
/**
* Returns true if the current node is a DOMElement instance.
*
* @return bool
*/
public function isElementNode()
{
return $this->node instanceof DOMElement;
}
/**
* Returns true if the current node is a DOMText instance.
*
* @return bool
*/
public function isTextNode()
{
return $this->node instanceof DOMText;
}
/**
* Returns true if the current node is a DOMComment instance.
*
* @return bool
*/
public function isCommentNode()
{
return $this->node instanceof DOMComment;
}
/**
* Returns true if the current node is a DOMCdataSection instance.
*
* @return bool
*/
public function isCdataSectionNode()
{
return $this->node instanceof DOMCdataSection;
}
/**
* Indicates if two nodes are the same node.
*
* @param Node|DOMNode $node
*
* @return bool
*
* @throws InvalidArgumentException if parameter 1 is not an instance of Node or DOMNode
*/
public function is($node)
{
if ($node instanceof Node) {
$node = $node->getNode();
}
if ( ! $node instanceof DOMNode) {
throw new InvalidArgumentException(sprintf('Argument 1 passed to %s must be an instance of %s or DOMNode, %s given', __METHOD__, __CLASS__, (is_object($node) ? get_class($node) : gettype($node))));
}
return $this->node->isSameNode($node);
}
/**
* Returns the parent of the current node, or null if the node has no parent.
*
* @return Element|Document|null
*/
public function parent()
{
if ($this->node->parentNode === null) {
return null;
}
if ($this->node->parentNode instanceof DOMDocument) {
return new Document($this->node->parentNode);
}
return new Element($this->node->parentNode);
}
/**
* Returns the closest ancestor element that matches the given selector, or null if there is none.
*
* @param string $selector
* @param bool $strict
*
* @return Element|null
*
* @throws InvalidSelectorException if the selector is invalid
*/
public function closest($selector, $strict = false)
{
$node = $this;
while (true) {
$parent = $node->parent();
if ($parent === null || $parent instanceof Document) {
return null;
}
if ($parent->matches($selector, $strict)) {
return $parent;
}
$node = $parent;
}
return null;
}
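// Usage sketch for closest().
// Assumptions: the markup and variable names below are illustrative only.
//
//     $document = new Document('<div class="card"><p><a href="#">link</a></p></div>');
//     $link = $document->first('a');
//
//     $card = $link->closest('.card');  // Element wrapping the <div>
//     $none = $link->closest('table');  // null: the walk stops once the parent is the Document
//
// The current node itself is never tested, only its ancestors.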
/**
* Returns the nearest previous sibling, optionally filtered by selector and node type.
*
* @param string|null $selector
* @param string|null $nodeType
*
* @return Element|null
*
* @throws InvalidArgumentException if parameter 2 is not a string
* @throws RuntimeException if the node type is invalid
* @throws LogicException if the selector used with non DOMElement node type
* @throws InvalidSelectorException if the selector is invalid
*/
public function previousSibling($selector = null, $nodeType = null)
{
if ($this->node->previousSibling === null) {
return null;
}
if ($selector === null && $nodeType === null) {
return new Element($this->node->previousSibling);
}
if ($selector !== null && $nodeType === null) {
$nodeType = 'DOMElement';
}
if ( ! is_string($nodeType)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 2 to be string, %s given', __METHOD__, gettype($nodeType)));
}
$allowedTypes = ['DOMElement', 'DOMText', 'DOMComment', 'DOMCdataSection'];
if ( ! in_array($nodeType, $allowedTypes, true)) {
throw new RuntimeException(sprintf('Unknown node type "%s". Allowed types: %s', $nodeType, implode(', ', $allowedTypes)));
}
if ($selector !== null && $nodeType !== 'DOMElement') {
throw new LogicException(sprintf('Selector can be used only with DOMElement node type, %s given', $nodeType));
}
$node = $this->node->previousSibling;
while ($node !== null) {
if (get_class($node) !== $nodeType) {
$node = $node->previousSibling;
continue;
}
$element = new Element($node);
if ($selector === null || $element->matches($selector)) {
return $element;
}
$node = $node->previousSibling;
}
return null;
}
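/**
* Returns all previous siblings in document order, optionally filtered by selector and node type.
*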
* @param string|null $selector
* @param string|null $nodeType
*
* @return Element[]
*
* @throws InvalidArgumentException if parameter 2 is not a string
* @throws RuntimeException if the node type is invalid
* @throws LogicException if the selector used with non DOMElement node type
* @throws InvalidSelectorException if the selector is invalid
*/
public function previousSiblings($selector = null, $nodeType = null)
{
if ($this->node->previousSibling === null) {
return [];
}
if ($selector !== null && $nodeType === null) {
$nodeType = 'DOMElement';
}
if ($nodeType !== null) {
if ( ! is_string($nodeType)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 2 to be string, %s given', __METHOD__, gettype($nodeType)));
}
$allowedTypes = ['DOMElement', 'DOMText', 'DOMComment', 'DOMCdataSection'];
if ( ! in_array($nodeType, $allowedTypes, true)) {
throw new RuntimeException(sprintf('Unknown node type "%s". Allowed types: %s', $nodeType, implode(', ', $allowedTypes)));
}
}
if ($selector !== null && $nodeType !== 'DOMElement') {
throw new LogicException(sprintf('Selector can be used only with DOMElement node type, %s given', $nodeType));
}
$result = [];
$node = $this->node->previousSibling;
while ($node !== null) {
$element = new Element($node);
if ($nodeType === null) {
$result[] = $element;
$node = $node->previousSibling;
continue;
}
if (get_class($node) !== $nodeType) {
$node = $node->previousSibling;
continue;
}
if ($selector === null) {
$result[] = $element;
$node = $node->previousSibling;
continue;
}
if ($element->matches($selector)) {
$result[] = $element;
}
$node = $node->previousSibling;
}
return array_reverse($result);
}
/**
* Returns the nearest next sibling, optionally filtered by selector and node type.
*
* @param string|null $selector
* @param string|null $nodeType
*
* @return Element|null
*
* @throws InvalidArgumentException if parameter 2 is not a string
* @throws RuntimeException if the node type is invalid
* @throws LogicException if the selector used with non DOMElement node type
* @throws InvalidSelectorException if the selector is invalid
*/
public function nextSibling($selector = null, $nodeType = null)
{
if ($this->node->nextSibling === null) {
return null;
}
if ($selector === null && $nodeType === null) {
return new Element($this->node->nextSibling);
}
if ($selector !== null && $nodeType === null) {
$nodeType = 'DOMElement';
}
if ( ! is_string($nodeType)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 2 to be string, %s given', __METHOD__, gettype($nodeType)));
}
$allowedTypes = ['DOMElement', 'DOMText', 'DOMComment', 'DOMCdataSection'];
if ( ! in_array($nodeType, $allowedTypes, true)) {
throw new RuntimeException(sprintf('Unknown node type "%s". Allowed types: %s', $nodeType, implode(', ', $allowedTypes)));
}
if ($selector !== null && $nodeType !== 'DOMElement') {
throw new LogicException(sprintf('Selector can be used only with DOMElement node type, %s given', $nodeType));
}
$node = $this->node->nextSibling;
while ($node !== null) {
if (get_class($node) !== $nodeType) {
$node = $node->nextSibling;
continue;
}
$element = new Element($node);
if ($selector === null || $element->matches($selector)) {
return $element;
}
$node = $node->nextSibling;
}
return null;
}
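/**
* Returns all next siblings in document order, optionally filtered by selector and node type.
*
* @param string|null $selector
* @param string|null $nodeType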
*
* @return Element[]
*
* @throws InvalidArgumentException if parameter 2 is not a string
* @throws RuntimeException if the node type is invalid
* @throws LogicException if the selector used with non DOMElement node type
* @throws InvalidSelectorException if the selector is invalid
*/
public function nextSiblings($selector = null, $nodeType = null)
{
if ($this->node->nextSibling === null) {
return [];
}
if ($selector !== null && $nodeType === null) {
$nodeType = 'DOMElement';
}
if ($nodeType !== null) {
if ( ! is_string($nodeType)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 2 to be string, %s given', __METHOD__, gettype($nodeType)));
}
$allowedTypes = ['DOMElement', 'DOMText', 'DOMComment', 'DOMCdataSection'];
if ( ! in_array($nodeType, $allowedTypes, true)) {
throw new RuntimeException(sprintf('Unknown node type "%s". Allowed types: %s', $nodeType, implode(', ', $allowedTypes)));
}
}
if ($selector !== null && $nodeType !== 'DOMElement') {
throw new LogicException(sprintf('Selector can be used only with DOMElement node type, %s given', $nodeType));
}
$result = [];
$node = $this->node->nextSibling;
while ($node !== null) {
$element = new Element($node);
if ($nodeType === null) {
$result[] = $element;
$node = $node->nextSibling;
continue;
}
if (get_class($node) !== $nodeType) {
$node = $node->nextSibling;
continue;
}
if ($selector === null) {
$result[] = $element;
$node = $node->nextSibling;
continue;
}
if ($element->matches($selector)) {
$result[] = $element;
}
$node = $node->nextSibling;
}
return $result;
}
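// Usage sketch for the sibling helpers.
// Assumptions: the markup and variable names below are illustrative only.
//
//     $document = new Document('<ul><li>1</li><!-- note --><li class="x">2</li><li>3</li></ul>');
//     $first = $document->first('li');
//
//     $first->nextSibling();                    // the comment node (no filter: any node type)
//     $first->nextSibling(null, 'DOMElement');  // <li class="x">
//     $first->nextSibling('li.x');              // <li class="x">; node type defaults to DOMElement
//     $first->nextSiblings('li');               // both following <li> elements, in document order
//     $first->nextSibling('li', 'DOMText');     // throws LogicException: selectors need DOMElement
//
// previousSibling() and previousSiblings() take the same arguments and walk backwards.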
/**
* Returns the child node at the given zero-based index, or null if there is no such child.
*
* @param int $index
*
* @return Element|null
*/
public function child($index)
{
$child = $this->node->childNodes->item($index);
return $child === null ? null : new Element($child);
}
/**
* Returns the first child node, or null if the node has no children.
*
* @return Element|null
*/
public function firstChild()
{
if ($this->node->firstChild === null) {
return null;
}
return new Element($this->node->firstChild);
}
/**
* Returns the last child node, or null if the node has no children.
*
* @return Element|null
*/
public function lastChild()
{
if ($this->node->lastChild === null) {
return null;
}
return new Element($this->node->lastChild);
}
/**
* Checks whether the node has any child nodes.
*
* @return bool
*/
public function hasChildren()
{
return $this->node->hasChildNodes();
}
/**
* Returns all child nodes (of any node type) wrapped in Element instances.
*
* @return Element[]
*/
public function children()
{
$children = [];
foreach ($this->node->childNodes as $node) {
$children[] = new Element($node);
}
return $children;
}
/**
* Removes a child from the list of children.
*
* @param Node|DOMNode $childNode
*
* @return Element the node that has been removed
*
* @throws InvalidArgumentException if $childNode is not an instance of Node or DOMNode
*/
public function removeChild($childNode)
{
if ($childNode instanceof Node) {
$childNode = $childNode->getNode();
}
if ( ! $childNode instanceof DOMNode) {
throw new InvalidArgumentException(sprintf('Argument 1 passed to %s must be an instance of %s or DOMNode, %s given', __METHOD__, __CLASS__, (is_object($childNode) ? get_class($childNode) : gettype($childNode))));
}
$removedNode = $this->node->removeChild($childNode);
return new Element($removedNode);
}
/**
* Removes all child nodes.
*
* @return Element[] the nodes that have been removed
*/
public function removeChildren()
{
// collect the child nodes into an array first, because removing nodes
// from a live DOMNodeList while iterating over it skips elements
$childNodes = [];
foreach ($this->node->childNodes as $childNode) {
$childNodes[] = $childNode;
}
$removedNodes = [];
foreach ($childNodes as $childNode) {
$removedNode = $this->node->removeChild($childNode);
$removedNodes[] = new Element($removedNode);
}
return $removedNodes;
}
/**
* Removes current node from the parent.
*
* @return Element the node that has been removed
*
* @throws LogicException if the current node has no parent node
*/
public function remove()
{
if ($this->node->parentNode === null) {
throw new LogicException('Can not remove an element without the parent node');
}
$removedNode = $this->node->parentNode->removeChild($this->node);
return new Element($removedNode);
}
/**
* Replaces the current node with the given node.
*
* @param Node|DOMNode $newNode The new node
* @param bool $clone Clone the node if true, otherwise move it
*
* @return Element The node that has been replaced
*
* @throws LogicException if the current node has no parent node
* @throws InvalidArgumentException if $newNode is not an instance of Node or DOMNode
*/
public function replace($newNode, $clone = true)
{
if ($this->node->parentNode === null) {
throw new LogicException('Can not replace an element without the parent node');
}
if ($newNode instanceof Node) {
$newNode = $newNode->getNode();
}
if ( ! $newNode instanceof DOMNode) {
throw new InvalidArgumentException(sprintf('Argument 1 passed to %s must be an instance of %s or DOMNode, %s given', __METHOD__, __CLASS__, (is_object($newNode) ? get_class($newNode) : gettype($newNode))));
}
if ($clone) {
$newNode = $newNode->cloneNode(true);
}
if ($newNode->ownerDocument === null || ! $this->getDocument()->is($newNode->ownerDocument)) {
$newNode = $this->node->ownerDocument->importNode($newNode, true);
}
$node = $this->node->parentNode->replaceChild($newNode, $this->node);
return new Element($node);
}
/**
* Get line number for a node.
*
* @return int
*/
public function getLineNo()
{
return $this->node->getLineNo();
}
/**
* Clones a node.
*
* @param bool $deep Indicates whether to copy all descendant nodes
*
* @return Element The cloned node
*/
public function cloneNode($deep = true)
{
return new Element($this->node->cloneNode($deep));
}
/**
* Sets current node instance.
*
* @param DOMElement|DOMText|DOMComment|DOMCdataSection|DOMDocumentFragment $node
*
* @return static
*/
protected function setNode($node)
{
$allowedClasses = ['DOMElement', 'DOMText', 'DOMComment', 'DOMCdataSection', 'DOMDocumentFragment'];
if ( ! is_object($node) || ! in_array(get_class($node), $allowedClasses, true)) {
throw new InvalidArgumentException(sprintf('Argument 1 passed to %s must be an instance of DOMElement, DOMText, DOMComment, DOMCdataSection or DOMDocumentFragment, %s given', __METHOD__, (is_object($node) ? get_class($node) : gettype($node))));
}
$this->node = $node;
return $this;
}
/**
* Returns current node instance.
*
* @return DOMElement|DOMText|DOMComment|DOMCdataSection|DOMDocumentFragment
*/
public function getNode()
{
return $this->node;
}
/**
* Returns the document associated with this node.
*
* @return Document|null
*/
public function getDocument()
{
if ($this->node->ownerDocument === null) {
return null;
}
return new Document($this->node->ownerDocument);
}
/**
* Get the DOM document with the current element.
*
* @param string $encoding The document encoding
*
* @return Document
*/
public function toDocument($encoding = 'UTF-8')
{
$document = new Document(null, false, $encoding);
$document->appendChild($this->node);
return $document;
}
/**
* Convert the element to its string representation.
*
* @return string
*/
public function __toString()
{
return $this->html();
}
/**
* Searches for a node in the DOM tree for a given XPath expression or CSS selector.
*
* @param string $expression XPath expression or CSS selector
* @param string $type The type of the expression
* @param bool $wrapNode Returns array of Element if true, otherwise array of DOMElement
*
* @return Element[]|DOMElement[]
*
* @throws InvalidSelectorException
*
* @deprecated No longer recommended, use Element::find() instead.
*/
public function __invoke($expression, $type = Query::TYPE_CSS, $wrapNode = true)
{
return $this->find($expression, $type, $wrapNode);
}
}
src/DiDom/ClassAttribute.php 0000644 00000014702 15050224513 0011772 0 ustar 00
if ( ! $element->isElementNode()) {
throw new InvalidArgumentException('The element must contain DOMElement node');
}
$this->element = $element;
$this->parseClassAttribute();
}
/**
* Parses class attribute of the element.
*/
protected function parseClassAttribute()
{
if ( ! $this->element->hasAttribute('class')) {
// possible if class attribute has been removed
if ($this->classesString !== '') {
$this->classesString = '';
$this->classes = [];
}
return;
}
// if class attribute is not changed
if ($this->element->getAttribute('class') === $this->classesString) {
return;
}
// save class attribute as is (without trimming)
$this->classesString = $this->element->getAttribute('class');
$classesString = trim($this->classesString);
if ($classesString === '') {
$this->classes = [];
return;
}
$classes = explode(' ', $classesString);
$classes = array_map('trim', $classes);
$classes = array_filter($classes);
$classes = array_unique($classes);
$this->classes = array_values($classes);
}
/**
* Updates class attribute of the element.
*/
protected function updateClassAttribute()
{
$this->classesString = implode(' ', $this->classes);
$this->element->setAttribute('class', $this->classesString);
}
/**
* @param string $className
*
* @return ClassAttribute
*
* @throws InvalidArgumentException if class name is not a string
*/
public function add($className)
{
if ( ! is_string($className)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 1 to be string, %s given', __METHOD__, (is_object($className) ? get_class($className) : gettype($className))));
}
$this->parseClassAttribute();
if (in_array($className, $this->classes, true)) {
return $this;
}
$this->classes[] = $className;
$this->updateClassAttribute();
return $this;
}
/**
* @param array $classNames
*
* @return ClassAttribute
*
* @throws InvalidArgumentException if class name is not a string
*/
public function addMultiple(array $classNames)
{
$this->parseClassAttribute();
foreach ($classNames as $className) {
if ( ! is_string($className)) {
throw new InvalidArgumentException(sprintf('Class name must be a string, %s given', (is_object($className) ? get_class($className) : gettype($className))));
}
if (in_array($className, $this->classes, true)) {
continue;
}
$this->classes[] = $className;
}
$this->updateClassAttribute();
return $this;
}
/**
* @return string[]
*/
public function getAll()
{
$this->parseClassAttribute();
return $this->classes;
}
/**
* @param string $className
*
* @return bool
*
* @throws InvalidArgumentException if class name is not a string
*/
public function contains($className)
{
if ( ! is_string($className)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 1 to be string, %s given', __METHOD__, (is_object($className) ? get_class($className) : gettype($className))));
}
$this->parseClassAttribute();
return in_array($className, $this->classes, true);
}
/**
* @param string $className
*
* @return ClassAttribute
*
* @throws InvalidArgumentException if class name is not a string
*/
public function remove($className)
{
if ( ! is_string($className)) {
throw new InvalidArgumentException(sprintf('%s expects parameter 1 to be string, %s given', __METHOD__, (is_object($className) ? get_class($className) : gettype($className))));
}
$this->parseClassAttribute();
// strict comparison, so that numeric-looking names such as "10" and "1e1" are not confused
$classIndex = array_search($className, $this->classes, true);
if ($classIndex === false) {
return $this;
}
unset($this->classes[$classIndex]);
$this->updateClassAttribute();
return $this;
}
/**
* @param array $classNames
*
* @return ClassAttribute
*
* @throws InvalidArgumentException if class name is not a string
*/
public function removeMultiple(array $classNames)
{
$this->parseClassAttribute();
foreach ($classNames as $className) {
if ( ! is_string($className)) {
throw new InvalidArgumentException(sprintf('Class name must be a string, %s given', (is_object($className) ? get_class($className) : gettype($className))));
}
// strict comparison, so that numeric-looking names such as "10" and "1e1" are not confused
$classIndex = array_search($className, $this->classes, true);
if ($classIndex === false) {
continue;
}
unset($this->classes[$classIndex]);
}
$this->updateClassAttribute();
return $this;
}
/**
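* Removes all classes except the ones listed in $exclusions.
*
* @param string[] $exclusions Class names to keep
*
* @return ClassAttribute
*
* @throws InvalidArgumentException if class name is not a string
*/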
public function removeAll(array $exclusions = [])
{
$this->parseClassAttribute();
$preservedClasses = [];
foreach ($exclusions as $className) {
if ( ! is_string($className)) {
throw new InvalidArgumentException(sprintf('Class name must be a string, %s given', (is_object($className) ? get_class($className) : gettype($className))));
}
if ( ! in_array($className, $this->classes, true)) {
continue;
}
$preservedClasses[] = $className;
}
$this->classes = $preservedClasses;
$this->updateClassAttribute();
return $this;
}
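// Usage sketch for ClassAttribute.
// Assumptions: the element and class names below are illustrative only;
// the Element constructor is assumed to accept (tag name, value, attributes).
//
//     $element = new Element('div', null, ['class' => 'a b  c']);
//     $classes = new ClassAttribute($element);
//
//     $classes->add('d');               // class="a b c d" (extra spaces are normalized)
//     $classes->remove('b');            // class="a c d"
//     $classes->removeAll(['a', 'x']);  // class="a" ('x' was never set, so it is ignored)
//     $classes->contains('a');          // true
//
// Every call re-reads the class attribute first, so changes made to it
// directly on the element are picked up.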
/**
* @return Element
*/
public function getElement()
{
return $this->element;
}
}
src/DiDom/Query.php 0000644 00000043432 15050224513 0010150 0 ustar 00 ') {
$prefix = '/';
$selector = ltrim($selector, '> ');
}
$segments = self::getSegments($selector);
$xpath = '';
while (count($segments) > 0) {
$xpath .= self::buildXpath($segments, $prefix);
$selector = trim(substr($selector, strlen($segments['selector'])));
$prefix = isset($segments['rel']) ? '/' : '//';
if ($selector === '' || substr($selector, 0, 2) === '::' || substr($selector, 0, 1) === ',') {
break;
}
$segments = self::getSegments($selector);
}
// if selector has property
if (substr($selector, 0, 2) === '::') {
$property = self::parseProperty($selector);
$propertyXpath = self::convertProperty($property['name'], $property['args']);
$selector = substr($selector, strlen($property['property']));
$selector = trim($selector);
$xpath .= '/' . $propertyXpath;
}
return [$xpath, $selector];
}
/**
* @param string $selector
*
* @return array
*
* @throws InvalidSelectorException
*/
protected static function parseProperty($selector)
{
$name = '(?P<name>[\w\-]+)';
$args = '(?:\((?P<args>[^\)]+)?\))?';
$regexp = '/^::' . $name . $args . '/is';
if (preg_match($regexp, $selector, $matches) !== 1) {
throw new InvalidSelectorException(sprintf('Invalid property "%s"', $selector));
}
$result = [];
$result['property'] = $matches[0];
$result['name'] = $matches['name'];
$result['args'] = isset($matches['args']) ? explode(',', $matches['args']) : [];
$result['args'] = array_map('trim', $result['args']);
return $result;
}
/**
* @param string $name
* @param array $parameters
*
* @return string
*
* @throws InvalidSelectorException if the specified property is unknown
*/
protected static function convertProperty($name, array $parameters = [])
{
if ($name === 'text') {
return 'text()';
}
if ($name === 'attr') {
if (count($parameters) === 0) {
return '@*';
}
$attributes = [];
foreach ($parameters as $attribute) {
$attributes[] = sprintf('name() = "%s"', $attribute);
}
return sprintf('@*[%s]', implode(' or ', $attributes));
}
throw new InvalidSelectorException(sprintf('Unknown property "%s"', $name));
}
/**
* Converts a CSS pseudo-class into an XPath expression.
*
* @param string $pseudo Pseudo-class
* @param string $tagName
* @param array $parameters
*
* @return string
*
* @throws InvalidSelectorException if the specified pseudo-class is unknown
*/
protected static function convertPseudo($pseudo, &$tagName, array $parameters = [])
{
switch ($pseudo) {
case 'first-child':
return 'position() = 1';
case 'last-child':
return 'position() = last()';
case 'nth-child':
$xpath = sprintf('(name()="%s") and (%s)', $tagName, self::convertNthExpression($parameters[0]));
$tagName = '*';
return $xpath;
case 'contains':
$string = trim($parameters[0], '\'"');
if (count($parameters) === 1) {
return self::convertContains($string);
}
if ($parameters[1] !== 'true' && $parameters[1] !== 'false') {
throw new InvalidSelectorException(sprintf('Parameter 2 of "contains" pseudo-class must be equal true or false, "%s" given', $parameters[1]));
}
$caseSensitive = $parameters[1] === 'true';
if (count($parameters) === 2) {
return self::convertContains($string, $caseSensitive);
}
if ($parameters[2] !== 'true' && $parameters[2] !== 'false') {
throw new InvalidSelectorException(sprintf('Parameter 3 of "contains" pseudo-class must be equal true or false, "%s" given', $parameters[2]));
}
$fullMatch = $parameters[2] === 'true';
return self::convertContains($string, $caseSensitive, $fullMatch);
case 'has':
return self::cssToXpath($parameters[0], './/');
case 'not':
return sprintf('not(self::%s)', self::cssToXpath($parameters[0], ''));
case 'nth-of-type':
return self::convertNthExpression($parameters[0]);
case 'empty':
return 'count(descendant::*) = 0';
case 'not-empty':
return 'count(descendant::*) > 0';
}
throw new InvalidSelectorException(sprintf('Unknown pseudo-class "%s"', $pseudo));
}
/**
* @param array $segments
* @param string $prefix Specifies the nesting of nodes
*
* @return string XPath expression
*
* @throws InvalidArgumentException if neither a tag name nor any attribute is specified
*/
public static function buildXpath(array $segments, $prefix = '//')
{
$tagName = isset($segments['tag']) ? $segments['tag'] : '*';
$attributes = [];
// if the id attribute specified
if (isset($segments['id'])) {
$attributes[] = sprintf('@id="%s"', $segments['id']);
}
// if the class attribute specified
if (isset($segments['classes'])) {
foreach ($segments['classes'] as $class) {
$attributes[] = sprintf('contains(concat(" ", normalize-space(@class), " "), " %s ")', $class);
}
}
// if the attributes specified
if (isset($segments['attributes'])) {
foreach ($segments['attributes'] as $name => $value) {
$attributes[] = self::convertAttribute($name, $value);
}
}
// if the pseudo class specified
if (array_key_exists('pseudo', $segments)) {
foreach ($segments['pseudo'] as $pseudo) {
$expression = $pseudo['expression'] !== null ? $pseudo['expression'] : '';
$parameters = explode(',', $expression);
$parameters = array_map('trim', $parameters);
$attributes[] = self::convertPseudo($pseudo['type'], $tagName, $parameters);
}
}
if (count($attributes) === 0 && ! isset($segments['tag'])) {
throw new InvalidArgumentException('The array of segments must contain the name of the tag or at least one attribute');
}
$xpath = $prefix . $tagName;
if ($count = count($attributes)) {
$xpath .= ($count > 1) ? sprintf('[(%s)]', implode(') and (', $attributes)) : sprintf('[%s]', $attributes[0]);
}
return $xpath;
}
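// Illustration of buildXpath(), traced by hand from the code above.
// Assumptions: the segment arrays are written out manually here; in normal
// use they come from the selector parser.
//
//     Query::buildXpath(['tag' => 'a']);
//     // //a
//
//     Query::buildXpath(['id' => 'main'], '/');
//     // /*[@id="main"]
//
//     Query::buildXpath(['tag' => 'a', 'id' => 'x', 'classes' => ['foo']]);
//     // //a[(@id="x") and (contains(concat(" ", normalize-space(@class), " "), " foo "))]
//
//     Query::buildXpath([]);
//     // throws InvalidArgumentException: no tag name and no attributes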
/**
* @param string $name The name of an attribute
* @param string $value The value of an attribute
*
* @return string
*/
protected static function convertAttribute($name, $value)
{
$isSimpleSelector = ! in_array(substr($name, 0, 1), ['^', '!'], true);
$isSimpleSelector = $isSimpleSelector && ( ! in_array(substr($name, -1), ['^', '$', '*', '!', '~'], true));
if ($isSimpleSelector) {
// if only the attribute name is specified, match on its presence
$xpath = $value === null ? '@' . $name : sprintf('@%s="%s"', $name, $value);
return $xpath;
}
// if the attribute name starts with ^
// example: *[^data-]
if (substr($name, 0, 1) === '^') {
$xpath = sprintf('@*[starts-with(name(), "%s")]', substr($name, 1));
return $value === null ? $xpath : sprintf('%s="%s"', $xpath, $value);
}
// if the attribute name starts with !
// example: input[!disabled]
if (substr($name, 0, 1) === '!') {
$xpath = sprintf('not(@%s)', substr($name, 1));
return $xpath;
}
$symbol = substr($name, -1);
$name = substr($name, 0, -1);
switch ($symbol) {
case '^':
$xpath = sprintf('starts-with(@%s, "%s")', $name, $value);
break;
case '$':
$xpath = sprintf('substring(@%s, string-length(@%s) - string-length("%s") + 1) = "%s"', $name, $name, $value, $value);
break;
case '*':
$xpath = sprintf('contains(@%s, "%s")', $name, $value);
break;
case '!':
$xpath = sprintf('not(@%s="%s")', $name, $value);
break;
case '~':
$xpath = sprintf('contains(concat(" ", normalize-space(@%s), " "), " %s ")', $name, $value);
break;
}
return $xpath;
}
/**
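* Converts an nth-expression ("odd", "even", a number, or "an+b" / "an-b") into an XPath condition on position().
* For example, "2n+1" becomes "(position() - 1) mod 2 = 0 and position() >= 1".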
*
* @param string $expression nth-expression
*
* @return string
*
* @throws InvalidSelectorException if the given nth-child expression is empty or invalid
*/
protected static function convertNthExpression($expression)
{
if ($expression === '') {
throw new InvalidSelectorException('nth-child (or nth-last-child) expression must not be empty');
}
if ($expression === 'odd') {
return 'position() mod 2 = 1 and position() >= 1';
}
if ($expression === 'even') {
return 'position() mod 2 = 0 and position() >= 0';
}
if (is_numeric($expression)) {
return sprintf('position() = %d', $expression);
}
if (preg_match("/^(?P<mul>[0-9]?n)(?:(?P<sign>\+|\-)(?P<pos>[0-9]+))?$/is", $expression, $segments)) {
if (isset($segments['mul'])) {
$multiplier = $segments['mul'] === 'n' ? 1 : trim($segments['mul'], 'n');
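// "an+b" selects positions where (position() - b) is divisible by a, so the sign is inverted in the XPath
$sign = (isset($segments['sign']) && $segments['sign'] === '+') ? '-' : '+';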
$position = isset($segments['pos']) ? $segments['pos'] : 0;
return sprintf('(position() %s %d) mod %d = 0 and position() >= %d', $sign, $position, $multiplier, $position);
}
}
throw new InvalidSelectorException(sprintf('Invalid nth-child expression "%s"', $expression));
}
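/**
* Builds an XPath condition that matches an element by its text.
*
* @param string $string The text to search for
* @param bool $caseSensitive Whether the comparison is case-sensitive
* @param bool $fullMatch Whether the text must match exactly rather than contain the string
*
* @return string
*/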
protected static function convertContains($string, $caseSensitive = true, $fullMatch = false)
{
if ($caseSensitive && $fullMatch) {
return sprintf('text() = "%s"', $string);
}
if ($caseSensitive && ! $fullMatch) {
return sprintf('contains(text(), "%s")', $string);
}
$strToLowerFunction = function_exists('mb_strtolower') ? 'mb_strtolower' : 'strtolower';
if ( ! $caseSensitive && $fullMatch) {
return sprintf("php:functionString(\"{$strToLowerFunction}\", .) = php:functionString(\"{$strToLowerFunction}\", \"%s\")", $string);
}
// if ! $caseSensitive and ! $fullMatch
return sprintf("contains(php:functionString(\"{$strToLowerFunction}\", .), php:functionString(\"{$strToLowerFunction}\", \"%s\"))", $string);
}
/**
* Splits the CSS selector into parts (tag name, ID, classes, attributes, pseudo-class).
*
* @param string $selector CSS selector
*
* @return array
*
* @throws InvalidSelectorException if the selector is empty or not valid
*/
public static function getSegments($selector)
{
$selector = trim($selector);
if ($selector === '') {
throw new InvalidSelectorException('The selector must not be empty.');
}
$pregMatchResult = preg_match(self::getSelectorRegex(), $selector, $segments);
if ($pregMatchResult === false || $pregMatchResult === 0 || $segments[0] === '') {
throw new InvalidSelectorException(sprintf('Invalid selector "%s".', $selector));
}
$result = ['selector' => $segments[0]];
if (isset($segments['tag']) && $segments['tag'] !== '') {
$result['tag'] = $segments['tag'];
}
// if the id attribute is specified
if (isset($segments['id']) && $segments['id'] !== '') {
$result['id'] = $segments['id'];
}
// if attributes are specified
if (isset($segments['attrs'])) {
$attributes = trim($segments['attrs'], '[]');
$attributes = explode('][', $attributes);
foreach ($attributes as $attribute) {
if ($attribute !== '') {
list($name, $value) = array_pad(explode('=', $attribute, 2), 2, null);
if ($name === '') {
throw new InvalidSelectorException(sprintf('Invalid selector "%s": attribute name must not be empty', $selector));
}
// the value is null if only the attribute name is specified
$result['attributes'][$name] = is_string($value) ? trim($value, '\'"') : null;
}
}
}
// if classes are specified
if (isset($segments['classes'])) {
$classes = trim($segments['classes'], '.');
$classes = explode('.', $classes);
foreach ($classes as $class) {
if ($class !== '') {
$result['classes'][] = $class;
}
}
}
// if a pseudo-class is specified
if (isset($segments['pseudo']) && $segments['pseudo'] !== '') {
preg_match_all('/:(?P<type>[\w\-]+)(?:\((?P<expr>[^\)]+)\))?/', $segments['pseudo'], $pseudoClasses);
$result['pseudo'] = [];
foreach ($pseudoClasses['type'] as $index => $pseudoType) {
$result['pseudo'][] = [
'type' => $pseudoType,
'expression' => $pseudoClasses['expr'][$index] !== '' ? $pseudoClasses['expr'][$index] : null,
];
}
}
// if the segment is followed by the child combinator ">" (direct descendant)
if (isset($segments['rel'])) {
$result['rel'] = $segments['rel'];
}
return $result;
}
private static function getSelectorRegex()
{
$tag = '(?P<tag>[\*|\w|\-]+)?';
$id = '(?:#(?P<id>[\w|\-]+))?';
$classes = '(?P<classes>\.[\w|\-|\.]+)*';
$attrs = '(?P<attrs>(?:\[.+?\])*)?';
$pseudoType = '[\w\-]+';
$pseudoExpr = '(?:\([^\)]+\))?';
$pseudo = '(?P<pseudo>(?::' . $pseudoType . $pseudoExpr . ')+)?';
$rel = '\s*(?P<rel>>)?';
return '/' . $tag . $id . $classes . $attrs . $pseudo . $rel . '/is';
}
/**
* @return array
*/
public static function getCompiled()
{
return static::$compiled;
}
/**
* @param array $compiled
*/
public static function setCompiled(array $compiled)
{
static::$compiled = $compiled;
}
}
composer.json
{
"name": "imangazaliev/didom",
"description": "Simple and fast HTML parser",
"type": "library",
"keywords": ["didom", "parser", "html", "xml"],
"license": "MIT",
"homepage": "https://github.com/Imangazaliev/DiDOM",
"authors": [
{
"name": "Imangazaliev Muhammad",
"email": "imangazalievm@gmail.com"
}
],
"require": {
"php": ">=5.4",
"ext-dom": "*",
"ext-iconv": "*"
},
"require-dev": {
"phpunit/phpunit": "^4.8"
},
"autoload": {
"psr-4": {
"DiDom\\": "src/DiDom/"
}
},
"autoload-dev": {
"psr-4": {
"DiDom\\Tests\\": "tests/"
}
},
"config": {
"platform": {
"php": "5.4"
}
}
}
LICENSE
Copyright (c) 2015 Muhammad Imangazaliev
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is furnished
to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.