DiDOM

DiDOM is a simple and fast HTML parser.

Installation
Quick start
Creating new document
Search for elements
Verify if element exists
Supported selectors
Output
Creating a new element
Getting parent element
Getting sibling elements
Getting the child elements
Getting document
Working with element attributes
Comparing elements
Adding a child element
Replacing element
Removing element
Working with cache
Miscellaneous
Comparison with other parsers

Installation

To install DiDOM run the command:

composer require imangazaliev/didom

Quick start

use DiDom\Document;

$document = new Document('http://www.news.com/', true);

$posts = $document->find('.post');

foreach($posts as $post) {
    echo $post->text(), "\n";
}

Creating new document

DiDOM can load HTML in several ways:

With constructor

// the first parameter is a string with HTML
$document = new Document($html);

// file path
$document = new Document('page.html', true);

// or URL
$document = new Document('http://www.example.com/', true);

The second parameter specifies whether the first parameter is a file path or URL to load rather than an HTML string. The default is false.

With separate methods

$document = new Document();

$document->loadHtml($html);

$document->loadHtmlFile('page.html');

$document->loadHtmlFile('http://www.example.com/');

There are two methods available for loading XML: loadXml and loadXmlFile.

These methods accept additional options:

$document->loadHtml($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
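To see what these two flags do, here is a minimal sketch using PHP's built-in DOMDocument rather than DiDOM itself. It assumes DiDOM forwards the options to libxml unchanged, which the source does not state explicitly. The `<p>Hello</p>` markup is only an illustration.

```php
<?php
// Sketch with PHP's built-in DOM extension (assumption: DiDOM passes
// the same libxml options through when loading HTML).
$html = '<p>Hello</p>';

// Default behaviour: libxml adds a doctype and wraps the fragment in <html><body>.
$default = new DOMDocument();
$default->loadHTML($html);
$full = $default->saveHTML();

// NOIMPLIED skips the implied <html>/<body> wrappers,
// NODEFDTD skips the default doctype.
$fragment = new DOMDocument();
$fragment->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$bare = $fragment->saveHTML();

echo $full;
echo $bare;
```

This combination is useful when you parse a fragment and want to serialize it back without the surrounding document skeleton.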

Search for elements

DiDOM accepts a CSS selector or an XPath expression for searching. Pass the expression as the first parameter and its type as the second (the default type is Query::TYPE_CSS):

With method `find()`:

use DiDom\Document;
use DiDom\Query;

...

// CSS selector
$posts = $document->find('.post');

// XPath
$posts = $document->find("//div[contains(@class, 'post')]", Query::TYPE_XPATH);

If elements matching the expression are found, the method returns an array of DiDom\Element instances; otherwise it returns an empty array. To get an array of DOMElement objects instead, pass false as the third parameter.
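The three return shapes described above can be sketched as follows. This assumes DiDOM is installed through Composer and that the autoloader lives at `vendor/autoload.php`. The markup and the `.missing` selector are made up for illustration.

```php
<?php
// Assumption: DiDOM was installed with Composer in the current directory.
require __DIR__ . '/vendor/autoload.php';

use DiDom\Document;
use DiDom\Query;

$document = new Document('<div class="post">Hello</div>');

// Default: matches are wrapped in DiDom\Element.
$wrapped = $document->find('.post');

// Third parameter false: raw DOMElement objects are returned instead.
$raw = $document->find('.post', Query::TYPE_CSS, false);

// No match: an empty array.
$none = $document->find('.missing');

var_dump($wrapped[0] instanceof \DiDom\Element);
var_dump($raw[0] instanceof \DOMElement);
var_dump(count($none));
```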

With magic method `__invoke()`:

$posts = $document('.post');

With method `xpath()`:

$posts = $document->xpath("//*[contains(concat(' ', normalize-space(@class), ' '), ' post ')]");
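The XPath expression above is longer than the `contains(@class, 'post')` form used earlier because it matches whole class tokens rather than substrings. A minimal sketch of the difference, using PHP's built-in DOMXPath so it runs without DiDOM; the markup is made up for illustration:

```php
<?php
// Plain ext-dom sketch; the three divs are hypothetical sample markup.
$html = '<div class="post">A</div>'
      . '<div class="featured post">B</div>'
      . '<div class="postscript">C</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Substring test: also matches class="postscript".
$loose = $xpath->query("//div[contains(@class, 'post')]");

// Token test: pads the normalized class list with spaces and looks for " post ".
$strict = $xpath->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' post ')]"
);

echo $loose->length, "\n";  // 3
echo $strict->length, "\n"; // 2
```

The token test behaves like the CSS selector `.post`, which is why it is the safer form when class names share a prefix.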

You can also search inside an element:

echo $document->find('nav')[0]->first('ul.menu')->xpath('//li')[0]->text();

Verify if element exists

To verify that an element exists, use the has() method:

if ($document->has('.post')) {
    // code
}

If you need to check that an element exists and then get it, you can write:

if ($document->has('.post')) {
    $elements = $document->find('.post');
    // code
}

However, it is faster to run the search once and check the result:

if (count($elements = $document->find('.post')) > 0) {
    // code
}

because the first version runs the query twice.

Supported selectors

DiDOM supports search by:

tag
class, ID, name and value of an attribute
pseudo-classes:
- first-, last-, nth-child
- empty and not-empty
- contains
- has

// all links
$document->find('a');

// any element with id = "foo" and "bar" class
$document->find('#foo.bar');

// any element with attribute "name"
$document->find('[name]');
// the same as
$document->find('*[name]');

// input field with the name "foo"
$document->find('input[name=foo]');
$document->find('input[name=\'bar\']');
$document->find('input[name="baz"]');

// any element that has an attribute starting with "data-" and the value "foo"
$document->find('*[^data-=foo]');

// all links starting with https
$document->find('a[href^=https]');

// all images with the extension png
$document->find('img[src$=png]');

// all links containing the string "example.com"
$document->find('a[href*=example.com]');

// text of the links with "foo" class
$document->find('a.foo::text');

// address and title of all the links with "bar" class
$document->find('a.bar::attr(href|title)');

Output

Getting HTML

With method `html()`:

$posts = $document->find('.post');

echo $posts[0]->html();

Casting to string:

$html = (string) $posts[0];

Formatting HTML output

$html = $document->format()->html();

An element does not have a format() method, so to output the formatted HTML of an element, first convert it to a document:

$html = $element->toDocument()->format()->html();

Inner HTML

$innerHtml = $element->innerHtml();

A document does not have an innerHtml() method, so to get the inner HTML of a document, first convert it to an element:

$innerHtml = $document->toElement()->innerHtml();

Getting XML

echo $document->xml();

echo $document->first('book')->xml();

Getting content

$posts = $document->find('.post');

echo $posts[0]->text();

Creating a new element

Creating an instance of the class

use DiDom\Element;

$element = new Element('span', 'Hello');

// Outputs "<span>Hello</span>"
echo $element->html();

The first parameter is the name of the element, the second is its value (optional), and the third is an array of element attributes (optional).

An example of creating an element with attributes:

$attributes = ['name' => 'description', 'placeholder' => 'Enter description of item'];

$element = new Element('textarea', 'Text', $attributes);

An element can be created from an instance of the class DOMElement:

use DiDom\Element;
use DOMElement;

$domElement = new DOMElement('span', 'Hello');

$element = new Element($domElement);

Using the method `createElement`

$document = new Document($html);

$element = $document->createElement('span', 'Hello');

Getting parent element

$document = new Document($html);

$input = $document->find('input[name=email]')[0];

var_dump($input->parent());

Getting sibling elements

$document = new Document($html);

$item = $document->find('ul.menu > li')[1];

var_dump($item->previousSibling());

var_dump($item->nextSibling());

Getting the child elements

$html = '
<ul>
    <li>Foo</li>
    <li>Bar</li>
    <li>Baz</li>
</ul>
';

$document = new Document($html);
$list = $document->first('ul');

// string(3) "Baz"
var_dump($list->child(2)->text());

// string(3) "Foo"
var_dump($list->firstChild()->text());

// string(3) "Baz"
var_dump($list->lastChild()->text());

// array(3) { ... }
var_dump($list->children());

Getting document

$document = new Document($html);

$element = $document->find('input[name=email]')[0];

$document2 = $element->getDocument();

// bool(true)
var_dump($document->is($document2));

Working with element attributes

Getting tag name

$name = $element->tag;

Creating/updating an attribute

With method `setAttribute`:

$element->setAttribute('name', 'username');

With method `attr`:

$element->attr('name', 'username');

With magic method `__set`:

$element->name = 'username';

Getting value of an attribute

With method `getAttribute`:

$username = $element->getAttribute('value');

With method `attr`:

$username = $element->attr('value');

With magic method `__get`:

$username = $element->name;

Returns null if attribute is not found.

Verify if attribute exists

With method `hasAttribute`:

if ($element->hasAttribute('name')) {
    // code
}

With magic method `__isset`:

if (isset($element->name)) {
    // code
}

Removing attribute:

With method `removeAttribute`:

$element->removeAttribute('name');

With magic method `__unset`:

unset($element->name);

Comparing elements

$element  = new Element('span', 'hello');
$element2 = new Element('span', 'hello');

// bool(true)
var_dump($element->is($element));

// bool(false)
var_dump($element->is($element2));

Adding a child element

$list = new Element('ul');

$item = new Element('li', 'Item 1');
$items = [
    new Element('li', 'Item 2'),
    new Element('li', 'Item 3'),
];

$list->appendChild($item);
$list->appendChild($items);

Replacing element

$element = new Element('span', 'hello');

$document->find('.post')[0]->replace($element);

Removing element

$document->find('.post')[0]->remove();

Working with cache

The cache is an array of XPath expressions that were compiled from CSS selectors.

Getting from cache

use DiDom\Query;

...

$xpath    = Query::compile('h2');
$compiled = Query::getCompiled();

// array('h2' => '//h2')
var_dump($compiled);

Installing cache

Query::setCompiled(['h2' => '//h2']);

Miscellaneous

`preserveWhiteSpace`

By default, whitespace is not preserved.

You can enable the preserveWhiteSpace option before loading the document:

$document = new Document();

$document->preserveWhiteSpace();

$document->loadXml($xml);

`count`

The count() method counts children that match the selector:

// print the number of links in the document
echo $document->count('a');

`matches`

Returns true if the node matches the selector:

$element->matches('div#content');

// strict match
// returns true only if the element is a div whose id equals "content"
// and that has no other attributes; otherwise the method returns false
$element->matches('div#content', true);

`isTextNode`

Checks whether an element is a text node (DOMText):

$element->isTextNode();

`isCommentNode`

Checks whether the element is a comment (DOMComment):

$element->isCommentNode();

Comparison with other parsers
