php-htmldiff is a library for comparing two HTML files/snippets and highlighting the differences using simple HTML.
This HTML Diff implementation was forked from rashid2538/php-htmldiff and has been modified with new features, bug fixes, and enhancements to the original code.
For more information on these modifications, read the differences from rashid2538/php-htmldiff or view the CHANGELOG.
The recommended way to install php-htmldiff is through Composer. Require the caxy/php-htmldiff package by running the following command:
composer require caxy/php-htmldiff
This will resolve the latest stable version.
Otherwise, install the library and set up the autoloader yourself.
If you are using Symfony, you can use the caxy/HtmlDiffBundle to make life easy!
use Caxy\HtmlDiff\HtmlDiff;
$htmlDiff = new HtmlDiff($oldHtml, $newHtml);
$content = $htmlDiff->build();
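To make the snippet above concrete, here is a minimal sketch with sample input. The sample strings, the `diffHtml` helper name, and the `vendor/autoload.php` location are assumptions made for illustration; only `HtmlDiff` and `build()` come from the library.

```php
<?php

use Caxy\HtmlDiff\HtmlDiff;

// Assumption: Composer's autoloader sits next to this script.
$autoloader = __DIR__ . '/vendor/autoload.php';
if (is_file($autoloader)) {
    require_once $autoloader;
}

/**
 * Hypothetical helper (not part of the library): returns the diff
 * markup, or null when caxy/php-htmldiff cannot be loaded.
 */
function diffHtml(string $oldHtml, string $newHtml): ?string
{
    if (!class_exists(HtmlDiff::class)) {
        return null;
    }

    $htmlDiff = new HtmlDiff($oldHtml, $newHtml);

    return $htmlDiff->build();
}

// Sample input, chosen only for illustration.
$content = diffHtml(
    '<p>The quick brown fox.</p>',
    '<p>The quick red fox.</p>'
);

echo $content ?? "caxy/php-htmldiff is not installed\n";
```

The returned string is itself HTML in which the changes are marked up, so it can be embedded in a page and styled with CSS.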
The configuration for HtmlDiff is contained in the Caxy\HtmlDiff\HtmlDiffConfig class.
There are two ways to set the configuration:
When a new HtmlDiff object is created, it creates an HtmlDiffConfig object with the default configuration.
You can change the configuration using setters on the object:
use Caxy\HtmlDiff\HtmlDiff;
// ...
$htmlDiff = new HtmlDiff($oldHtml, $newHtml);
// Set some of the configuration options.
$htmlDiff->getConfig()
    ->setMatchThreshold(80)
    ->setInsertSpaceInReplace(true)
;
// Calculate the differences using the configuration and get the html diff.
$content = $htmlDiff->build();
// ...
You can also set the configuration by creating an instance of Caxy\HtmlDiff\HtmlDiffConfig and passing it to HtmlDiff::create when creating a new HtmlDiff object.
This is useful when creating more than one instance of HtmlDiff:
use Caxy\HtmlDiff\HtmlDiff;
use Caxy\HtmlDiff\HtmlDiffConfig;
// ...
$config = new HtmlDiffConfig();
$config
    ->setMatchThreshold(95)
    ->setInsertSpaceInReplace(true)
;
// Create an HtmlDiff object with the custom configuration.
$firstHtmlDiff = HtmlDiff::create($oldHtml, $newHtml, $config);
$firstContent = $firstHtmlDiff->build();
$secondHtmlDiff = HtmlDiff::create($oldHtml2, $newHtml2, $config);
$secondHtmlDiff->getConfig()->setMatchThreshold(50);
$secondContent = $secondHtmlDiff->build();
// ...
$config = new HtmlDiffConfig();
$config
    // Percentage required for list items to be considered a match.
    ->setMatchThreshold(80)
    // Set the encoding of the text to be diffed.
    ->setEncoding('UTF-8')
    // If true, a space will be added between the <del> and <ins> tags of text that was replaced.
    ->setInsertSpaceInReplace(false)
    // Option to disable the new Table Diffing feature and treat tables as regular text.
    ->setUseTableDiffing(true)
    // Pass an instance of \Doctrine\Common\Cache\Cache to cache the calculated diffs.
    ->setCacheProvider(null)
    // Disable the HTML purifier (only do this if you know what you're doing).
    // This library relies heavily on the purified input from ezyang/htmlpurifier.
    ->setPurifierEnabled(true)
    // Set the cache directory that HTMLPurifier should use.
    ->setPurifierCacheLocation(null)
    // Group consecutive deletions and insertions instead of showing a deletion and insertion for each word individually.
    ->setGroupDiffs(true)
    // List of characters to consider part of a single word when in the middle of text.
    ->setSpecialCaseChars(array('.', ',', '(', ')', '\''))
    // List of tags to treat as special case tags.
    ->setSpecialCaseTags(array('strong', 'b', 'i', 'big', 'small', 'u', 'sub', 'sup', 'strike', 's', 'p'))
    // List of tags (and their replacement strings) to be diffed in isolation.
    ->setIsolatedDiffTags(array(
        'ol' => '[[REPLACE_ORDERED_LIST]]',
        'ul' => '[[REPLACE_UNORDERED_LIST]]',
        'sub' => '[[REPLACE_SUB_SCRIPT]]',
        'sup' => '[[REPLACE_SUPER_SCRIPT]]',
        'dl' => '[[REPLACE_DEFINITION_LIST]]',
        'table' => '[[REPLACE_TABLE]]',
        'strong' => '[[REPLACE_STRONG]]',
        'b' => '[[REPLACE_B]]',
        'em' => '[[REPLACE_EM]]',
        'i' => '[[REPLACE_I]]',
        'a' => '[[REPLACE_A]]',
    ))
    // Sets whether newline characters are kept or removed when `$htmlDiff->build()` is called.
    // For example, if your content includes <pre> tags, you might want to set this to true.
    ->setKeepNewLines(false)
;
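To give a feel for what a percentage threshold such as `setMatchThreshold(80)` means, the sketch below compares two strings with PHP's built-in `similar_text` and checks the resulting percentage against a threshold. This is an illustration only: the `meetsThreshold` helper is hypothetical, and it is an assumption, not a statement about the library's internals, that similarity is measured this way.

```php
<?php

/**
 * Hypothetical helper (not part of caxy/php-htmldiff): reports whether
 * two strings are at least $threshold percent similar, according to
 * PHP's built-in similar_text().
 */
function meetsThreshold(string $a, string $b, float $threshold): bool
{
    // similar_text() writes the similarity percentage (0 to 100)
    // into its third argument.
    similar_text($a, $b, $percent);

    return $percent >= $threshold;
}

// Identical strings are 100% similar, so they pass any threshold up to 100.
var_dump(meetsThreshold('First item', 'First item', 80));

// Strings with no characters in common are 0% similar.
var_dump(meetsThreshold('abc', 'xyz', 80));
```

Under this reading, raising the threshold makes matching stricter, while lowering it allows more heavily edited items to be paired up.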
See CONTRIBUTING file.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms. See CODE_OF_CONDUCT file.
- SavageTiger for contributing many improvements and fixes to caxy/php-htmldiff!
- rashid2538 for the port to PHP and the base for our project: rashid2538/php-htmldiff
- willdurand for an excellent post on open sourcing libraries. Much of this documentation is based off of the examples in the post.
Did we miss anyone? If we did, let us know or put in a pull request!
php-htmldiff is available under the GNU General Public License, version 2. See the LICENSE file for details.
- Tests, tests, and more tests! (mostly unit tests) - need more tests before we can do major refactoring / cleanup for a v1 release
- Add documentation for setting up a cache provider (doctrine cache)
- Maybe add abstraction layer for cache + adapter for doctrine cache
- Make HTML Purifier an optional dependency - possibly use abstraction layer for purifiers so alternatives could be used (or none at all for performance)
- Expose configuration for HTML Purifier (used in table diffing) - currently only cache dir is configurable through HtmlDiffConfig object
- Performance improvements (we have 1 benchmark test, we should probably get more)
- Algorithm improvements - trimming alike text at start and ends, store nested diff results in memory to re-use (like we do w/ caching)
- Benchmark using DOMDocument vs. alternatives vs. string parsing
- Consider not using string parsing for HtmlDiff in order to avoid having to create many DOMDocument instances in ListDiff and TableDiff
- Benchmarking
- Refactoring (but... tests first)
- Overall design/architecture improvements
- API improvements so a new HtmlDiff isn't required for each new diff (especially so that configuration can be re-used)
- Split demo application to separate repository
- Add documentation on alternative htmldiff engines and perhaps some comparisons