voilab/htmlcleaner

Composer install command:

composer require voilab/htmlcleaner

Package description

A HTML cleaner based on SimpleXML, fast and customizable

README

A HTML cleaner based on SimpleXML, fast and customizable

Install

Via Composer

Create a composer.json file in your project root:

{
    "require": {
        "voilab/htmlcleaner": "0.*"
    }
}

Alternatively, let Composer add the requirement for you:

$ composer require voilab/htmlcleaner

Sample dataset

<p>
    Some paragraph with <strong>bold</strong> or
    <em><u><i>nested tags</i></u></em>.
</p>
<p>
    And a second paragraph (so two root elements here) with
    <a href="somesite.org">a cool link</a>,
    <a href="javascript:alert('BAM!');">a bad link</a>
    and some <span class="red">nice attributes to try to keep</span>.
</p>

Basic usage

All tags stripped

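require 'vendor/autoload.php'; // Composer autoloader

use voilab\cleaner\HtmlCleaner;

$cleaner = new HtmlCleaner();
$raw_html = '...'; // paste the sample dataset above

// With no allowed tags configured, every tag is stripped.
echo $cleaner->clean($raw_html);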

Allow some tags

// create cleaner...
$cleaner->addAllowedTags(['p', 'strong']);
// call clean method

Allow some tags and attributes (regardless of tags)

// create cleaner...
$cleaner
    ->addAllowedTags(['p', 'span'])
    ->addAllowedAttributes(['class']);
// call clean method

Allow some attributes only on certain tags

// create cleaner...
$cleaner
    ->addAllowedTags(['p', 'span'])
    ->addAllowedAttributes([
        // keep attribute "class" only for spans
        new \voilab\cleaner\attribute\Keep('class', 'span'),

        // you can use this shorthand too, as a string
        'style:span'
    ]);
// call clean method

Advanced usage

Processors

Processors prepare the HTML string before it is loaded into a new SimpleXMLElement (the basis of the cleaning process) and format the HTML after it has been cleaned. In other words, they provide a pre-process step and a post-process step.

The pre-process step must remove all tags that are not allowed.

Standard processor

The standard processor uses strip_tags() to remove tags that are not allowed. After cleaning, it removes all carriage returns from the string.
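
As a rough illustration of the two steps described above (not the library's actual code), the behaviour can be sketched in plain PHP. The function names sketchPreProcess() and sketchPostProcess() are hypothetical, and treating "carriage returns" as both \r and \n is an assumption. Only strip_tags() and str_replace() are real PHP functions here.

```php
<?php
// Hypothetical helper: drop every tag that is not listed in $allowedTags.
function sketchPreProcess(string $html, array $allowedTags): string
{
    // strip_tags() accepts the allowed tags as a string such as "<p><strong>".
    $allowed = '';
    foreach ($allowedTags as $tag) {
        $allowed .= '<' . $tag . '>';
    }
    return strip_tags($html, $allowed);
}

// Hypothetical helper: remove line breaks (assumed here to mean \r and \n).
function sketchPostProcess(string $html): string
{
    return str_replace(["\r", "\n"], '', $html);
}

$raw = "<p>Some <strong>bold</strong> and <em>italic</em> text.</p>\r\n<p>Second paragraph.</p>";
$pre = sketchPreProcess($raw, ['p', 'strong']);
$post = sketchPostProcess($pre);

echo $post, PHP_EOL;
```

Note that strip_tags() keeps the text inside a removed tag and leaves attributes on allowed tags untouched, which is why attribute validation is handled separately.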

Custom processor

You can create your own processor by implementing \voilab\cleaner\processor\Processor. Remember that the pre-process step is responsible for removing all tags that are not allowed.

Attributes

Attribute classes validate attributes and their content. By default, an allowed attribute becomes a \voilab\cleaner\attribute\Keep, and every attribute that is not allowed becomes a \voilab\cleaner\attribute\Remove.

You do not need to instantiate these two attribute types yourself. Every attribute passed as a string to addAllowedAttributes() is converted to a Keep instance.

Js attribute

You may want to keep an attribute but check its content. The href attribute is a typical case: it can contain a valid URL or a JavaScript injection. An attribute validator already exists for that:

$cleaner
    ->addAllowedTags(['a'])
    ->addAllowedAttributes([
        new \voilab\cleaner\attribute\Js('href')
    ]);

Note that an allowed attribute may or may not be bound to a specific tag. In the example above, the href attribute is valid on every HTML tag. To bind the attribute to a tag, pass the tag name as the second parameter.
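
A minimal sketch of the bound form follows. It assumes the Js constructor takes the tag name as its second argument in the same way as the Keep example shown earlier, and that the package is installed through Composer. The expected output is deliberately not shown.

```php
<?php
require 'vendor/autoload.php'; // Composer autoloader

use voilab\cleaner\HtmlCleaner;
use voilab\cleaner\attribute\Js;

$cleaner = new HtmlCleaner();
$cleaner
    ->addAllowedTags(['p', 'a'])
    ->addAllowedAttributes([
        // Assumption: as with Keep, the second argument is the tag name,
        // so href is only kept (and checked) on <a> elements.
        new Js('href', 'a')
    ]);

$raw_html = '...'; // paste the sample dataset above
echo $cleaner->clean($raw_html);
```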

Known limitations

Root mixed content

Mixed content outside tags is not allowed in root position.

<!-- not valid: parts "some root " and " special " will disappear -->
some root <strong>mixed</strong> special <em>content</em>

<!-- valid -->
<p>some root <strong>mixed</strong> special <em>content</em></p>
<!-- also valid -->
<p>some root element</p>
<p>and an other root element</p>

Bad HTML format with Standard processor

If the HTML is not well formed, the cleaner throws an \Exception. The string must be strictly well formed because it is parsed by simplexml_load_string($html), which is very strict:

  • tags must be closed (<p></p> or <br />)
  • attributes must be wrapped in (double-)quotes (<hr class="test" />)
  • (double-)quotes are not allowed in attribute content; convert them to &quot; before calling HtmlCleaner::clean()
  • the characters < and & are not allowed in content; convert them to &lt; and &amp; respectively before calling HtmlCleaner::clean()

These limitations will eventually be addressed in future releases.
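
To see how strict the parser is, the rules listed above can be checked with plain PHP before the cleaner is called. This is only a sketch: the helper name sketchIsWellFormed() and the wrapping <root> element are assumptions made for this example, not part of the library.

```php
<?php
// Hypothetical helper: true when $fragment parses as XML once wrapped
// in a single root element (the name "root" is an arbitrary choice).
function sketchIsWellFormed(string $fragment): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string('<root>' . $fragment . '</root>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $xml !== false;
}

$closed     = sketchIsWellFormed('<p>first line<br />second line</p>'); // all tags closed
$unclosed   = sketchIsWellFormed('<p>first line<br>second line</p>');   // <br> never closed
$unquoted   = sketchIsWellFormed('<hr class=test />');                  // attribute without quotes
$bareAmp    = sketchIsWellFormed('<p>Fish & chips</p>');                // unescaped ampersand

// Escape free text before placing it inside markup or attribute values.
$text = 'Fish & chips < "salad"';
$safe = htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
$escaped = sketchIsWellFormed('<p title="' . $safe . '">' . $safe . '</p>');

var_dump($closed, $unclosed, $unquoted, $bareAmp, $escaped);
```

Running a check like this on untrusted input makes it possible to reject or repair a string up front rather than catching the exception from the cleaner.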

Testing

$ vendor/bin/phpunit --bootstrap vendor/autoload.php tests/

Security

If you discover any security-related issues, please use the issue tracker.

Credits

License

The MIT License (MIT). Please see License File for more information.

Statistics

  • Total downloads: 139
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 3
  • Views: 0
  • Dependents: 0
  • Suggesters: 0

GitHub information

  • Stars: 3
  • Watchers: 3
  • Forks: 0
  • Language: PHP

Other information

  • License: MIT
  • Last updated: 2017-04-03