rspamd_html
This module provides methods for accessing the HTML tags of a message part. To get the HTML context
of an HTML part, call part:get_html().
Methods:
Method | Description |
---|---|
html:has_tag(name) | Checks if a specified tag name is present in a part. |
html:check_property(name) | Checks if the HTML has a specific property. |
html:get_images() | Returns a table of images found in html. |
html:foreach_tag(tagname, callback) | Processes the HTML tree, calling the specified callback for each tag of the specified type. |
html:get_invisible() | Returns invisible content of the HTML data. |
html_tag:get_type() | Returns string representation of HTML type for a tag. |
html_tag:get_extra() | Returns extra data associated with the tag. |
html_tag:get_parent() | Returns parent node for a specified tag. |
html_tag:get_flags() | Returns the flags of a specified tag. |
html_tag:get_content() | Returns content of tag (approximate for some cases). |
html_tag:get_content_length() | Returns length of a tag’s content. |
html_tag:get_style() | Returns style calculated for the element. |
html_tag:get_attribute(name) | Returns value of attribute for the element. |
The module rspamd_html defines the following methods.
html:has_tag(name)
Checks if a specified tag name is present in a part.
Parameters:
- name {string}: name of the tag to check
Returns:
- {boolean}: true if the tag exists in the HTML tree
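As a minimal sketch of how has_tag might be combined with part:get_html(), the helper below reports whether a part contains any tag from a list. The function name part_has_any_tag is hypothetical, and the code assumes that get_html() returns nil for a part without HTML content.

```lua
-- Returns true if the part has HTML content containing at least one of the
-- tags in `names`. `part` is assumed to be a text part object; get_html() is
-- assumed to return nil when the part has no HTML content.
local function part_has_any_tag(part, names)
  local html = part:get_html()
  if not html then
    return false
  end
  for _, name in ipairs(names) do
    if html:has_tag(name) then
      return true
    end
  end
  return false
end
```

A rule could call it as part_has_any_tag(part, {'iframe', 'form'}) for each text part of a message.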
html:check_property(name)
Checks if the HTML has a specific property. The available properties are:
- no_html: no html tag is present
- bad_element: part has some broken elements
- xml: part is xhtml
- unknown_element: part has some unknown elements
- duplicate_element: part has some duplicate elements that should be unique (namely, the title tag)
- unbalanced: part has unbalanced tags
Parameters:
- name {string}: name of the property
Returns:
- {boolean}: true if the part has the specified property
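The property names listed above can be probed in a loop. The following sketch gathers every property that holds for an HTML context; the helper name collect_properties and the PROPERTIES table are illustrative, not part of the module.

```lua
-- Property names taken from the list documented for check_property.
local PROPERTIES = {
  'no_html', 'bad_element', 'xml',
  'unknown_element', 'duplicate_element', 'unbalanced',
}

-- Returns an array with the names of all properties that hold for `html`.
local function collect_properties(html)
  local found = {}
  for _, name in ipairs(PROPERTIES) do
    if html:check_property(name) then
      found[#found + 1] = name
    end
  end
  return found
end
```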
html:get_images()
Returns a table of images found in html. Each image is, in turn, a table with the following fields:
- src: link to the source
- height: height in pixels
- width: width in pixels
- embedded: true if an image is embedded in a message
Parameters:
No parameters
Returns:
- {table}: table of images in html part
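The image fields can be used to filter the result. The sketch below selects small, non-embedded images (for example, possible tracking pixels). The helper name small_external_images and the threshold are hypothetical, and the code guards against get_images() returning nil or an image lacking dimensions, which is an assumption rather than documented behaviour.

```lua
-- Returns the images that are not embedded in the message and whose width
-- and height are both known and at most `max_px` pixels.
local function small_external_images(html, max_px)
  local result = {}
  -- Assumption: get_images() may return nil when no images were found.
  for _, img in ipairs(html:get_images() or {}) do
    if not img.embedded
        and img.width and img.height
        and img.width <= max_px and img.height <= max_px then
      result[#result + 1] = img
    end
  end
  return result
end
```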
html:foreach_tag(tagname, callback)
Processes the HTML tree, calling the specified callback for each tag of the specified type.
The callback is called with the following arguments:
- tag: html tag structure
- content_length: length of content within a tag
The callback function should return true to stop processing and false to continue.
Parameters:
- tagname {string}: name of the tag type to match
- callback {function}: function called for each matching tag
Returns:
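To illustrate the callback contract of foreach_tag (return true to stop, false to continue), here is a small sketch that collects at most a given number of matching tags. The helper name first_tags is hypothetical.

```lua
-- Collects up to `limit` tags of type `tagname`, together with the content
-- length reported for each of them.
local function first_tags(html, tagname, limit)
  local seen = {}
  html:foreach_tag(tagname, function(tag, content_length)
    seen[#seen + 1] = { tag = tag, length = content_length }
    -- Returning true stops the traversal; false lets it continue.
    return #seen >= limit
  end)
  return seen
end
```

Stopping early this way avoids walking the rest of the tree once enough tags have been seen.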
html:get_invisible()
Returns invisible content of the HTML data
Parameters:
No parameters
Returns:
html_tag:get_type()
Returns string representation of HTML type for a tag
Parameters:
No parameters
Returns:
- {string}: type of the tag
html_tag:get_extra()
Returns extra data associated with the tag
Parameters:
No parameters
Returns:
- {url|image|nil}: extra data associated with the tag
html_tag:get_parent()
Returns parent node for a specified tag
Parameters:
No parameters
Returns:
- {html_tag}: parent object for a specified tag
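get_type and get_parent can be combined to describe where a tag sits in the tree. The sketch below builds a slash-separated path of tag types; the helper name tag_path is hypothetical, and the code assumes get_parent() returns nil once the root has been reached.

```lua
-- Builds a path of tag types from the root down to `tag` by following
-- parent links. Assumes get_parent() returns nil at the root of the tree.
local function tag_path(tag)
  local names = {}
  while tag do
    -- Fall back to a placeholder if no type string is available.
    table.insert(names, 1, tag:get_type() or 'unknown')
    tag = tag:get_parent()
  end
  return table.concat(names, '/')
end
```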
html_tag:get_flags()
Returns the flags of a specified tag:
- closed: tag is properly closed
- closing: tag is a closing tag
- broken: tag is somehow broken
- unbalanced: tag is unbalanced
- xml: tag is xml tag
Parameters:
No parameters
Returns:
- {table}: table of flags
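The exact shape of the flags table is not spelled out here, so the following sketch is deliberately defensive: it accepts either an array of flag names or a table keyed by flag name. Both shapes, and the helper name has_flag, are assumptions.

```lua
-- Returns true if `tag` carries the flag `flag`.
-- Assumption: get_flags() yields either { 'closed', 'xml' } (array form)
-- or { closed = true, xml = true } (keyed form); both are handled.
local function has_flag(tag, flag)
  local flags = tag:get_flags() or {}
  if flags[flag] then
    return true
  end
  for _, name in ipairs(flags) do
    if name == flag then
      return true
    end
  end
  return false
end
```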
html_tag:get_content()
Returns content of tag (approximate for some cases)
Parameters:
No parameters
Returns:
- {rspamd_text}: rspamd text with tag’s content
html_tag:get_content_length()
Returns length of a tag’s content
Parameters:
No parameters
Returns:
- {number}: size of content enclosed within a tag
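As a sketch of using get_content_length together with get_content, the helper below returns a short preview of a tag's content. The name content_preview is hypothetical, and the code assumes the returned rspamd_text object can be converted with tostring().

```lua
-- Returns at most `max_len` bytes of the tag's content as a Lua string,
-- or an empty string when the tag has no content.
-- Assumption: tostring() converts the rspamd_text object to a Lua string.
local function content_preview(tag, max_len)
  if tag:get_content_length() == 0 then
    return ''
  end
  local content = tag:get_content()
  if not content then
    return ''
  end
  return tostring(content):sub(1, max_len)
end
```

Checking the length first avoids fetching the content of empty tags.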
html_tag:get_style()
Returns style calculated for the element
Parameters:
No parameters
Returns:
- {table}: table associated with the style
html_tag:get_attribute(name)
Returns the value of an attribute for the element. Refer to html_components_map in src/libserver/html/html.cxx for the recognised names.
Parameters:
- name {string}: name of the attribute
Returns:
- {string|nil}: value of the attribute
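get_attribute pairs naturally with html:foreach_tag. The sketch below gathers the values of one attribute across all tags of a given type; the helper name collect_attribute is hypothetical, and the attribute name passed in (for example href) is assumed to be one of the recognised names.

```lua
-- Returns an array with the values of attribute `attr` for every tag of
-- type `tagname` that defines it.
local function collect_attribute(html, tagname, attr)
  local values = {}
  html:foreach_tag(tagname, function(tag)
    local value = tag:get_attribute(attr)
    if value then
      values[#values + 1] = value
    end
    return false -- continue with the next tag
  end)
  return values
end
```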