rspamd_textpart
This module provides methods to manipulate text part data. Text parts
can be obtained from an rspamd_task
object by calling the task:get_text_parts() method.
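As a minimal sketch of that entry point, the function below walks over the text parts of a task and collects the length of every non-empty part. The helper name `collect_part_lengths` is made up for illustration, and `task` is assumed to be the rspamd_task object passed to a symbol callback.

```lua
-- Hypothetical helper: collect the byte length of every non-empty text part.
-- `task` is assumed to be an rspamd_task object (e.g. inside a symbol callback).
local function collect_part_lengths(task)
  local lengths = {}
  -- Guard against a nil result for messages without text parts
  for _, part in ipairs(task:get_text_parts() or {}) do
    if not part:is_empty() then
      lengths[#lengths + 1] = part:get_length()
    end
  end
  return lengths
end
```

The function only relies on methods listed in this module, so it can be exercised with simple mock tables outside of Rspamd.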
Methods:
| Method | Description |
|---|---|
| text_part:is_utf() | Return TRUE if part is a valid utf text. |
| text_part:has_8bit_raw() | Return TRUE if a part has raw 8bit characters. |
| text_part:has_8bit() | Return TRUE if a part has encoded 8bit characters. |
| text_part:get_content([type]) | Get the text of the part (html tags stripped). |
| text_part:get_raw_content() | Get the original text of the part. |
| text_part:get_content_oneline() | Get the text of the part (html tags and newlines stripped). |
| text_part:get_length() | Get length of the text of the part. |
| text_part:get_raw_length() | Get length of the raw content of the part (e.g. HTML with tags unstripped). |
| text_part:get_urls_length() | Get length of the urls within the part. |
| text_part:get_lines_count() | Get the number of lines in the part. |
| text_part:get_stats() | Returns a table with statistics about the part. |
| text_part:get_words_count() | Get the number of words in the part. |
| text_part:get_words([how]) | Get words in the part. |
| text_part:filter_words(regexp[, how][, max]) | Filter words using some regexp. |
| text_part:is_empty() | Returns true if the specified part is empty. |
| text_part:is_html() | Returns true if the specified part has HTML content. |
| text_part:get_html() | Returns html content of the specified part. |
| text_part:get_language() | Returns the code of the most used unicode script in the text part. |
| text_part:get_charset() | Returns part real charset. |
| text_part:get_languages() | Returns array of tables of all languages detected for a part. |
| text_part:get_fuzzy_hashes(mempool) | Returns direct hash of textpart as a string and array [1..32] of shingles, each represented as a table. |
| text_part:get_mimepart() | Returns the mime part object corresponding to this text part. |
The module rspamd_textpart
defines the following methods.
text_part:is_utf()
Return TRUE if part is a valid utf text
Parameters:
No parameters
Returns:
{boolean}
: true if part is a valid UTF8 part
text_part:has_8bit_raw()
Return TRUE if a part has raw 8bit characters
Parameters:
No parameters
Returns:
{boolean}
: true if a part has raw 8bit characters
text_part:has_8bit()
Return TRUE if a part has encoded 8bit characters
Parameters:
No parameters
Returns:
{boolean}
: true if a part has encoded 8bit characters
text_part:get_content([type])
Get the text of the part (html tags stripped). Optional type
defines type of content to get:
content
: (default) utf8 content with HTML tags stripped and newlines preserved
content_oneline
: utf8 content with HTML tags and newlines stripped
raw
: raw content, not mime decoded nor utf8 converted
raw_parsed
: raw content, mime decoded, not utf8 converted
raw_utf
: raw content, mime decoded, utf8 converted (but with HTML tags and newlines)
Parameters:
type {string}
: optional type of content to get (defaults to content)
Returns:
{text}
: UTF8 encoded content of the part (zero-copy if not converted to a lua string)
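The content types listed above can be selected as in the following sketch, which builds a short preview of a part. The helper name `preview` and its `max_len` argument are illustrative assumptions; only `get_content` with the `content_oneline` type comes from this reference.

```lua
-- Hypothetical helper: return a short Lua-string preview of a text part.
-- `text_part` is assumed to be an rspamd_textpart object.
local function preview(text_part, max_len)
  -- 'content_oneline' strips both HTML tags and newlines
  local content = text_part:get_content('content_oneline')
  if not content then
    return ''
  end
  -- tostring() copies the opaque text into a Lua string, so do it only once
  local s = tostring(content)
  -- Note: sub() cuts on bytes, so a multi-byte UTF-8 sequence may be split
  return s:sub(1, max_len)
end
```

Keeping the value as an opaque text object for as long as possible preserves the zero-copy behaviour mentioned in the return description; the conversion here is done only because a Lua string is needed for `sub`.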
text_part:get_raw_content()
Get the original text of the part
Parameters:
No parameters
Returns:
{text}
: UTF8 encoded content of the part (zero-copy if not converted to a lua string)
text_part:get_content_oneline()
Get the text of the part (html tags and newlines stripped)
Parameters:
No parameters
Returns:
{text}
: UTF8 encoded content of the part (zero-copy if not converted to a lua string)
text_part:get_length()
Get length of the text of the part
Parameters:
No parameters
Returns:
{integer}
: length of part in bytes
text_part:get_raw_length()
Get length of the raw content of the part (e.g. HTML with tags unstripped)
Parameters:
No parameters
Returns:
{integer}
: length of part in bytes
text_part:get_urls_length()
Get length of the urls within the part
Parameters:
No parameters
Returns:
{integer}
: length of urls in bytes
text_part:get_lines_count()
Get the number of lines in the part
Parameters:
No parameters
Returns:
{integer}
: number of lines in the part
text_part:get_stats()
Returns a table with the following data:
lines
: number of lines
spaces
: number of spaces
double_spaces
: number of double spaces
empty_lines
: number of empty lines
non_ascii_characters
: number of non ascii characters
ascii_characters
: number of ascii characters
Parameters:
No parameters
Returns:
{table}
: table of stats
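The fields of the stats table can be combined into simple metrics. The sketch below computes the share of non-ASCII characters in a part; the helper name `non_ascii_ratio` is hypothetical, and the code assumes both character counters are always present as numbers.

```lua
-- Hypothetical helper: share of non-ASCII characters in a text part (0..1).
-- `text_part` is assumed to be an rspamd_textpart object.
local function non_ascii_ratio(text_part)
  local stats = text_part:get_stats()
  local total = stats.ascii_characters + stats.non_ascii_characters
  if total == 0 then
    -- Avoid division by zero for parts without any counted characters
    return 0
  end
  return stats.non_ascii_characters / total
end
```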
text_part:get_words_count()
Get the number of words in the part
Parameters:
No parameters
Returns:
{integer}
: number of words in the part
text_part:get_words([how])
Get words in the part. Optional how
argument defines type of words returned:
stem
: stemmed words (default)
norm
: normalised words (utf normalised + lowercased)
raw
: raw words in utf (if possible)
full
: list of tables, one table of fields per word
Parameters:
how {string}
: optional type of words to return (defaults to stem)
Returns:
{table/strings}
: words in the part
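As an illustration of the `how` argument, the sketch below counts how often each normalised word occurs in a part. The helper name `word_frequencies` is an assumption; the code expects `get_words('norm')` to return a plain array of strings, as the description above suggests.

```lua
-- Hypothetical helper: count how often each normalised word occurs.
-- `text_part` is assumed to be an rspamd_textpart object.
local function word_frequencies(text_part)
  local freq = {}
  -- 'norm' asks for utf-normalised, lowercased words
  for _, w in ipairs(text_part:get_words('norm') or {}) do
    freq[w] = (freq[w] or 0) + 1
  end
  return freq
end
```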
text_part:filter_words(regexp[, how][, max])
Filter words using some regexp. The optional how
argument selects which form of the words is matched and returned:
stem
: stemmed words (default)
norm
: normalised words (utf normalised + lowercased)
raw
: raw words in utf (if possible)
full
: list of tables, one table of fields per word
Parameters:
regexp {rspamd_regexp}
: regexp to match
how {string}
: what words to extract
max {number}
: maximum number of hits returned (all hits if <= 0 or nil)
Returns:
{table/strings}
: words matching regexp
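A minimal sketch of calling this method follows. The helper name `count_matching_words` is hypothetical, and `re` is assumed to be an rspamd_regexp object that the caller has already created (in Rspamd this would typically come from the rspamd_regexp module, which is not described in this section).

```lua
-- Hypothetical helper: count raw words matching `re`, looking at no more
-- than `max` hits. `re` is assumed to be an rspamd_regexp object.
local function count_matching_words(text_part, re, max)
  local hits = text_part:filter_words(re, 'raw', max)
  -- Treat a nil result the same as an empty list
  return hits and #hits or 0
end
```

Passing a small `max` keeps the call cheap when a rule only needs to know whether a few matches exist.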
text_part:is_empty()
Returns true
if the specified part is empty
Parameters:
No parameters
Returns:
{bool}
: whether a part is empty
text_part:is_html()
Returns true
if the specified part has HTML content
Parameters:
No parameters
Returns:
{bool}
: whether a part is an HTML part
text_part:get_html()
Returns html content of the specified part
Parameters:
No parameters
Returns:
{html}
: html content
text_part:get_language()
Returns the code of the most used unicode script in the text part. Does not work with raw parts
Parameters:
No parameters
Returns:
{string}
: short abbreviation (such as ru) for the script’s language
text_part:get_charset()
Returns part real charset
Parameters:
No parameters
Returns:
{string}
: charset of the part
text_part:get_languages()
Returns array of tables of all languages detected for a part:
Parameters:
No parameters
Returns:
{array|tables}
: all languages detected for the part
text_part:get_fuzzy_hashes(mempool)
Returns direct hash of textpart as a string and array [1..32] of shingles each represented as a following table:
Parameters:
mempool {rspamd_mempool}
: memory pool (usually task pool)
Returns:
{string,array|tables}
: fuzzy hashes calculated
text_part:get_mimepart()
Returns the mime part object corresponding to this text part
Parameters:
No parameters
Returns:
{mimepart}
: mimepart object
rspamd_mimepart
This module provides access to mime parts found in a message
Methods:
| Method | Description |
|---|---|
| mime_part:get_header(name[, case_sensitive]) | Get decoded value of a header specified with optional case_sensitive flag. |
| mime_part:get_header_raw(name[, case_sensitive]) | Get raw value of a header specified with optional case_sensitive flag. |
| mime_part:get_header_full(name[, case_sensitive]) | Get full information about a header specified with optional case_sensitive flag. |
| mime_part:get_header_count(name[, case_sensitive]) | Lightweight version if you need just a header’s count. |
| mime_part:get_raw_headers() | Get all undecoded headers of a mime part as a string. |
| mime_part:get_headers() | Get all undecoded headers of a mime part as a string. |
| mime_part:get_content() | Get the parsed content of part. |
| mime_part:get_raw_content() | Get the raw content of part. |
| mime_part:get_length() | Get length of the content of the part. |
| mime_part:get_type() | Extract content-type string of the mime part. |
| mime_part:get_type_full() | Extract content-type string of the mime part with all attributes. |
| mime_part:get_detected_type() | Extract content-type string of the mime part using lua_magic detection. |
| mime_part:get_detected_type_full() | Extract content-type string of the mime part with all attributes using lua_magic detection. |
| mime_part:get_detected_ext() | Returns a msdos extension name according to lua_magic detection. |
| mime_part:get_cte() | Extract content-transfer-encoding for a part. |
| mime_part:get_filename() | Extract filename associated with mime part if it is an attachment. |
| mime_part:is_image() | Returns true if mime part is an image. |
| mime_part:get_image() | Returns rspamd_image structure associated with this part. |
| mime_part:is_archive() | Returns true if mime part is an archive. |
| mime_part:is_attachment() | Returns true if mime part looks like an attachment. |
| mime_part:get_archive() | Returns rspamd_archive structure associated with this part. |
| mime_part:is_multipart() | Returns true if mime part is a multipart part. |
| mime_part:is_message() | Returns true if mime part is a message part (message/rfc822). |
| mime_part:get_boundary() | Returns boundary for a part (extracted from parent multipart for normal parts and from the part itself for multipart). |
| mime_part:get_enclosing_boundary() | Returns an enclosing boundary for a part even for multiparts. |
| mime_part:get_children() | Returns rspamd_mimepart table of part’s children. |
| mime_part:is_text() | Returns true if mime part is a text part. |
| mime_part:get_text() | Returns rspamd_textpart structure associated with this part. |
| mime_part:get_digest() | Returns the unique digest for this mime part. |
| mime_part:get_id() | Returns the order of the part in parts list. |
| mime_part:is_broken() | Returns true if mime part has incorrectly specified content type. |
| mime_part:headers_foreach(callback, [params]) | This method calls callback for each header that satisfies some condition. |
| mime_part:get_parent() | Returns parent part for this part. |
| mime_part:get_specific() | Returns specific lua content for this part. |
| mime_part:set_specific(<any>) | Sets a specific content for this part. |
| mime_part:is_specific(<any>) | Returns true if part has specific lua content. |
| mime_part:get_urls([need_emails\|list_protos][, need_images]) | Get all URLs found in a mime part. |
| mime_part:get_stats() | Returns a table with statistics about the part. |
The module rspamd_mimepart
defines the following methods.
mime_part:get_header(name[, case_sensitive])
Get decoded value of a header specified with optional case_sensitive flag. By default headers are searched in a case-insensitive manner.
Parameters:
name {string}
: name of header to get
case_sensitive {boolean}
: case sensitiveness flag to search for a header
Returns:
{string}
: decoded value of a header
mime_part:get_header_raw(name[, case_sensitive])
Get raw value of a header specified with optional case_sensitive flag. By default headers are searched in a case-insensitive manner.
Parameters:
name {string}
: name of header to get
case_sensitive {boolean}
: case sensitiveness flag to search for a header
Returns:
{string}
: raw value of a header
mime_part:get_header_full(name[, case_sensitive])
Get full information about a header specified with optional case_sensitive flag. By default headers are searched in a case-insensitive manner. This method returns a list of tables, one per value of the header, with the following structure:
name
: name of a header
value
: raw value of a header
decoded
: decoded value of a header
tab_separated
: true if a header and a value are separated by a tab character
empty_separator
: true if there is no separator between a header and a value
Parameters:
name {string}
: name of header to get
case_sensitive {boolean}
: case sensitiveness flag to search for a header
Returns:
{list of tables}
: all values of a header as specified above
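The list-of-tables result can be consumed as in the sketch below, which gathers the decoded value of every occurrence of a header. The helper name `decoded_header_values` is hypothetical, and the code assumes the method returns nil when the header is absent.

```lua
-- Hypothetical helper: collect decoded values of every occurrence of a header.
-- `mime_part` is assumed to be an rspamd_mimepart object.
local function decoded_header_values(mime_part, name)
  local values = {}
  -- Assumption: a missing header yields nil rather than an empty table
  for _, h in ipairs(mime_part:get_header_full(name) or {}) do
    -- Fall back to the raw value when no decoded form is available
    values[#values + 1] = h.decoded or h.value
  end
  return values
end
```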
mime_part:get_header_count(name[, case_sensitive])
Lightweight alternative for when only a header’s count is needed
Parameters:
name {string}
: name of header to get
case_sensitive {boolean}
: case sensitiveness flag to search for a header
Returns:
{number}
: number of header’s occurrences or 0 if not found
mime_part:get_raw_headers()
Get all undecoded headers of a mime part as a string
Parameters:
No parameters
Returns:
{rspamd_text}
: all raw headers for a message as opaque text
mime_part:get_headers()
Get all undecoded headers of a mime part as a string
Parameters:
No parameters
Returns:
{rspamd_text}
: all raw headers for a message as opaque text
mime_part:get_content()
Get the parsed content of part
Parameters:
No parameters
Returns:
{text}
: opaque text object (zero-copy if not cast to a lua string)
mime_part:get_raw_content()
Get the raw content of part
Parameters:
No parameters
Returns:
{text}
: opaque text object (zero-copy if not cast to a lua string)
mime_part:get_length()
Get length of the content of the part
Parameters:
No parameters
Returns:
{integer}
: length of part in bytes
mime_part:get_type()
Extract content-type string of the mime part
Parameters:
No parameters
Returns:
{string,string}
: content type in form ‘type’, ‘subtype’
mime_part:get_type_full()
Extract content-type string of the mime part with all attributes
Parameters:
No parameters
Returns:
{string,string,table}
: content type in form ‘type’, ‘subtype’, {attrs}
mime_part:get_detected_type()
Extract content-type string of the mime part. Use lua_magic detection
Parameters:
No parameters
Returns:
{string,string}
: content type in form ‘type’, ‘subtype’
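Because the declared and the detected content type are both returned as a type and subtype pair, they can be compared directly. The following sketch flags parts whose declared type disagrees with lua_magic detection; the helper name `type_mismatch` is hypothetical, and treating missing values as "no mismatch" is a design choice of this example, not documented behaviour.

```lua
-- Hypothetical helper: true when the declared content type differs from the
-- detected one. `mime_part` is assumed to be an rspamd_mimepart object.
local function type_mismatch(mime_part)
  local decl_type, decl_sub = mime_part:get_type()
  local det_type, det_sub = mime_part:get_detected_type()
  -- Without both pairs there is nothing to compare
  if not (decl_type and decl_sub and det_type and det_sub) then
    return false
  end
  -- Content types are compared case-insensitively
  return decl_type:lower() ~= det_type:lower()
    or decl_sub:lower() ~= det_sub:lower()
end
```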
mime_part:get_detected_type_full()
Extract content-type string of the mime part with all attributes. Use lua_magic detection
Parameters:
No parameters
Returns:
{string,string,table}
: content type in form ‘type’, ‘subtype’, {attrs}
mime_part:get_detected_ext()
Returns a msdos extension name according to lua_magic detection
Parameters:
No parameters
Returns:
{string}
: detected extension (see lua_magic.types)
mime_part:get_cte()
Extract content-transfer-encoding for a part
Parameters:
No parameters
Returns:
{string}
: content transfer encoding (e.g. base64 or 7bit)
mime_part:get_filename()
Extract filename associated with mime part if it is an attachment
Parameters:
No parameters
Returns:
{string}
: filename or nil if no file is associated with this part
mime_part:is_image()
Returns true if mime part is an image
Parameters:
No parameters
Returns:
{bool}
: true if a part is an image
mime_part:get_image()
Returns rspamd_image structure associated with this part. This structure has the following methods:
get_width
: return width of an image in pixels
get_height
: return height of an image in pixels
get_type
: return string representation of image’s type (e.g. ‘jpeg’)
get_filename
: return string with image’s file name
get_size
: return size in bytes
Parameters:
No parameters
Returns:
{rspamd_image}
: image structure or nil if a part is not an image
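A short sketch of using the image structure: the function below returns the pixel area of an image part, or nil for any other part. The helper name `image_area` is an assumption made for illustration.

```lua
-- Hypothetical helper: pixel area of an image part, or nil for other parts.
-- `mime_part` is assumed to be an rspamd_mimepart object.
local function image_area(mime_part)
  if not mime_part:is_image() then
    return nil
  end
  local img = mime_part:get_image()
  if not img then
    return nil
  end
  return img:get_width() * img:get_height()
end
```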
mime_part:is_archive()
Returns true if mime part is an archive
Parameters:
No parameters
Returns:
{bool}
: true if a part is an archive
mime_part:is_attachment()
Returns true if mime part looks like an attachment
Parameters:
No parameters
Returns:
{bool}
: true if a part looks like an attachment
mime_part:get_archive()
Returns rspamd_archive structure associated with this part. This structure has the following methods:
get_files
: return list of strings with filenames inside archive
get_files_full
: return list of tables with all information about files
is_encrypted
: return true if an archive is encrypted
get_type
: return string representation of archive’s type (e.g. ‘zip’)
get_filename
: return string with archive’s file name
get_size
: return size in bytes
Parameters:
No parameters
Returns:
{rspamd_archive}
: archive structure or nil if a part is not an archive
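The archive methods above can be combined into a small summary, as sketched here. The helper name `archive_summary` and the shape of the returned table are illustrative assumptions.

```lua
-- Hypothetical helper: summarise an archive part as a plain Lua table.
-- `mime_part` is assumed to be an rspamd_mimepart object.
local function archive_summary(mime_part)
  if not mime_part:is_archive() then
    return nil
  end
  local arch = mime_part:get_archive()
  if not arch then
    return nil
  end
  return {
    type = arch:get_type(),           -- e.g. 'zip'
    encrypted = arch:is_encrypted(),  -- boolean flag
    files = arch:get_files() or {},   -- list of file names inside the archive
  }
end
```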
mime_part:is_multipart()
Returns true if mime part is a multipart part
Parameters:
No parameters
Returns:
{bool}
: true if a part is a multipart part
mime_part:is_message()
Returns true if mime part is a message part (message/rfc822)
Parameters:
No parameters
Returns:
{bool}
: true if a part is a message part
mime_part:get_boundary()
Returns boundary for a part (extracted from parent multipart for normal parts and from the part itself for multipart)
Parameters:
No parameters
Returns:
{string}
: boundary value or nil
mime_part:get_enclosing_boundary()
Returns an enclosing boundary for a part even for multiparts. For normal parts
this method is identical to get_boundary
Parameters:
No parameters
Returns:
{string}
: boundary value or nil
mime_part:get_children()
Returns rspamd_mimepart table of part’s children. Returns nil if mime part is not multipart or a message part.
Parameters:
No parameters
Returns:
{rspamd_mimepart}
: table of children
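Since children are themselves mime parts, a part tree can be walked recursively. The sketch below counts leaf parts under a given part; the helper name `count_leaves` is hypothetical, and counting a multipart without children as a single leaf is a choice made for this example.

```lua
-- Hypothetical helper: count leaf parts in the tree rooted at `mime_part`.
-- `mime_part` is assumed to be an rspamd_mimepart object.
local function count_leaves(mime_part)
  local children = mime_part:get_children()
  -- nil means the part is neither multipart nor a message part
  if not children or #children == 0 then
    return 1
  end
  local n = 0
  for _, child in ipairs(children) do
    n = n + count_leaves(child)
  end
  return n
end
```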
mime_part:is_text()
Returns true if mime part is a text part
Parameters:
No parameters
Returns:
{bool}
: true if a part is a text part
mime_part:get_text()
Returns rspamd_textpart structure associated with this part.
Parameters:
No parameters
Returns:
{rspamd_textpart}
: textpart structure or nil if a part is not a text part
mime_part:get_digest()
Returns the unique digest for this mime part
Parameters:
No parameters
Returns:
{string}
: 128 characters hex string with digest of the part
mime_part:get_id()
Returns the order of the part in parts list
Parameters:
No parameters
Returns:
{number}
: index of the part (starting from 1 as it is Lua API)
mime_part:is_broken()
Returns true if mime part has incorrectly specified content type
Parameters:
No parameters
Returns:
{bool}
: true if a part has bad content type
mime_part:headers_foreach(callback, [params])
This method calls callback
for each header that satisfies some condition.
By default, all headers are iterated unless callback
returns true.
Returning nil or false continues the iteration.
Params could be as following:
full
: header value is a full table of all attributes (see task:get_header_full for details)
regexp
: return only headers that satisfy the specified regexp
Parameters:
callback {function}
: function that receives header name and header value
params {table}
: optional parameters
Returns:
No return
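The iteration contract described above (return true to stop, nil or false to continue) is illustrated by the following sketch, which collects the names of all headers of a part. The helper name `header_names` is an assumption, and no `params` table is passed, so every header is visited.

```lua
-- Hypothetical helper: collect the names of all headers of a mime part.
-- `mime_part` is assumed to be an rspamd_mimepart object.
local function header_names(mime_part)
  local names = {}
  mime_part:headers_foreach(function(name, value)
    names[#names + 1] = name
    -- Returning false (or nil) continues the iteration; true would stop it
    return false
  end)
  return names
end
```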
mime_part:get_parent()
Returns parent part for this part
Parameters:
No parameters
Returns:
{rspamd_mimepart}
: parent part or nil
mime_part:get_specific()
Returns specific lua content for this part
Parameters:
No parameters
Returns:
{any}
: specific lua content
mime_part:set_specific(<any>)
Sets a specific content for this part
Parameters:
{any}
: content to associate with this part
Returns:
{any}
: previous specific lua content (or nil)
mime_part:is_specific(<any>)
Returns true if part has specific lua content
Parameters:
No parameters
Returns:
{boolean}
: flag
mime_part:get_urls([need_emails|list_protos][, need_images])
Get all URLs found in a mime part. Telephone urls and emails are not included unless explicitly asked in list_protos
Parameters:
need_emails|list_protos {boolean|string|table}
: if true, email urls are returned as well; alternatively, a comma separated string or a table of the desired protocols (e.g. mailto or telephone)
need_images {boolean}
: return urls from images (<img src=…>) as well
Returns:
{table rspamd_url}
: list of all urls found
mime_part:get_stats()
Returns a table with the following data:
lines
: number of lines
spaces
: number of spaces
double_spaces
: number of double spaces
empty_lines
: number of empty lines
non_ascii_characters
: number of non ascii characters
ascii_characters
: number of ascii characters
Parameters:
No parameters
Returns:
{table}
: table of stats