Module rspamd_util

This module contains general-purpose utilities that are useful for both testing and production rules.

Brief content:

Functions:

Function Description
util.create_event_base() Creates new event base for processing asynchronous events.
util.load_rspamd_config(filename) Loads rspamd config from the specified file.
util.config_from_ucl(any, string) Loads rspamd config from UCL represented by a Lua table.
util.encode_base64(input[, str_len, [newlines_type]]) Encodes data in base64 breaking lines if needed.
util.encode_qp(input[, str_len, [newlines_type]]) Encodes data in quoted printable breaking lines if needed.
util.decode_qp(input) Decodes data from quoted printable.
util.decode_base64(input) Decodes data from base64 ignoring whitespace characters.
util.encode_base32(input, [b32type = 'default']) Encodes data in base32 breaking lines if needed.
util.decode_base32(input, [b32type = 'default']) Decodes data from base32 ignoring whitespace characters.
util.decode_url(input) Decodes data from url encoding.
util.tokenize_text(input[, exceptions]) Creates tokens from text using an optional exceptions list.
util.tanh(num) Calculates hyperbolic tangent of the specified floating point value.
util.parse_html(input) Parses HTML and returns the according text.
util.levenshtein_distance(s1, s2) Returns Levenshtein distance between two strings.
util.fold_header(name, value, [how, [stop_chars]]) Folds an RFC 822 header according to the folding rules.
util.is_uppercase(str) Returns true if a string is all uppercase.
util.humanize_number(num) Returns humanized representation of given number (like 1k instead of 1000).
util.get_tld(host) Returns effective second level domain part (eSLD) for the specified host.
util.glob(pattern) Returns results for the glob match for the specified pattern.
util.parse_mail_address(str, [pool]) Parses email address and returns a table of tables.
util.strlen_utf8(str) Returns length of string encoded in utf-8 in characters.
util.lower_utf8(str) Converts utf8 string to lower case.
util.normalize_utf8(str) Gets a string in UTF8 and normalises it to NFKC_Casefold form.
util.transliterate(str) Converts utf8 encoded string to latin transliteration.
util.strequal_caseless(str1, str2) Compares two strings regardless of their case using ascii comparison.
util.strequal_caseless_utf8(str1, str2) Compares two utf8 strings regardless of their case using utf8 collation rules.
util.get_ticks() Returns current number of ticks as floating point number.
util.get_time() Returns current time as unix time in floating point representation.
util.time_to_string(seconds) Converts time from Unix time to HTTP date format.
util.stat(fname) Performs stat(2) on a specified filepath and returns table of values.
util.unlink(fname) Removes the specified file from the filesystem.
util.lock_file(fname, [fd]) Locks the specified file.
util.unlock_file(fd, [close_fd]) Unlocks the specified file, closing the associated file descriptor.
util.create_file(fname, [mode]) Creates the specified file with the default mode 0644.
util.close_file(fd) Closes descriptor fd.
util.random_hex(size) Returns random hex string of the specified size.
util.zstd_compress(data, [level=1]) Compresses input using zstd compression.
util.zstd_decompress(data) Decompresses input using zstd algorithm.
util.gzip_decompress(data, [size_limit]) Decompresses input using gzip algorithm.
util.inflate(data, [size_limit]) Decompresses input using inflate algorithm.
util.gzip_compress(data, [level=1]) Compresses input using gzip compression.
util.normalize_prob(prob, [bias = 0.5]) Normalizes a probability using a polynomial.
util.is_utf_spoofed(str, [str2]) Returns true if a string is spoofed (possibly with another string str2).
util.get_string_stats(str) Returns table with number of letters and digits in string.
util.is_valid_utf8(str) Returns true if a string is valid UTF8 string.
util.has_obscured_unicode(str) Returns true if a string has obscure UTF symbols (zero width spaces, order marks), ignores invalid utf characters.
util.readline([prompt]) Returns string read from stdin with history and editing support.
util.readpassphrase([prompt]) Returns string read from stdin disabling echo.
util.file_exists(file) Checks if a specified file exists and is available for reading.
util.mkdir(dir[, recursive]) Creates a specified directory.
util.umask(mask) Sets new umask.
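util.isatty() Returns true if stdout is a tty.
util.pack(fmt, ...) Backport of the Lua 5.3 string.pack function.
util.packsize(fmt) Returns size of the packed binary string for the given format.
util.unpack(fmt, s [, pos]) Unpacks string s according to the format string fmt as described in util.pack.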
util.caseless_hash(str[, seed]) Calculates caseless non-crypto hash from a string or rspamd text.
util.caseless_hash_fast(str[, seed]) Calculates caseless non-crypto hash from a string or rspamd text.
util.get_hostname() Returns hostname for this machine.
util.parse_content_type(ct_string, mempool) Parses content-type string to a table.
util.mime_header_encode(hdr[, is_structured]) Encodes header if needed.
util.btc_polymod(input_values) Performs bitcoin polymod function.
util.parse_smtp_date(str[, local_tz]) Converts an SMTP date string to unix timestamp.

Functions

The module rspamd_util defines the following functions.

Function util.create_event_base()

Creates new event base for processing asynchronous events

Parameters:

No parameters

Returns:

  • {ev_base}: new event processing base

Back to module description.

Function util.load_rspamd_config(filename)

Loads rspamd config from the specified file

Parameters:

  • filename {string}: path to the configuration file

Returns:

  • {config}: new configuration object suitable for access

Back to module description.

Function util.config_from_ucl(any, string)

Loads rspamd config from UCL represented by a Lua table

Parameters:

  • any {table}: Lua table representing the UCL configuration
  • string {string}: no description

Returns:

  • {config}: new configuration object suitable for access

Back to module description.

Function util.encode_base64(input[, str_len, [newlines_type]])

Encodes data in base64 breaking lines if needed

Parameters:

  • input {text or string}: input data
  • str_len {number}: optional size of lines or 0 if split is not needed

Returns:

  • {rspamd_text}: encoded data chunk

Back to module description.

Function util.encode_qp(input[, str_len, [newlines_type]])

Encodes data in quoted printable breaking lines if needed

Parameters:

  • input {text or string}: input data
  • str_len {number}: optional size of lines or 0 if split is not needed

Returns:

  • {rspamd_text}: encoded data chunk

Back to module description.

Function util.decode_qp(input)

Decodes data from quoted printable

Parameters:

  • input {text or string}: input data

Returns:

  • {rspamd_text}: decoded data chunk
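
Example: a minimal quoted-printable round trip, assuming the code runs inside rspamd's Lua environment (for instance via `rspamadm lua`). Both functions return rspamd_text, so `tostring` is used to obtain plain Lua strings; the sample input is arbitrary.

```lua
local util = require "rspamd_util"

-- "café = 1", with "é" written as its two UTF-8 bytes
local original = 'caf\195\169 = 1'

-- Non-ASCII bytes and '=' are escaped as =XX sequences by QP encoding
local encoded = util.encode_qp(original)
local decoded = util.decode_qp(tostring(encoded))

print(tostring(encoded))
print(tostring(decoded) == original)
```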

Back to module description.

Function util.decode_base64(input)

Decodes data from base64 ignoring whitespace characters

Parameters:

  • input {text or string}: data to decode; if an rspamd{text} is passed, it is modified in place

Returns:

  • {rspamd_text}: decoded data chunk
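
Example: a sketch of a base64 round trip, assuming rspamd's Lua environment (e.g. `rspamadm lua`). Passing `0` as str_len disables line splitting; the results are rspamd_text objects converted with `tostring`.

```lua
local util = require "rspamd_util"

-- str_len = 0: keep the encoded output on a single line
local encoded = util.encode_base64('hello', 0)
print(tostring(encoded)) -- aGVsbG8=

local decoded = util.decode_base64(tostring(encoded))
print(tostring(decoded)) -- hello
```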

Back to module description.

Function util.encode_base32(input, [b32type = 'default'])

Encodes data in base32 breaking lines if needed

Parameters:

  • input {text or string}: input data
  • b32type {string}: base32 type (default, bleach, rfc)

Returns:

  • {rspamd_text}: encoded data chunk

Back to module description.

Function util.decode_base32(input, [b32type = 'default'])

Decodes data from base32 ignoring whitespace characters

Parameters:

  • input {text or string}: data to decode
  • b32type {string}: base32 type (default, bleach, rfc)

Returns:

  • {rspamd_text}: decoded data chunk
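
Example: a base32 round-trip sketch, assuming rspamd's Lua environment (e.g. `rspamadm lua`). The alphabets differ between b32type values, so the same type is passed to both the encoding and the decoding call; the 5-byte input is arbitrary and avoids padding.

```lua
local util = require "rspamd_util"

local original = 'hello'

-- Default base32 variant
local enc = util.encode_base32(original)
local dec = util.decode_base32(tostring(enc))

-- 'rfc' variant: decode with the same b32type used for encoding
local enc_rfc = util.encode_base32(original, 'rfc')
local dec_rfc = util.decode_base32(tostring(enc_rfc), 'rfc')

print(tostring(enc), tostring(enc_rfc))
```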

Back to module description.

Function util.decode_url(input)

Decodes data from url encoding

Parameters:

  • input {text or string}: data to decode

Returns:

  • {rspamd_text}: decoded data chunk
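
Example: a short sketch, assuming rspamd's Lua environment (e.g. `rspamadm lua`); the input string is illustrative.

```lua
local util = require "rspamd_util"

-- %20 decodes to a space and %2F to '/'
local decoded = util.decode_url('a%20b%2Fc')
print(tostring(decoded)) -- a b/c
```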

Back to module description.

Function util.tokenize_text(input[, exceptions])

Creates tokens from text using an optional exceptions list

Parameters:

  • input {text/string}: input data
  • exceptions {table}: table of <start_pos, length> pairs marking exceptions in the input

Returns:

  • {table/strings}: list of strings representing words in the text

Back to module description.

Function util.tanh(num)

Calculates hyperbolic tangent of the specified floating point value

Parameters:

  • num {number}: input number

Returns:

  • {number}: hyperbolic tangent of the variable

Back to module description.

Function util.parse_html(input)

Parses HTML and returns the corresponding text

Parameters:

  • input {string|text}: input HTML

Returns:

  • {rspamd_text}: processed text with no HTML tags

Back to module description.

Function util.levenshtein_distance(s1, s2)

Returns Levenshtein distance between two strings

Parameters:

  • s1 {string}: the first string
  • s2 {string}: the second string

Returns:

  • {number}: number of differences between the two strings
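
Example: the classic "kitten"/"sitting" pair, sketched under the assumption that the code runs in rspamd's Lua environment (e.g. `rspamadm lua`). Two substitutions and one insertion are needed, so the distance is 3.

```lua
local util = require "rspamd_util"

-- kitten -> sitten (k->s) -> sittin (e->i) -> sitting (+g)
local d = util.levenshtein_distance('kitten', 'sitting')
print(d) -- 3
```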

Back to module description.

Function util.fold_header(name, value, [how, [stop_chars]])

Folds an RFC 822 header according to the folding rules

Parameters:

  • name {string}: name of the header
  • value {string}: value of the header
  • how {string}: “cr” for \r, “lf” for \n and “crlf” for \r\n (default)
  • stop_chars {string}: also fold header when the

Returns:

  • {string}: Folded value of the header

Back to module description.

Function util.is_uppercase(str)

Returns true if a string is all uppercase

Parameters:

  • str {string}: input string

Returns:

  • {bool}: true if a string is all uppercase

Back to module description.

Function util.humanize_number(num)

Returns humanized representation of given number (like 1k instead of 1000)

Parameters:

  • num {number}: number to humanize

Returns:

  • {string}: humanized representation of a number

Back to module description.

Function util.get_tld(host)

Returns effective second level domain part (eSLD) for the specified host

Parameters:

  • host {string}: hostname

Returns:

  • {string}: eSLD part of the hostname or the full hostname if eSLD was not found

Back to module description.

Function util.glob(pattern)

Returns results for the glob match for the specified pattern

Parameters:

  • pattern {string}: glob pattern to match (‘?’ and ‘*’ are supported)

Returns:

  • {table/string}: list of matched files

Back to module description.

Function util.parse_mail_address(str, [pool])

Parses email address and returns a table of tables in the following format:

  • raw - the original value without any processing
  • name - name of internet address in UTF8, e.g. for Vsevolod Stakhov <blah@foo.com> it returns Vsevolod Stakhov
  • addr - address part of the address
  • user - user part (if present) of the address, e.g. blah
  • domain - domain part (if present), e.g. foo.com
  • flags - table with the following keys, each set to true if the given condition is fulfilled:
    • [valid] - valid SMTP address in conformity with https://tools.ietf.org/html/rfc5321#section-4.1.
    • [ip] - domain is IPv4/IPv6 address
    • [braced] - angled <blah@foo.com> address
    • [quoted] - quoted user part
    • [empty] - empty address
    • [backslash] - user part contains backslash
    • [8bit] - contains 8bit characters

Parameters:

  • str {string}: input string
  • pool {rspamd_mempool}: optional memory pool to use

Returns:

  • {table/tables}: parsed list of mail addresses
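
Example: a sketch that parses the address used in the description above and walks the resulting list, assuming rspamd's Lua environment (e.g. `rspamadm lua`). The optional pool argument is omitted here.

```lua
local util = require "rspamd_util"

local addrs = util.parse_mail_address('Vsevolod Stakhov <blah@foo.com>')

for _, a in ipairs(addrs or {}) do
  print(a.name, a.addr, a.user, a.domain)
  -- flags is a table of boolean markers, e.g. braced for <...> addresses
  if a.flags and a.flags.braced then
    print('address was written in angle brackets')
  end
end
```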

Back to module description.

Function util.strlen_utf8(str)

Returns length of string encoded in utf-8 in characters. If invalid characters are found, then this function returns number of bytes.

Parameters:

  • str {string}: utf8 encoded string

Returns:

  • {number}: number of characters in string
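
Example: a sketch contrasting the byte length reported by Lua's `#` operator with the character count, assuming rspamd's Lua environment (e.g. `rspamadm lua`).

```lua
local util = require "rspamd_util"

local s = 'caf\195\169' -- "café": 4 characters, 5 bytes in UTF-8
local chars = util.strlen_utf8(s)

print(#s, chars) -- prints 5 and 4
```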

Back to module description.

Function util.lower_utf8(str)

Converts utf8 string to lower case

Parameters:

  • str {string}: utf8 encoded string

Returns:

  • {string}: lowercased utf8 string

Back to module description.

Function util.normalize_utf8(str)

Gets a string in UTF8 and normalises it to NFKC_Casefold form. The second return value is a bit mask built from the following flags:

  • RSPAMD_UNICODE_NORM_NORMAL = 0
  • RSPAMD_UNICODE_NORM_UNNORMAL = (1 << 0)
  • RSPAMD_UNICODE_NORM_ZERO_SPACES = (1 << 1)
  • RSPAMD_UNICODE_NORM_ERROR = (1 << 2)
  • RSPAMD_UNICODE_NORM_OVERFLOW = (1 << 3)

Parameters:

  • str {string}: utf8 encoded string

Returns:

  • {string,integer}: lowercased utf8 string + result of the normalisation (use bit.band to check the flags above)
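
Example: a sketch of checking the returned flags with `bit.band`, assuming rspamd's Lua environment with the `bit` library available (e.g. LuaJIT under `rspamadm lua`). The NORM_* local names are hypothetical; only their numeric values come from the flag list above.

```lua
local util = require "rspamd_util"
local bit = require "bit"

-- Hypothetical local names for the documented flag values
local NORM_UNNORMAL    = 1 -- 1 << 0
local NORM_ZERO_SPACES = 2 -- 1 << 1
local NORM_ERROR       = 4 -- 1 << 2
local NORM_OVERFLOW    = 8 -- 1 << 3

-- "Straße", with "ß" written as its UTF-8 bytes
local normalized, flags = util.normalize_utf8('Stra\195\159e')

if bit.band(flags, NORM_ERROR) ~= 0 then
  print('normalisation error')
end
if bit.band(flags, NORM_UNNORMAL) ~= 0 then
  print('input was not in normal form; normalised:', normalized)
end
if bit.band(flags, NORM_ZERO_SPACES) ~= 0 then
  print('input contained zero-width spaces')
end
if bit.band(flags, NORM_OVERFLOW) ~= 0 then
  print('overflow flag set')
end
```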

Back to module description.

Function util.transliterate(str)

Converts utf8 encoded string to latin transliteration

Parameters:

  • str {string/text}: utf8 encoded string

Returns:

  • {text}: transliterated string

Back to module description.

Function util.strequal_caseless(str1, str2)

Compares two strings regardless of their case using ascii comparison. Returns true if str1 is equal to str2

Parameters:

  • str1 {string}: utf8 encoded string
  • str2 {string}: utf8 encoded string

Returns:

  • {bool}: result of comparison

Back to module description.

Function util.strequal_caseless_utf8(str1, str2)

Compares two utf8 strings regardless of their case using utf8 collation rules. Returns true if str1 is equal to str2

Parameters:

  • str1 {string}: utf8 encoded string
  • str2 {string}: utf8 encoded string

Returns:

  • {bool}: result of comparison

Back to module description.

Function util.get_ticks()

Returns current number of ticks as floating point number

Parameters:

No parameters

Returns:

  • {number}: number of current clock ticks (monotonically increasing)
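
Example: a sketch of timing a piece of code, assuming rspamd's Lua environment (e.g. `rspamadm lua`). The loop is only a placeholder workload, and the result is reported in whatever tick units get_ticks uses.

```lua
local util = require "rspamd_util"

local t1 = util.get_ticks()

-- Placeholder workload to be measured
local sum = 0
for i = 1, 100000 do
  sum = sum + i
end

local t2 = util.get_ticks()

-- Ticks increase monotonically, so the difference is never negative
local elapsed = t2 - t1
print(string.format('elapsed: %.6f ticks', elapsed))
```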

Back to module description.

Function util.get_time()

Returns current time as unix time in floating point representation

Parameters:

No parameters

Returns:

  • {number}: number of seconds since 01.01.1970

Back to module description.

Function util.time_to_string(seconds)

Converts time from Unix time to HTTP date format

Parameters:

  • seconds {number}: unix timestamp

Returns:

  • {string}: date as HTTP date
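
Example: a sketch formatting both the current time and a fixed timestamp, assuming rspamd's Lua environment (e.g. `rspamadm lua`). The fixed value 1700000000 corresponds to 14 Nov 2023 22:13:20 UTC.

```lua
local util = require "rspamd_util"

-- Current time (floating point unix time) as an HTTP date
local now = util.get_time()
print(util.time_to_string(now))

-- A fixed timestamp
local fixed = util.time_to_string(1700000000)
print(fixed)
```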

Back to module description.

Function util.stat(fname)

Performs stat(2) on a specified filepath and returns table of values

  • size: size of file in bytes
  • type: type of filepath: regular, directory, special
  • mtime: modification time as unix time

Parameters:

  • fname {string}: path to the file

Returns:

  • {string,table}: error string (nil on success) and the table of values described above

Example:

local err, st = util.stat('/etc/passwd')

if err then
  -- handle error
else
  print(st['size'])
end

Back to module description.

Function util.unlink(fname)

Removes the specified file from the filesystem

Parameters:

  • fname {string}: filename to remove

Returns:

  • {boolean,[string]}: true if the file has been deleted, or false and an error string otherwise

Back to module description.

Function util.lock_file(fname, [fd])

Locks the specified file. This function returns a {number} (file descriptor) which must be passed to util.unlock_file after use; otherwise the descriptor is leaked

Parameters:

  • fname {string}: filename to lock
  • fd {number}: use the specified fd instead of opening one

Returns:

  • {number|nil,string}: number if locking was successful or nil + error otherwise

Back to module description.

Function util.unlock_file(fd, [close_fd])

Unlocks the specified file, closing the associated file descriptor.

Parameters:

  • fd {number}: descriptor to unlock
  • close_fd {boolean}: close descriptor on unlocking (default: true)

Returns:

  • {boolean[,string]}: true if a file was unlocked
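
Example: a sketch of the lock/unlock pattern that avoids leaking the descriptor, assuming rspamd's Lua environment (e.g. `rspamadm lua`). The scratch path under /tmp is hypothetical; the file is created with util.create_file first and removed at the end.

```lua
local util = require "rspamd_util"

-- Hypothetical scratch file for this example
local path = '/tmp/rspamd_lock_example_' .. util.random_hex(8)

-- Make sure the file exists, then release the creation descriptor
local fd, err = util.create_file(path)
assert(fd, err)
util.close_file(fd)

local lock_fd, lock_err = util.lock_file(path)
if not lock_fd then
  error(string.format('cannot lock %s: %s', path, tostring(lock_err)))
end

-- ... critical section: the file is locked here ...

-- close_fd defaults to true, so lock_fd is closed as well
local unlocked = util.unlock_file(lock_fd)

util.unlink(path)
```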

Back to module description.

Function util.create_file(fname, [mode])

Creates the specified file with the default mode 0644

Parameters:

  • fname {string}: filename to create
  • mode {number}: open mode as octal permissions; since Lua has no octal literals, build it with e.g. tonumber('644', 8)

Returns:

  • {number|nil,string}: file descriptor or pair nil + error string

Back to module description.

Function util.close_file(fd)

Closes descriptor fd

Parameters:

  • fd {number}: descriptor to close

Returns:

  • {boolean[,string]}: true if a file was closed

Back to module description.

Function util.random_hex(size)

Returns random hex string of the specified size

Parameters:

  • size {number}: length of desired string in bytes

Returns:

  • {string}: string of random hex digits

Back to module description.

Function util.zstd_compress(data, [level=1])

Compresses input using zstd compression

Parameters:

  • data {string/rspamd_text}: input data
  • level {number}: compression level (default: 1)

Returns:

  • {rspamd_text}: compressed data

Back to module description.

Function util.zstd_decompress(data)

Decompresses input using zstd algorithm

Parameters:

  • data {string/rspamd_text}: compressed data

Returns:

  • {error,rspamd_text}: pair of error + decompressed text
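
Example: a zstd round-trip sketch with error handling, assuming rspamd's Lua environment (e.g. `rspamadm lua`). The repetitive input is arbitrary and chosen so that it compresses well.

```lua
local util = require "rspamd_util"

local input = string.rep('rspamd ', 100)

local compressed = util.zstd_compress(input)

-- zstd_decompress returns an error (nil on success) and the decompressed text
local err, decompressed = util.zstd_decompress(compressed)
if err then
  error('zstd decompression failed: ' .. tostring(err))
end

print(#input, #tostring(compressed))
```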

Back to module description.

Function util.gzip_decompress(data, [size_limit])

Decompresses input using gzip algorithm

Parameters:

  • data {string/rspamd_text}: compressed data
  • size_limit {integer}: optional size limit

Returns:

  • {rspamd_text}: decompressed text

Back to module description.

Function util.inflate(data, [size_limit])

Decompresses input using inflate algorithm

Parameters:

  • data {string/rspamd_text}: compressed data
  • size_limit {integer}: optional size limit

Returns:

  • {rspamd_text}: decompressed text

Back to module description.

Function util.gzip_compress(data, [level=1])

Compresses input using gzip compression

Parameters:

  • data {string/rspamd_text}: input data
  • level {number}: compression level (default: 1)

Returns:

  • {rspamd_text}: compressed data
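
Example: a gzip round-trip sketch, assuming rspamd's Lua environment (e.g. `rspamadm lua`). The level 6 and the 1 MiB size_limit are arbitrary illustrative values.

```lua
local util = require "rspamd_util"

local input = string.rep('hello gzip ', 50)

-- Compression level 6 instead of the default 1
local gz = util.gzip_compress(input, 6)

-- The optional size_limit caps the decompressed size (1 MiB here)
local out = util.gzip_decompress(gz, 1024 * 1024)

if out then
  print(#input, #tostring(gz), tostring(out) == input)
end
```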

Back to module description.

Function util.normalize_prob(prob, [bias = 0.5])

Normalizes a probability using a polynomial

Parameters:

  • prob {number}: probability param
  • bias {number}: value to subtract when computing the final result (default: 0.5)

Returns:

  • {number}: normalized number

Back to module description.

Function util.is_utf_spoofed(str, [str2])

Returns true if a string is spoofed (possibly with another string str2)

Parameters:

  • str {string}: input string
  • str2 {string}: optional second string to check against

Returns:

  • {boolean}: true if a string is spoofed

Back to module description.

Function util.get_string_stats(str)

Returns table with number of letters and digits in string

Parameters:

  • str {string}: input string

Returns:

  • {table}: string stats with the keys “digits” and “letters”

Back to module description.

Function util.is_valid_utf8(str)

Returns true if a string is valid UTF8 string

Parameters:

  • str {string}: input string

Returns:

  • {boolean}: true if a string is valid UTF8

Back to module description.

Function util.has_obscured_unicode(str)

Returns true if a string has obscure UTF symbols (zero width spaces, order marks), ignores invalid utf characters

Parameters:

  • str {string}: input string

Returns:

  • {boolean}: true if the string has obscured unicode characters (+ character and offset if found)

Back to module description.

Function util.readline([prompt])

Returns string read from stdin with history and editing support

Parameters:

  • prompt {string}: optional prompt to display

Returns:

  • {string}: string read from the input (with line endings stripped)

Back to module description.

Function util.readpassphrase([prompt])

Returns string read from stdin disabling echo

Parameters:

  • prompt {string}: optional prompt to display

Returns:

  • {string}: string read from the input (with line endings stripped)

Back to module description.

Function util.file_exists(file)

Checks if a specified file exists and is available for reading

Parameters:

  • file {string}: path to check

Returns:

  • {boolean,string}: true if file exists + string error if not

Back to module description.

Function util.mkdir(dir[, recursive])

Creates a specified directory

Parameters:

  • dir {string}: directory to create
  • recursive {boolean}: also create missing parent directories

Returns:

  • {boolean[,error]}: true if directory has been created

Back to module description.

Function util.umask(mask)

Sets new umask. Accepts either a numeric octal string, e.g. ‘022’, or a plain number, e.g. 0x12 (since Lua does not support octal integer literals)

Parameters:

  • mask {string|number}: new umask

Returns:

  • {number}: old umask

Back to module description.

Function util.isatty()

Returns true if stdout is a tty

Parameters:

No parameters

Returns:

  • {boolean}: true in case of output being tty

Back to module description.

Function util.pack(fmt, ...)

Backport of the Lua 5.3 string.pack function: returns a binary string containing the values v1, v2, etc. packed (that is, serialized in binary form) according to the format string fmt. A format string is a sequence of conversion options. The conversion options are as follows:

  • <: sets little endian
  • >: sets big endian
  • =: sets native endian
  • ![n]: sets maximum alignment to n (default is native alignment)
  • b: a signed byte (char)
  • B: an unsigned byte (char)
  • h: a signed short (native size)
  • H: an unsigned short (native size)
  • l: a signed long (native size)
  • L: an unsigned long (native size)
  • j: a lua_Integer
  • J: a lua_Unsigned
  • T: a size_t (native size)
  • i[n]: a signed int with n bytes (default is native size)
  • I[n]: an unsigned int with n bytes (default is native size)
  • f: a float (native size)
  • d: a double (native size)
  • n: a lua_Number
  • cn: a fixed-sized string with n bytes
  • z: a zero-terminated string
  • s[n]: a string preceded by its length coded as an unsigned integer with n bytes (default is a size_t)
  • x: one byte of padding
  • Xop: an empty item that aligns according to option op (which is otherwise ignored)
  • ' ': (empty space) ignored

(A “[n]” means an optional integral numeral.) Except for padding, spaces, and configurations (options “xX <=>!”), each option corresponds to an argument (in string.pack) or a result (in string.unpack).

For options “!n”, “sn”, “in”, and “In”, n can be any integer between 1 and 16. All integral options check overflows; string.pack checks whether the given value fits in the given size; string.unpack checks whether the read value fits in a Lua integer.

Any format string starts as if prefixed by “!1=”, that is, with maximum alignment of 1 (no alignment) and native endianness.

Alignment works as follows: For each option, the format gets extra padding until the data starts at an offset that is a multiple of the minimum between the option size and the maximum alignment; this minimum must be a power of 2. Options “c” and “z” are not aligned; option “s” follows the alignment of its starting integer.

All padding is filled with zeros by string.pack (and ignored by unpack).

Parameters:

  • fmt {string}: format string
  • ... {any}: values to pack according to fmt

Returns:

  • binary string containing the packed values

Back to module description.

Function util.packsize(fmt)

Returns size of the packed binary string returned for the same fmt argument by util.pack

Parameters:

  • fmt {string}: format string

Returns:

  • {number}: size of the packed data in bytes

Back to module description.

Function util.unpack(fmt, s [, pos])

Unpacks string s according to the format string fmt as described in util.pack

Parameters:

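  • fmt {string}: format string
  • s {string}: binary string to unpack
  • pos {number}: optional position in s to start reading from

Returns:

  • {multiple}: list of unpacked values according to fmt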
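
Example: a pack/unpack round trip plus packsize, assuming rspamd's Lua environment (e.g. `rspamadm lua`). The format and values are illustrative; `tostring` is applied to the packed result so the sketch works whether it comes back as a Lua string or as rspamd_text.

```lua
local util = require "rspamd_util"

-- '<' little endian, 'I2' 2-byte unsigned int, 'z' zero-terminated string
local fmt = '<I2z'
local packed = tostring(util.pack(fmt, 513, 'abc'))

-- 2 bytes for the integer + 3 bytes of text + terminating zero byte
print(#packed) -- 6

local n, s = util.unpack(fmt, packed)
print(n, s) -- prints 513 and abc

-- Fixed-size format: 2 + 4 bytes, no alignment padding by default
print(util.packsize('<I2I4')) -- 6
```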

Back to module description.

Function util.caseless_hash(str[, seed])

Calculates caseless non-crypto hash from a string or rspamd text

Parameters:

  • str {no type}: string or lua_text
  • seed {no type}: optional seed (0xdeadbabe by default)

Returns:

  • {int64}: boxed int64_t

Back to module description.

Function util.caseless_hash_fast(str[, seed])

Calculates caseless non-crypto hash from a string or rspamd text

Parameters:

  • str {no type}: string or lua_text
  • seed {no type}: optional seed (0xdeadbabe by default)

Returns:

  • {number}: number from int64_t
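
Example: a sketch showing that strings differing only in case hash to the same value, assuming rspamd's Lua environment (e.g. `rspamadm lua`). The inputs and the explicit seed 42 are illustrative.

```lua
local util = require "rspamd_util"

-- Same letters modulo case, same (default) seed
local h1 = util.caseless_hash_fast('Example.COM')
local h2 = util.caseless_hash_fast('example.com')
print(h1 == h2) -- true

-- Passing an explicit seed instead of the default
local h3 = util.caseless_hash_fast('example.com', 42)
print(h3)
```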

Back to module description.

Function util.get_hostname()

Returns hostname for this machine

Parameters:

No parameters

Returns:

  • {string}: hostname

Back to module description.

Function util.parse_content_type(ct_string, mempool)

Parses content-type string to a table:

  • type
  • subtype
  • charset
  • boundary
  • other attributes

Parameters:

  • ct_string {string}: content type as string
  • mempool {rspamd_mempool}: needed to store temporary data (e.g. task pool)

Returns:

  • {table}: parsed content type, or nil if it cannot be parsed
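
Example: a sketch outside of a task, assuming rspamd's Lua environment (e.g. `rspamadm lua`) where a standalone pool can be created with `rspamd_mempool.create()`; inside a rule you would typically pass the task pool from `task:get_mempool()` instead.

```lua
local util = require "rspamd_util"
local rspamd_mempool = require "rspamd_mempool"

local pool = rspamd_mempool.create()

local ct = util.parse_content_type('text/plain; charset=utf-8', pool)
if ct then
  print(ct.type, ct.subtype, ct.charset)
else
  print('cannot parse content type')
end

pool:destroy()
```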

Back to module description.

Function util.mime_header_encode(hdr[, is_structured])

Encodes header if needed

Parameters:

  • hdr {string}: input header
  • is_structured {boolean}: if true, encode as a structured header (e.g. encode all non-alphanumeric characters)

Returns:

  • encoded header

Back to module description.

Function util.btc_polymod(input_values)

Performs bitcoin polymod function

Parameters:

  • input_values {table|numbers}: no description

Returns:

  • {boolean}: true if polymod has been successful

Back to module description.

Function util.parse_smtp_date(str[, local_tz])

Converts an SMTP date string to unix timestamp

Parameters:

  • str {string}: input string
  • local_tz {boolean}: convert to local tz if true

Returns:

  • {number}: time as unix timestamp (converted to float)
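
Example: a sketch parsing the same instant written with two different time zone offsets, assuming rspamd's Lua environment (e.g. `rspamadm lua`). Both strings denote 1700000000 (14 Nov 2023 22:13:20 UTC).

```lua
local util = require "rspamd_util"

local t_utc = util.parse_smtp_date('Tue, 14 Nov 2023 22:13:20 +0000')
local t_cet = util.parse_smtp_date('Tue, 14 Nov 2023 23:13:20 +0100')

-- Both parse to the same unix timestamp
print(t_utc, t_cet)
```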

Back to module description.

Back to top.