The Bayes expiry
module provides intelligent expiration of statistical tokens for the new schema
of Redis statistics storage.
For the bayes expiry
module, classifier-related configuration settings should be incorporated into the corresponding classifier
section, such as the local.d/classifier-bayes.conf
file. Additionally, as the Bayes expiry
module necessitates the use of the new statistics schema, it is imperative to enable it within the classifier configuration:
new_schema = true; # Enabled by default for classifier "bayes" in the stock statistic.conf since 2.0
The following settings are valid:
common
category are not affected. For more information, see the expiration modes for detail. Supported values are:
-1
: make tokens persistent;false
: disable bayes expiry
for the classifier. Note that this does not change the TTLs of existing tokens, but new learned tokens will be persistent.true
- enable lazy expiration mode (disabled by default). See expiration modes for detail.Configuration example:
new_schema = true; # Enabled by default for classifier "bayes" in the stock statistic.conf since 2.0
expire = 8640000;
#lazy = true; # Before 2.0
To modify the global settings of the bayes expiry
module, you can configure them in either the local.d/bayes_expiry.conf
or override.d/bayes_expiry.conf
file.
The following settings are valid:
60
.1000
. Consider increasing it to a higher value if your Redis instance is overwhelmed with too many persistent keys, suggesting faster learning compared to the module’s processing.common
. Tokens with a difference between spam and ham relative frequencies not greater than this value are classified as common
. Default is 0.01
.common
tokens. Default is 10 * 86400
, equivalent to 10 days.significant
; otherwise, they are insignificant
. Defaults is 3.0 / 4.0
, which corresponds to 75%.Configuration example:
interval = 90;
count = 15000;
The bayes expiry
module performs an expiry step every minute. During each step, it examines the frequency of approximately 1000 statistical tokens and adjusts their TTLs if needed. The duration of a full iteration varies based on the number of tokens; for example, a full cycle for 10 million tokens takes approximately one week to complete. Once the bayes expiry
module finishes a full iteration, it starts over again.
The Bayes expiry
module categorizes tokens into four groups based on their frequency of occurrence in ham and spam classes:
significant
and common
tokens.The default
mode has been removed in Rspamd 2.0 as it offers no benefits compared to thelazy
mode.
Operation:
significant
token’s lifetime: update token’s TTL every time to expire
value.insignificant
or infrequent
token.common
token: reset TTL to a low value (10d) if the token has greater TTL.Disadvantages:
expire
time. The Bayes Expiry module must periodically update their TTLs, which means special backup procedures are required. Simply copying the *.rdb
file will result in its expiration after the expire
time has passed.significant
tokens, constant updating of their TTLs is not necessary.The lazy
mode is the only expiration mode since Rspamd 2.0.
This mode ensures that significant
tokens with a TTL are persistently kept (the module sets significant tokens TTLs to -1, i.e. makes them persistent if they are not), while TTL of insignificant
or infrequent
tokens is reduced to the expire
value if its current TTL exceeds expire
. Common
tokens are discriminated by resetting their TTL to a lower value of 10 days, if their TTL exceed this threshold.
The advantages of the “lazy” mode include:
significant
tokens.To activate the lazy expiration mode in Rspamd versions prior to 2.0, simply add lazy = true;
to the classifier configuration.
The expiration mode for an existing statistics database can be altered in the configuration at any moment. Token’s TTLs will be updated as required during the subsequent expiration cycle.
When a new expire
value is set to a lower value than the current one, the TTLs exceeding the new expire
value will be updated during the next expiration cycle.
To increase the expire
value, it is necessary first to make the tokens persistent by setting expire = -1;
and waiting until at least one expiration cycle is completed. Only then the new expire
value can be set.
The memory usage of the statistics dataset can be managed using the Redis maxmemory
directive and volatile-ttl
eviction policy. If the memory usage exceeds the set “maxmemory” limit, Redis will evict keys with shorter TTLs in accordance with the policy. Additionally, memory usage can be maintained at a nearly constant level by setting the TTL to an extremely high value, causing keys to be evicted instead of expiring.
To ensure that the memory limit and eviction policy only apply to the Bayesian statistics dataset, it should be stored in a separate Redis instance. A comprehensive explanation on configuring multi-instance Redis can be found in the Redis replication tutorial.
local.d/classifier-bayes.conf
:
backend = "redis"; # Enabled by default for classifier "bayes" in the stock statistic.conf since 2.0
servers = "localhost:6378";
new_schema = true; # Enabled by default for classifier "bayes" in the stock statistic.conf since 2.0
expire = 2144448000;
lazy = true; # Before 2.0
Setting expire = 2144448000;
sets a very high TTL of 68 years, as there is no need for the actual expiration of keys.
/usr/local/etc/redis-bayes.conf
:
include /usr/local/etc/redis.conf
port 6378
pidfile /var/run/redis/bayes.pid
logfile /var/log/redis/bayes.log
dbfilename bayes.rdb
dir /var/db/redis/bayes/
maxmemory 500MB
maxmemory-policy volatile-ttl
Where maxmemory 500MB
sets Redis to use the specified amount of memory for the instance’s dataset and maxmemory-policy volatile-ttl
sets Redis to use the eviction policy when the maxmemory
limit is reached.