We are proud to announce the next major version of Rspamd today: 1.7.0. This version includes more than 1000 changes since Rspamd 1.6 branch. The most significant changes are:
- Better machine learning support and embedding of Lua Torch
- Language detection support based on ngramms and heuristics
- New
rspamadm configwizard
command for a simple configuration setup - New statistics model for
Redis
backend allowing expiration and better analytics - Improved wizards for statistics conversion and management
- Added automatic corpus test and rescoring utility based on Google Summer of Code 2017 project completed by @cpragadeesh
- New Elasticsearch plugin
- New experimental Reputation plugin
- Various other important improvements and fixes
Release cycle model change
From this version onwards, Rspamd will no longer have stable
and experimental
branches. All development will be concentrated in the main branch with more frequent releases of both minor and major versions. Major version change (e.g. from 1.7
to 1.8
) will mean some important change with backward compatility or a clear conversion path. Minor releases (e.g. 1.7.0
-> 1.7.1
) will be released as soon as there are enough important changes. Any critical fix will cause a new version to be released.
We have decided to eliminate the concept of stable branches since it makes the processes of development and migration more complicated for both developers and Rspamd users. We will concentrate on stability and backward compatibility of the main branch instead.
Migration notes
Please read migration notes. We believe that no explicit configuration changes are required to upgrade from Rspamd 1.6.6.
New features in Rspamd 1.7
Here is a brief description of the new features appeared in Rspamd 1.7.
Better machine learning support
Rspamd has bundled torch support which is enabled by default. This framework is one of the principal Machine Learning frameworks implemented in Lua. We have decided to bundle it directly in Rspamd due to poor support of packaged version of Torch in the vast majority of Linux and Unix distributions. It is currently available for Intel x86_64 architecture only. Rspamd includes the following components of this framework:
- basic Tensors and math framework (torch7)
- neural networks support (nn)
- optimization algorithms package (optim)
- random forests support
So far, Torch is used in neural network module to improve neural network model and the speed of processing. This module has been improved to support clustered configurations, mail flows separation, and multiple neural networks.
Torch also empowers a new rescore
utility described later in this article.
You can use Torch to build your own Machine Learning models to improve the quality of your spam filtering rules.
Language detection support
Rspamd 1.7 includes a new language detection support. It uses NGramms model to support more than 50 languages. Rspamd implements a fast and sophisticated algorithm to detect texts languages using unicode properties, ngramms and statistical methods to provide a more precise language detection. This information could be used for training models (e.g. word2vec embedding), better bayes classification (e.g. by removing of stop words), or by individual rules.
New rspamadm configwizard
command
This command is intended to simplify Rspamd configuration and migration. It provides an interactive console UI to setup the most commonly used Rspamd functions not necessarily configured out-of-the-box:
It can also be used on any stage to adjust configuration:
- DKIM signing setup:
rspamadm configwizard dkim
- Controller password:
rspamadm configwizard controller
- Statistics tools:
rspamadm configwizard statistic
- Redis setup:
rspamadm configwizard redis
New model for Redis backend of statistical data
Rspamd has switched from the old layout of Redis storage where tokens were stored in two large hash tables: RSBAYES_SPAM
and RSBAYES_HAM
to a model where each token is stored separately: RS_<token_id>
and 2 buckets S
for spam hits and H
for ham hits. The new model requires more space in Redis, however, it allows to expire meaningless or not frequently used tokens efficiently reducing storage requirements. A new explicitly enabled plugin called bayes_expire
provides inteligent renewal and eviction of statistical tokens.
It is also possible to store token values inside statistical buckets for debugging and analytics purposes.
You can convert an old statistics in Redis (or sqlite) to a new one using rspamadm configwizard statistic
command.
Corpus test and rescore tool
This project has been started as a Google Summer Of Code project and was completed by @cpragadeesh in 2017. It allows Rspamd to run against some pre-labeled corpus of spam and ham (rspamadm corpus_test
tool) and then analyze anonymous logs produced by this command to adjust the best possible scores for Rspamd rules (rspamadm rescore
). Here is a sample of this command run:
$ rspamadm rescore -l ./rescore_logs -o scores.ucl -d -i 10 --penalty-weight=0 --momentum=0.1 --learning-rate=0.01 --l1=0.0001
...
We plan to improve our scores using this tool and distribute updates using rspamd_update
plugin.
Elasticsearch plugin
Rspamd now provides elastic search plugin that provides integration with Elasticsearch engine and Kibana board to visualise Rspamd data. It has such features as symbols heat map, geographical distribution of spam/ham sources, proportions of traffic scanned and so on. Elastic search plugin and clickhouse plugin provide a decent platform to collect analytics about your mail flows and spam filter efficiency.
Reputation plugin
Rspamd 1.7 includes an experimental reputation plugin. It is currently in work-in-progress state, however, we have a working prototype with the following features:
- DKIM domain, URL domains and IP reputation support
- Flexible backends supports: both Redis for private usage and DNS for public services
- Flexible time buckets: long term buckets and short term buckets
- Flexible aggregation tool (weighted aggregation, decision trees and so on)
This plugin is intended to replace the original ip score plugin and will provide much more reputation types (e.g. URL and DKIM reputation). It is also possible to build systems with both public reputation data that could be provided via DNS and internal reputation data stored in Redis buckets. This could be particularly beneficial for large email service providers.
In this version, the plugin is in still experimental stage but it is close to production testing so far.
7Zip support
Rspamd can now detect and process data from 7zip
files. This functionality lives within mime types module and allows to filter malicious files in 7zip attachments.
Various improvements and changes
In conclusion, we can add that this version of Rspamd includes a lot of improvements in stability, performance and quality of filtering areas.