We have released Rspamd 2.5 today.
You can find that the first start of Rspamd will take more time than usually (around 10-30 seconds). That is intended as Rspamd has to recompile some of the url pattern matchers. It will not happen on subsequent restarts nor to the subsequent updates. It should also not happen if you have used Release Candidates for this version.
There are 3 major projects in this release:
- URL extraction rework: the URL extraction logic in Rspamd has been significantly reworked to provide better DoS resistance, better matching and lower false positives.
- URL structure update: the URL storage structures have been reworked to occupy less memory and use a more efficient storage
- Hyperscan early load: since PCRE is a backtracking RE engine it is not very safe to use in conjunction with complicated rules and untrusted data, in this release, Rspamd will load hyperscan database on early stages to avoid PCRE fallback if possible
Several major fixes:
- Base64 detection has been fixed and improved to reduce FP rate
- Query urls are now fully processed
- Bundled libev has been updated to 4.33 (fixing many issues with FD closing race conditions)
- Fixed ANN normalisation
- Fixed redis backend leaks
Useful features:
- Added
whitelisted_signers_map
in ARC module - Implemented
/etc/hosts
files processing
Here is the list of the most important changes:
- [Conf] Mark Rspamd emailbl as ignore whitelist
- [Conf] RBL: Add missing emails = true option
- [Feature] Add support for scripts in fuzzy storage
- [Feature] Arc: Add whitelisted_signers_map option
- [Feature] Implement hosts file processing
- [Feature] Neural: Introduce classes bias that allows non-equal classes learning
- [Feature] Update libev to 4.33
- [Fix] Another brain damage html standard adoptions
- [Fix] Another fix for brain damaged obs-fws state
- [Fix] Fix flags that caused force_actions failure
- [Fix] Fix logging issue
- [Fix] Fix lua symbols scores registration when config does not define scores
- [Fix] Fix opaque maps logic
- [Fix] Fix parsing of the html tags with no spaces after attributes
- [Fix] Fix some corner cases in urls parsing, add limits
- [Fix] Fix tlds extraction if custom composition rules are used
- [Fix] Fix variables replacement in mempool
- [Fix] Improve base64 detection
- [Fix] Normalize dynamic scores in ANN correctly
- [Fix] Plug memory leak introduced by #3153
- [Fix] Stat_redis_backend: Fix memory leak and simplify learn path
- [Fix] Try hard to deal with ghost workers
- [Fix] metadata_exporter default formatter
- [Rework] Change the way to extract URLs when dealing with alternative parts
- [Rework] Fix various url extraction issues
- [Rework] Re cache: Load compiled hyperscan in the main process as well
- [Rework] Re cache: Load hyperscan early
- [Rework] Rework URL structure: adjust tld part
- [Rework] Rework URL structure: host field
- [Rework] Rework URL structure: more structure optimisations
- [Rework] Rework URL structure: user field
- [Rework] URL: Another update for urls extraction logic
- [Rework] Urls: Improve query urls handling
- [Rework] Urls: adopt html related stuff
- [Rework] Urls: more rework of the urls sets
- [Rework] Urls: process query urls in HTML urls correctly
- [Rework] Urls: rework urls hash structure
- [Rework] Urls: update lua libraries
- [Rework] Use multiple search tries for different url extraction types