Integrating Rspamd with GPT

Preface

Historically, our only text classification method has been Bayes, a powerful statistical method that performs well with sufficient training. However, Bayes has its limitations:

It requires thorough and well-balanced training
It cannot work with low confidence levels, especially when dealing with a wide variety of spam

Large Language Models (LLMs) offer promising solutions to these challenges. These models can perform deep introspection with some sort of contextual “understanding”. However, their high computational demands (typically requiring GPUs) make scanning all emails impractical. Separating LLM execution from the scanning engine mitigates resource competition.

Rspamd GPT plugin

In Rspamd 3.9, I have tried to integrate the OpenAI GPT API for spam filtering and assess its usefulness. Here are the basic ideas behind this plugin:

The selected displayed text part is extracted and submitted to the GPT API for spam probability assessment
Additional message details such as Subject, displayed From, and URLs are also included in the assessment
Then, we ask GPT to provide results in JSON format since human-readable GPT output cannot be parsed (in general)
Some specific symbols (BAYES_SPAM, FUZZY_DENIED, REPLY, etc.) are excluded from the GPT scan
Obvious spam and ham are also excluded from the GPT evaluation

The former two points reduce the GPT workload for something that is already known, where GPT cannot add any value in the evaluation. We also use GPT as one of the classifiers, meaning that we do not rely solely on GPT evaluation.

Evaluation results

To evaluate the performance of the GPT-based classifier, we developed the rspamadm classifier_test utility, capable of evaluating both supervised and unsupervised classifiers:

It divides spam and ham samples into separate training and validation sets
For supervised classifiers, it uses the training set to train the classifier
Then both supervised and unsupervised classifiers are evaluated using the validation set to measure their performance

For example, the Bayes engine, trained on a robust corpus, demonstrates the following results:

$ rspamadm classifier_test --ham /ham --spam /spam --cv-fraction 0.3

Spam: 348 train files, 815 cv files; ham: 754 train files, 1762 cv files
Start learn spam, 348 messages, 10 connections
Start learn ham, 754 messages, 10 connections
Learning done: 348 spam messages in 1.61 seconds, 754 ham messages in 3.88 seconds
Start cross validation, 2577 messages, 10 connections
Metric               Value
------------------------------
True Positives       735
False Positives      22
True Negatives       1717
False Negatives      49
Accuracy             0.97
Precision            0.97
Recall               0.94
F1 Score             0.95
Classified (%)       97.90
Elapsed time (seconds) 12.71

These results are impressive but assume the classifier is properly and decently trained. In scenarios involving a fresh system or high variability in emails, gathering reliable statistics might be challenging.

In contrast, the GPT engine operates as an unsupervised learning algorithm. We assume that LLM models have enough “understanding” of the language to distinguish spam and ham without direct training on emails. Moreover, we provide only text data, not raw email content.

Below are the results from different GPT models:

OpenAI GPT-3.5 Turbo

Metric               Value
------------------------------
True Positives       129
False Positives      35
True Negatives       263
False Negatives      69
Accuracy             0.79
Precision            0.79
Recall               0.65
F1 Score             0.71
Classified (%)       95.20
Elapsed time (seconds) 318.91

This model is cost-effective and can be used as a baseline. The results were obtained from a low-quality sample corpus, resulting in high false positives and negatives.

OpenAI GPT-4o

Metric               Value
------------------------------
True Positives       178
False Positives      25
True Negatives       257
False Negatives      9
Accuracy             0.93
Precision            0.88
Recall               0.95
F1 Score             0.91
Classified (%)       90.02
Elapsed time (seconds) 279.08

Despite its high cost, this advanced model is suitable, for example, for low-traffic personal email. It demonstrates significantly lower error rates compared to GPT-3.5, even with a similar low-quality sample corpus.

Using GPT to train Bayes classifier

Another interesting approach involves using GPT to supervise Bayes engine training. In this case, we benefit from the best of both worlds: GPT can operate without training, while Bayes can catch up afterward and perform instead of GPT (or at least serve as a cost-saving alternative).

So we tested GPT training Bayes and compared efficiency using the same methodologies.

GPT results:

Metric               Value
------------------------------
True Positives       128
False Positives      13
True Negatives       301
False Negatives      68
Accuracy             0.84
Precision            0.91
Recall               0.65
F1 Score             0.76
Classified (%)       97.89
Elapsed time (seconds) 341.77

Bayes classifier results (trained by GPT in the previous test iteration):

Metric               Value
------------------------------
True Positives       19
False Positives      43
True Negatives       269
False Negatives      9
Accuracy             0.85
Precision            0.31
Recall               0.68
F1 Score             0.42
Classified (%)       65.26
Elapsed time (seconds) 29.18

Bayes still exhibits uncertainty in classification, with more false positives than GPT. Improvement could be achieved through autolearning and by refining the corpus used for testing (our corpus contains many ham emails that look like spam even for human evaluators).

Plugin design

The GPT plugin operates as follows:

It selects messages that qualify several pre-checks:
- they must not contain any symbols from the excluded set (e.g. Fuzzy/Bayes spam/Whitelists)
- they must not clearly appear as ham or spam (e.g. with reject action or no action with a high negative score)
- they should have enough text tokens in the meaningful displayed part
If a message satisfies these checks, Rspamd selects the displayed part (e.g. HTML) and sends the following content to GPT:
- text part content as a single-line string (honoring limits if necessary)
- message subject
- displayed From
- some details about URLs (e.g. domains)
This data is merged with a prompt to GPT requesting an evaluation of the email’s spam probability, with the output returned in JSON format (other output types may sometimes allow GPT to provide human-readable text that is very difficult to parse)
After these steps, a corresponding symbol with a confidence score is inserted
With autolearning enabled, Rspamd also trains the supervised classifier (Bayes)

Pricing considerations and conclusions

OpenAI provides an API for these requests, incurring costs (currently no free tier available). However, for personal email usage or automated Bayes training without manual intervention, GPT presents a viable option. For instance, processing a substantial volume of personal emails with GPT-3.5 costs approximately $0.05 daily (for about 100k tokens).

For large-scale email systems, it may be preferable to use another LLM (e.g. llama) internally on a GPU-powered platform. The current plugin is designed to integrate with different LLM types without significant modifications. This approach also enhances data privacy by avoiding sending email content to a third-party service (though OpenAI claims their models do not learn from API requests).

Despite not achieving 100% accuracy, the GPT plugin demonstrates efficiency comparable to human-filtered email. Future enhancements will focus on improving accuracy through additional metadata integration into the GPT engine, while optimizing token usage efficiency. There are also plans to better utilize LLM knowledge in Rspamd, particularly for better fine-grained classification.

The GPT plugin will be available starting from Rspamd 3.9, requiring an OpenAI API key and financial commitment for accessing ChatGPT services.