POPFile on CentOS 4.4
By: Max Hetrick
Last Updated: 10-12-2006
Printer friendly: http://www2.maxsworld.org/howtos/popfile.html
System/Dependencies
CentOS 4.4
POPFile v0.22.4
Thunderbird 1.5.0
Perl 5.8.5 (Requires Perl 5.6.0 or better)
OpenSSL (If using SSL)
DBI
DBD::SQLite2 (Suggested to stick with DBD::SQLite2, version 0.33 here, instead of the 1.x DBD::SQLite)
Digest::base
Digest::MD5
HTML::Tagset
HTML::Template
MIME::Base64
MIME::QuotedPrint
Date::Parse
IO::Socket::SSL (If using SSL)
Net::SSLeay (If using SSL)
References
POPFile: http://popfile.sourceforge.net
SourceForge: https://sourceforge.net/projects/popfile/
Description
POPFile is a nifty little Bayesian e-mail classifier written by John Graham-Cumming. You can think of it as a proxy between your e-mail client and your ISP’s POP server. Although you can use it to sort your e-mail messages in general, it’s better known for sorting out SPAM. It’s somewhat comparable to the well-known DSPAM, but POPFile is much smaller in scale and much easier to use with your personal e-mail client.
I’ll be explaining setup with my Thunderbird client and a simple POP server. I’ll also be using SQLite through Perl’s DBI, since it makes for an easy installation. You can use MySQL or PostgreSQL as well, but see the documentation for setting those up. I won’t be using SSL, since my instructions are for a personal system not exposed to the outside world. The web interface is reached through a web browser on localhost, so if you’re closed off from the outside world, only you will be able to access it. If you want security, then please look at the documentation for installing with SSL.
1) Perl
a) Check your modules
If you’ve never used CPAN on your system, you’ll have to run through the CPAN setup the first time you call it from the command line. Nearly everything is already filled in for you, and if you run into problems a simple web search will help you get it set up. First, check which versions of the required modules you already have installed. Run the command below once per module and you’ll get its version number. For anything missing, you’ll get a bunch of Perl errors, so you know what you have to install.
[root@laptop ~]# perl -MDBI -e 'print $DBI::VERSION'
1.40
[root@laptop ~]# perl -MDBD::SQLite2 -e 'print $DBD::SQLite2::VERSION'
0.33
…and so on.
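If you’d rather not type that command once per module, a small shell loop can do the checking for you. This is only a sketch: the check_modules function and the PERL_BIN override are names I made up for illustration, and the module list is copied from the System/Dependencies section above with the SSL-only modules left out. Note the plain straight quotes; curly quotes pasted from a web page will not work in a shell.

```shell
#!/bin/sh
# Report the installed version of each Perl module, or MISSING if it cannot be loaded.
# PERL_BIN is a hypothetical override used for illustration; it defaults to perl.
check_modules() {
    for mod in "$@"; do
        # Runs e.g.: perl -MDBI -e 'print $DBI::VERSION'
        if ver=$("${PERL_BIN:-perl}" -M"$mod" -e "print \$${mod}::VERSION" 2>/dev/null); then
            printf '%s %s\n' "$mod" "${ver:-unknown}"
        else
            printf '%s MISSING\n' "$mod"
        fi
    done
}

# Module list taken from the System/Dependencies section (SSL modules left out).
check_modules DBI DBD::SQLite2 Digest::base Digest::MD5 HTML::Tagset \
    HTML::Template MIME::Base64 MIME::QuotedPrint Date::Parse
```

Anything reported as MISSING is what you’ll want to install from CPAN in the next step.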
b) Install missing modules
After you’ve determined what you need installed and you’ve run the CPAN setup as mentioned above, drop to the CPAN shell and start installing the missing items.
[root@laptop ~]# perl -MCPAN -e shell
cpan> install DBI
…tons of Perl compilations.
cpan> install DBD::SQLite2
…tons more of Perl compilations and so on.
cpan> quit
2) POPFile Installation
a) Download and Unpack
You can install POPFile into a common place such as /usr/local or /usr/share, but it’s recommended to install into your home directory. That limits access to you, or whoever owns /home/username. Drop to a command line as yourself, not as root, for the time being.
[me@laptop ~]$ mkdir popfile
[me@laptop ~]$ cd popfile
[me@laptop popfile]$ unzip /path/to/popfile-0.22.4.zip
[me@laptop popfile]$ chmod ug+x popfile.pl
b) Setup Logging and Service
From here, you can actually start POPFile from the Perl script above, but let’s set it up as a service so you don’t have to worry about turning it on. It’ll start at boot time then. The following is carried out as root. After you download the init script, make sure to change two paths listed below to your /home/username.
[root@laptop ~]# mkdir /var/log/popfile; touch /var/log/popfile/popfile
[root@laptop ~]# wget http://www2.maxsworld.org/configs/popfile.txt
[root@laptop ~]# mv popfile.txt popfile
[root@laptop ~]# vim popfile
# Change the following two paths to match yours
popfile_root=/home/username/popfile
popfile_user=/home/username/popfile
[root@laptop ~]# chmod +x popfile
# chkconfig and service look for init scripts in /etc/init.d
[root@laptop ~]# mv popfile /etc/init.d/popfile
[root@laptop ~]# chkconfig --add popfile
[root@laptop ~]# chkconfig popfile on
[root@laptop ~]# service popfile start
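If you don’t feel like opening vim, the two path changes can also be scripted. The sketch below assumes the downloaded init script sets popfile_root and popfile_user at the start of a line, as shown above, and that your directory name contains no | or & characters. The set_popfile_paths name is hypothetical, just something I picked for the example. Run it on the script before you register it with chkconfig.

```shell
#!/bin/sh
# Point the two path variables in the init script at the given directory.
# set_popfile_paths is a hypothetical helper name; the variable names
# popfile_root and popfile_user are the ones shown in the init script.
set_popfile_paths() {
    script=$1
    dir=$2
    tmp=$(mktemp) || return 1
    # Rewrite both assignments into a temp file, then copy the result back
    # over the original so its permissions stay as they were.
    sed -e "s|^popfile_root=.*|popfile_root=$dir|" \
        -e "s|^popfile_user=.*|popfile_user=$dir|" "$script" > "$tmp" &&
        cat "$tmp" > "$script"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example, run from the directory holding the downloaded script:
# set_popfile_paths ./popfile /home/username/popfile
```

A quick grep for popfile_ in the script afterwards is an easy way to confirm both lines changed.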
Now if everything is good, you should see some output in the log file you just created. If there are any errors, you can use it to track down the issue. Once started, POPFile creates a few other log files too; if you want to check them out, just run a directory listing on /var/log/popfile.
[root@laptop ~]# cat /var/log/popfile/popfile
POPFile Engine loading
Loading…
{core: config history logger mq}
{classifier: bayes wordmangle}
{interface: xmlrpc html}
{proxy: pop3 nntp smtp}
{services: imap}
POPFile Engine v0.22.4 starting
Initializing…
{core: config history logger mq}
{classifier: bayes wordmangle}
{interface: html xmlrpc}
{proxy: nntp pop3 smtp}
{services: imap}
Starting…
{core: config history logger mq}
{classifier: bayes wordmangle}
{interface: html}
{proxy: pop3}
{services:}
POPFile Engine v0.22.4 running
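The line to look for is the last one, where the engine reports that it is running. If you want a script to check for it, something like the sketch below should do. The popfile_started name is hypothetical, and keep in mind the line stays in the log after the service stops, so this only tells you POPFile got through startup at some point.

```shell
#!/bin/sh
# Succeed if the log shows the engine reached its "running" line.
# popfile_started is a hypothetical helper name, not a POPFile command.
popfile_started() {
    grep -q 'POPFile Engine v[0-9.]* running' "$1"
}

# Example, using the log file created earlier:
# popfile_started /var/log/popfile/popfile && echo "POPFile started cleanly"
```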
3) POPFile Web and E-Mail Client Configuration
a) Configure Buckets
Now that POPFile is actually running, you can access the very easy to use web interface. Before you can actually use it, though, you need to create at least two buckets. If you’re using this for SPAM-only classification, then I would suggest the two buckets from my setup: spam and ham. Obviously spam is the bad stuff, and ham is everything else. Later on, if you want to set up even more sorting containers, then go to town. For simple SPAM recognition, two buckets will do.
Open up your web browser and go to http://127.0.0.1:8080/. You should see something similar to the screen shot below, minus the actual e-mail history. I’m using a screen shot of my already configured setup as a reference, and I’ve cut out the e-mail information already collected.

Go ahead and click on the Buckets tab. At the bottom left-hand side of the panel, create two new buckets, one called spam and one called ham, as already mentioned. Type in the name and hit Create for each new bucket.

Now, at the top of the same Buckets tab make sure your configurations look like the following shot. It’s very important that you make sure the X-Text-Classification box is checked for the buckets you’ve created. POPFile throws this header line into your e-mail message which your e-mail client will catch when you create filters later on. Also make sure the Subject Header Modification is turned off for each bucket you’ve created. The header modification changes the subject header, which you’re probably not going to want changed. Obviously you can change the colors to whatever you want, as it’s not really necessary either way. It just makes it easier to spot things in the web browser if you color coordinate your buckets. Click Apply. Oh, so pretty.

b) Thunderbird Server Settings
The first part of configuring Thunderbird is to change the server settings from your account menu. Go to Edit - Account Settings - Server Settings. This is where you configure POPFile to sit in between your e-mail client and your ISP’s POP server. Take note of the Server Name field, which holds your ISP’s POP server, then replace it with 127.0.0.1. Next, replace the User Name field with the original Server Name value, a colon, and the username you use to connect to your ISP. Click OK and you will be warned about changing your e-mail address. Just go back and change it to what it was originally.
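In case the User Name format isn’t clear, here it is spelled out as a one-line shell function. The popfile_username name, the pop.example.com server, and the alice login are all placeholders I made up; substitute your own ISP’s POP server and login.

```shell
#!/bin/sh
# Build the Thunderbird User Name value: original POP server, a colon, then the login.
# popfile_username is a hypothetical helper; the server and login are placeholders.
popfile_username() {
    printf '%s:%s\n' "$1" "$2"
}

popfile_username pop.example.com alice
# prints: pop.example.com:alice
```

With those placeholder values, Server Name would be 127.0.0.1 and User Name would be pop.example.com:alice.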

c) Thunderbird Filters
The last step is setting up filters to sort the mail that POPFile has classified for you. In case you haven’t figured it out, POPFile scans your e-mail and adds the X-Text-Classification header to each scanned message, which Thunderbird can easily use to sort mail into your folders…or in SPAM’s case, into the garbage can. Go to Tools - Message Filters - New. X-Text-Classification isn’t one of the standard filter fields, so you’ll need to add it as a custom header. Type in a filter name of Spam and choose the Customization setting. Type in the new custom header and click Add. Now make sure the conditions match below; it’s important that spam is spelled exactly the same way it appears in your bucket on the web interface. After that, you can perform whatever action you want with the message. I create a folder under my Inbox called Spam and move all marked SPAM messages to this folder. Just to be safe, I set a retention policy by right-clicking the folder created under my Inbox and setting it to only delete messages older than a week. This gives me the chance to make sure the messages are indeed SPAM before they are canned.
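If a filter doesn’t seem to be firing, it helps to look at what POPFile actually wrote into a message. Save a message to a file and pull out the header value; it should match your bucket name exactly. The sketch below only reads the header section of the file, and classification_of is a hypothetical name I picked for the example.

```shell
#!/bin/sh
# Print the X-Text-Classification value from the header section of a saved message.
# classification_of is a hypothetical helper name.
classification_of() {
    awk '
        /^\r?$/ { exit }                          # blank line ends the headers
        tolower($0) ~ /^x-text-classification:/ {
            sub(/^[^:]*:[ \t]*/, "")              # drop the header name
            sub(/\r$/, "")                        # drop a trailing CR, if any
            print
            exit
        }
    ' "$1"
}

# Example: classification_of /path/to/saved-message.eml
```

If nothing prints, the message probably didn’t come in through POPFile, or the X-Text-Classification box isn’t checked for that bucket.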


4) POPFile Classifications
a) How it works
By now you should get the hint as to how POPFile works. At first, it will need a bit of training before it starts marking your messages as either spam or ham. When you first start filtering your e-mail, everything will be considered unclassified. You’ll have to manually mark your messages as either spam or ham for just a little while so that POPFile can learn what’s what. In the beginning, you’re going to get a lot of False Positives, False Negatives, and Classification errors. Don’t worry, this is normal, because like I said, you have to teach POPFile what is good mail and what’s bad mail. Check out the stats on the Buckets tab. It only took me around 100 messages for it to start automatically marking things the way I expected. When that happened, the filter I have set up in Thunderbird started working, ditching marked SPAM messages into my Spam folder for inspection before my retention policy deletes them. I know that sounds great, huh, but how do you do that? Easy!
b) Changing classifications
Head back over to the History tab on the web interface. All mail coming through will be displayed here. By default, POPFile keeps mail history for 2 days. You can set this to be longer if you’d like in the Configuration tab. As mentioned, all incoming e-mail will be unclassified for a while. During that time, you need to mark every message you get using the drop-down menu on the right-hand side. Click Reclassify at the bottom, then Remove All when you’re finished. This keeps the History cleaned out. It only took me about a day of training and now POPFile is catching things pretty well.
5) POPFile’s Other Options
a) Magnets
Magnets are neat if you always want to classify certain mail into a specific bucket regardless of anything else. For example, you can set up a magnet on a specific To:, Cc:, From:, or Subject: value and always classify matching mail into a bucket. This is handy if, for instance, your friends are always forwarding you stupid e-mails that might look like SPAM. You can consider this the same as a filter in Thunderbird. This is mainly a sorting option.
b) Configuration
There are other configuration options to choose from under this tab. There are skins available for the user interface, history options for displaying history items, and logging options if you want to dump logs to the web interface instead of to the file you configured earlier.
c) Security
Yeah, security settings…big surprise. Check it out.
d) Advanced
The thing worth mentioning in here is the fact that you can add custom ignored words to the already rather large list.
6) After thought
Following this guide provides a nice alternative to Thunderbird’s built-in Junk Mail settings. POPFile set up this way doesn’t scale well to large installations, and is meant more for the single-user environment. There are multiple-user options available, but you’ll have to check the docs out. If you’re looking for corporate-level software to do SPAM scanning for a huge mail server, I would highly recommend other packages such as DSPAM. Enjoy! If anything needs fixing, please let me know.





