By now we have a pretty effective setup - an NLB cluster running Squid protected by redundant load balancers, with a basic file replication scheme in place. It's about time we got down to the reason for building the cluster - filtering web proxy requests! In this post we will install and configure SquidGuard for URL filtering, as well as doing a bit of environment prep for automation and web administration.
SquidGuard is a helper module for Squid that intercepts each access request and filters it based on criteria such as IP, username or hostname. It can filter domains and URLs, and match URLs by regular expression (e.g. block all URLs ending in ".exe"). Unlike other filter services such as DansGuardian, it does not run as a standalone service that forwards requests to Squid once processed, but is instead loaded by Squid itself. Note that SquidGuard does not filter page content - just URLs. SquidGuard is ideal for use with community-maintained URL blacklists such as those from Shallalist.de.
Before continuing, I recommend reviewing the documentation on the SquidGuard website. The documentation is succinct and has a number of examples that should help you write your own config files quickly and easily.
Installation
The first step is to install SquidGuard on each NLB cluster node AND both directors. SquidGuard needs to be installed on the directors not to run as a daemon, but so that its binary can be used to compile blacklists for replication to the cache servers. There is a SquidGuard package in Ubuntu, but it doesn't support LDAP, so I chose to compile it from source. I used LDAP in my own setup for group membership-based URL filtering, as I also configured Active Directory authentication in Squid (I'll cover this in a future post). Compiling SquidGuard from source is very easy, so I recommend you do it - simply run the following on each server:
root@lvs-cache1:~# apt-get install --yes libldap2-dev libdb4.8-dev gcc binutils make bison flex libmysqlclient-dev curl unzip
root@lvs-cache1:~# cd
root@lvs-cache1:~# wget http://www.squidguard.org/Downloads/squidGuard-1.4.tar.gz
root@lvs-cache1:~# tar -zxf squidGuard-1.4.tar.gz
root@lvs-cache1:~# cd squidGuard-1.4
root@lvs-cache1:~# ./configure --with-ldap --with-squiduser=proxy
root@lvs-cache1:~# make
root@lvs-cache1:~# make install
Environment Configuration - LVS-Director1 ONLY
There are a few more initial setup tasks to perform that will enable us to modify config files on the director and roll them out to all backend cache boxes (later we will reconfigure csync2 to replicate SquidGuard's configuration files). Create subdirectories inside SquidGuard's db/BL directory for a custom blacklist and a custom whitelist, each containing a domains and a urls file. The blacklist folder includes an additional file for regular expression matching. The BL directory is required because the blacklist archive from shallalist.de extracts to a directory of that name:
root@lvs-director1:~# mkdir -p /usr/local/squidGuard/db/BL/custom_black
root@lvs-director1:~# mkdir -p /usr/local/squidGuard/db/BL/custom_white
root@lvs-director1:~# touch /usr/local/squidGuard/db/BL/custom_white/domains
root@lvs-director1:~# touch /usr/local/squidGuard/db/BL/custom_white/urls
root@lvs-director1:~# touch /usr/local/squidGuard/db/BL/custom_black/domains
root@lvs-director1:~# touch /usr/local/squidGuard/db/BL/custom_black/urls
root@lvs-director1:~# touch /usr/local/squidGuard/db/BL/custom_black/mimetypes
Finally, grant the Squid user (proxy) full access to the SquidGuard directory. This is important: SquidGuard is very particular about file permissions and will easily drop into "emergency mode" if something isn't right. No URLs are filtered in emergency mode, so pay careful attention to this.
root@lvs-director1:~# chown -R proxy:proxy /usr/local/squidGuard
SquidGuard Configuration
Now we're at the stage where we can look at how SquidGuard works and put together a basic configuration. Download the basic config file and save it to /usr/local/squidGuard/squidGuard.conf (first delete or rename the original). First, a few basic configuration directives:
dbhome /usr/local/squidGuard/db/BL
logdir /usr/local/squidGuard/log
These settings are fine if you have compiled SquidGuard from source. If you installed the Ubuntu package, you will need to substitute the correct paths in both the SquidGuard config and the configuration of the web admin script in the next post (this guide assumes you have done the former and compiled from source).
There are three sections to squidGuard.conf. The first defines client sources: the IP addresses or hostnames clients may connect from, the usernames that may use the service, and so on. The second defines destination categories - these are stored as databases of plain-text black/whitelists containing domains, URLs or regular expressions to match. The final section, ACLs, brings the first two together by defining which sources are allowed or denied access to particular destination categories.
For example, I could define a source as IP addresses on my LAN:
src unfiltered {
    ip 192.168.0.1
}

src lanclients {
    ip 192.168.0.0/24
}
Note that I have singled out one IP address in the "unfiltered" stanza. Even though the IP is also included in the "lanclients" stanza, it will only be registered against the first successful match. Therefore the order of your sources is very important.
We now need to define destination sites. Notice how domain lists, URL lists and expression lists are specified with the "domainlist", "urllist" and "expressionlist" directives, each set to a path relative to the dbhome defined above. You can log accesses to a destination category, although I'd only enable this for troubleshooting. Here I have defined two categories to begin with - one whitelist and one blacklist:
dest custom_whitelist {
    domainlist custom_white/domains
    urllist custom_white/urls
}

dest custom_blacklist {
    domainlist custom_black/domains
    urllist custom_black/urls
    expressionlist custom_black/mimetypes
    #log blockedaccesses
}
Finally configure an access control list (ACL) to control access to these categories:
acl {
    unfiltered {
        pass all
        redirect http://10.2.0.10/blocked/blocked.php?a=%a&n=%n&i=%i&s=%s&t=%t&u=%u
    }

    lanclients {
        pass custom_whitelist !custom_blacklist all
        redirect http://10.2.0.10/blocked/blocked.php?a=%a&n=%n&i=%i&s=%s&t=%t&u=%u
    }

    default {
        pass none
        redirect http://10.2.0.10/blocked/blocked.php?a=%a&n=%n&i=%i&s=%s&t=%t&u=%u
    }
}
Each ACL stanza mirrors the name of a client source. Note the order in which categories are specified in the "pass" directives. If a category is prefixed with an exclamation mark then it is designated to be blocked. The order in which the categories are listed is important. Take the following line:
pass !custom_blacklist custom_whitelist !porn all
If I have a domain listed in custom_whitelist AND custom_blacklist, the domain will be blocked because custom_blacklist is listed first. Therefore you must always place whitelist categories before blacklists. End the line with "all" to allow any domains not explicitly blocked, as seen in the ACL section above.
The "redirect" directive refers to the page that should be delivered to the client when a URL is blocked. The % placeholders are expanded by SquidGuard at request time: %a is the client's IP address, %n its hostname, %i the user ID, %s the matched source group, %t the matched target (destination) group and %u the requested URL. This can be a basic HTML page, but it is better to have a dynamically generated page detailing which domain has been blocked and by which ACL - it helps a lot when troubleshooting! The page has to be served from a webserver, so we will need to install Apache on both directors and serve it from there. This will be covered in the next post, as we also need Apache and PHP in order to use my web admin tool. I have set the redirect URL to the CACHE cluster IP so the page will be served from the active director.
Finally, the default ACL must be included as a catch-all for requests that cannot be matched. In most cases you will want to have it block all requests, or at most allow only a very restricted set of domains.
The complete config should look like this:
##############################
# CONFIG FILE FOR SQUIDGUARD #
##############################

dbhome /usr/local/squidGuard/db/BL
logdir /usr/local/squidGuard/log

#
# SOURCE USERS/COMPUTERS:
#
# **** AFTER A USER/COMPUTER MATCHES A GROUP, IT
# WILL NOT ATTEMPT TO MATCH ANY FURTHER GROUPS ****
#

src unfiltered {
    ip 192.168.0.1
}

src lanclients {
    ip 192.168.0.0/24
}

#
# DESTINATION CLASSES:
#
# **** AFTER A DOMAIN/URL MATCHES A GROUP APPLICABLE TO
# THE SRC GROUP, IT WILL NOT BE MATCHED TO ANY FURTHER
# GROUPS SO ALWAYS PLACE WHITELISTS BEFORE BLACKLISTS ****
#

# *** BEGIN CUSTOM LISTS ***

dest custom_whitelist {
    domainlist custom_white/domains
    urllist custom_white/urls
}

dest custom_blacklist {
    domainlist custom_black/domains
    urllist custom_black/urls
    expressionlist custom_black/mimetypes
    #log blockedaccesses
}

#
# FILTERING ACLS:
#
# **** THESE CAN BE IN ANY ORDER AS MATCHING
# IS PERFORMED IN THE SRC/DEST STANZAS ****
#

acl {
    unfiltered {
        pass all
        redirect http://10.2.0.10/blocked/blocked.php?a=%a&n=%n&i=%i&s=%s&t=%t&u=%u
    }

    lanclients {
        pass custom_whitelist !custom_blacklist all
        redirect http://10.2.0.10/blocked/blocked.php?a=%a&n=%n&i=%i&s=%s&t=%t&u=%u
    }

    default {
        pass none
        redirect http://10.2.0.10/blocked/blocked.php?a=%a&n=%n&i=%i&s=%s&t=%t&u=%u
    }
}
Test SquidGuard Configuration
Before we configure Squid to load SquidGuard and replicate all of SquidGuard's config files and blacklist databases, we must first test the config is working as it should. Perform the following steps on LVS-Director1:
- Add a test domain, e.g. "google.com", to /usr/local/squidGuard/db/BL/custom_black/domains using a text editor. Note that you do not need to include "http://" or "www" - specifying google.com will block all of its subdomains too (www.google.com, images.google.com etc).
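If you prefer the shell to a text editor, the same step can be sketched as a one-liner. The DOMAINS default below writes to a temporary file so the snippet is safe to dry-run anywhere; on the director, point it at the custom_black/domains path above:

```shell
# Sketch: append a test domain to the custom blacklist file.
# On the director, set DOMAINS=/usr/local/squidGuard/db/BL/custom_black/domains;
# the default here is a temp file so the snippet is safe to run anywhere.
DOMAINS="${DOMAINS:-$(mktemp)}"
echo "google.com" >> "$DOMAINS"
```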
- Compile the blacklist databases. This will create new database files for every domainlist and urllist mentioned in squidGuard.conf. Because this command is run as root, you must reset the permissions on the files before running SquidGuard:
root@lvs-director1:~# squidGuard -C all
root@lvs-director1:~# chown -R proxy:proxy /usr/local/squidGuard/db/*
- Run the following command to simulate Squid sending a request to SquidGuard. Be sure to set the client IP to one that you've configured as a valid source (it doesn't have to be the IP of the host you're testing on - it is just to simulate a client request):
root@lvs-director1:~# echo "http://www.example.com 192.168.0.100/ - - GET" | squidGuard -c /usr/local/squidGuard/squidGuard.conf -d
If the URL is allowed, you will not get any errors. If the URL has been blocked (i.e. the one you've entered into the custom_black/domains file), you should get something like this:
root@lvs-director1:~# echo "http://www.google.com 192.168.0.100/ - - GET" | squidGuard -c /usr/local/squidGuard/squidGuard.conf -d
2007-03-25 16:18:05 [30042] squidGuard ready for requests (1174832285.085)
http://10.2.0.10/blocked/blocked.php?a=192.168.0.100&n=&i=&s=lanclients&t=custom_blacklist&u=http://www.google.com 192.168.0.100/- - -
2007-03-25 16:18:05 [30042] squidGuard stopped (1174832285.089)
Configure Squid
Squid configuration is very easy - just a single line at the very top of the file! Add this line to /etc/squid3/squid.conf on LVS-Director1:
url_rewrite_program /usr/local/bin/squidGuard -c /usr/local/squidGuard/squidGuard.conf
Download Community Supported Blacklists
Head over to http://www.shallalist.de/ and read up on the (free) service they provide, making sure you understand the terms and conditions. To the right there is a download link which points directly to the most recent blacklist archive. Download it to your SquidGuard directory and extract it:
root@lvs-director1:~# cd /usr/local/squidGuard/db
root@lvs-director1:~# wget http://www.shallalist.de/Downloads/shallalist.tar.gz
root@lvs-director1:~# tar xzf shallalist.tar.gz
Look at the contents of /usr/local/squidGuard/db/BL. You'll notice a stack of directories, each with its own domains and urls files. Each of them has to be defined as a destination category in SquidGuard's config file. Download the modified squidGuard.conf and save it to /usr/local/squidGuard/squidGuard.conf. The file has all categories configured to be blocked for all users except the unfiltered IP. Note that blocking every category is VERY excessive and generally not recommended - it's just quicker to edit a list than to create one yourself.
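Writing a dest stanza for every Shallalist category by hand is tedious, so a helper along these lines can generate them from the directory listing. This is a sketch, not part of the original setup: it handles one directory level only (the Shalla archive also contains nested categories such as automobile/bikes, which you would handle similarly), and the sample tree it builds merely stands in for the real BL directory:

```shell
# Sketch: print a SquidGuard "dest" stanza for each category directory.
# If BLDIR is unset, build a tiny sample tree standing in for
# /usr/local/squidGuard/db/BL so the script can run anywhere; on the
# director, set BLDIR to the real path instead.
if [ -z "${BLDIR:-}" ]; then
    BLDIR=$(mktemp -d)
    mkdir -p "$BLDIR/adv" "$BLDIR/porn"
    touch "$BLDIR/adv/domains" "$BLDIR/adv/urls" "$BLDIR/porn/domains"
fi

gen_dests() {
    for dir in "$1"/*/; do
        category=$(basename "$dir")
        echo "dest $category {"
        # Only emit list directives for files that actually exist
        [ -f "$dir/domains" ] && echo "    domainlist $category/domains"
        [ -f "$dir/urls" ] && echo "    urllist $category/urls"
        echo "}"
    done
}

gen_dests "$BLDIR"
```

Redirect the output into the destination section of squidGuard.conf, then add the category names to the relevant pass lines by hand.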
Once you have the new file in place with all blacklists properly configured, recompile the blacklists into databases (this enables SquidGuard to process them more quickly at run time). Again, as this command has been run as root (in this instance it doesn't have to be), reset the permissions on the directory and then go through the testing procedure outlined above. The compilation should take 30-90 seconds for the Shallalist blacklists as they are quite large. If it takes longer, tail /usr/local/squidGuard/log/squidGuard.log and look for errors - they are usually quite descriptive:
root@lvs-director1:~# squidGuard -C all
root@lvs-director1:~# chown -R proxy:proxy /usr/local/squidGuard/db/*
root@lvs-director1:~# chmod -R g+w /usr/local/squidGuard/db/*
Obviously this manual method is no way to keep your blacklists up to date... enter part 7 - automation and administration