To place your cache in a hierarchy, use the cache_host
directive in squid.conf to specify the parent and sibling
nodes.
For example, the following squid.conf file on
childcache.example.com
configures its cache to retrieve
data from one parent cache and two sibling caches:
# squid.conf - On the host: childcache.example.com
#
# Format is: hostname type http_port udp_port
#
cache_host parentcache.example.com parent 3128 3130
cache_host childcache2.example.com sibling 3128 3130
cache_host childcache3.example.com sibling 3128 3130
The cache_host_domain
directive allows you to specify that
certain caches are siblings or parents for certain domains:
# squid.conf - On the host: sv.cache.nlanr.net
#
# Format is: hostname type http_port udp_port
#
cache_host electraglide.geog.unsw.edu.au parent 3128 3130
cache_host cache1.nzgate.net.nz parent 3128 3130
cache_host pb.cache.nlanr.net parent 3128 3130
cache_host it.cache.nlanr.net parent 3128 3130
cache_host sd.cache.nlanr.net parent 3128 3130
cache_host uc.cache.nlanr.net sibling 3128 3130
cache_host bo.cache.nlanr.net sibling 3128 3130
cache_host_domain electraglide.geog.unsw.edu.au .au
cache_host_domain cache1.nzgate.net.nz .au .aq .fj .nz
cache_host_domain pb.cache.nlanr.net .uk .de .fr .no .se .it
cache_host_domain it.cache.nlanr.net .uk .de .fr .no .se .it
cache_host_domain sd.cache.nlanr.net .mx .za .mu .zm
The configuration above indicates that the cache will use
pb.cache.nlanr.net
and it.cache.nlanr.net
for domains uk, de, fr, no, se and it, sd.cache.nlanr.net
for domains mx, za, mu and zm, cache1.nzgate.net.nz
for domains au, aq, fj, and nz, and
electraglide.geog.unsw.edu.au for domain au.
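The effect of cache_host_domain can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Squid's actual selection code: the function names parse_config and eligible_peers are made up for this example, the domain match is a plain suffix comparison, and the sketch assumes that a cache with no cache_host_domain line is eligible for every domain.

```python
# Illustrative sketch only: names and matching rule are assumptions,
# not taken from the Squid source.
CONFIG = """\
cache_host electraglide.geog.unsw.edu.au parent 3128 3130
cache_host cache1.nzgate.net.nz parent 3128 3130
cache_host pb.cache.nlanr.net parent 3128 3130
cache_host it.cache.nlanr.net parent 3128 3130
cache_host sd.cache.nlanr.net parent 3128 3130
cache_host uc.cache.nlanr.net sibling 3128 3130
cache_host bo.cache.nlanr.net sibling 3128 3130
cache_host_domain electraglide.geog.unsw.edu.au .au
cache_host_domain cache1.nzgate.net.nz .au .aq .fj .nz
cache_host_domain pb.cache.nlanr.net .uk .de .fr .no .se .it
cache_host_domain it.cache.nlanr.net .uk .de .fr .no .se .it
cache_host_domain sd.cache.nlanr.net .mx .za .mu .zm
"""


def parse_config(text):
    """Collect cache_host and cache_host_domain lines from squid.conf text."""
    peers = {}    # hostname -> "parent" or "sibling", in file order
    domains = {}  # hostname -> list of domain suffixes
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "cache_host":
            peers[fields[1]] = fields[2]
        elif fields[0] == "cache_host_domain":
            domains.setdefault(fields[1], []).extend(fields[2:])
    return peers, domains


def eligible_peers(url_host, peers, domains):
    """Return (hostname, type) for every cache allowed to serve url_host."""
    result = []
    for name, kind in peers.items():
        limits = domains.get(name)
        # Assumption: no cache_host_domain line means "all domains".
        if not limits or any(url_host.endswith(d) for d in limits):
            result.append((name, kind))
    return result
```

With the configuration above, a request for a host under .de would be eligible for pb.cache.nlanr.net and it.cache.nlanr.net plus the two unrestricted siblings, while a request for a .com host would be eligible for the siblings only.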
We have a simple set of guidelines for joining the NLANR cache hierarchy.
The NLANR hierarchy can provide you with an initial source for parent or sibling caches. Joining the NLANR global cache system will frequently improve the performance of your caching service.
Just enable these options in your squid.conf and you'll be
registered:
cache_announce 24
announce_to sd.cache.nlanr.net:3131
NOTE: announcing your cache is not the same thing as joining the NLANR cache hierarchy. You can join the NLANR cache hierarchy without registering, and you can register without joining the NLANR cache hierarchy.
Visit the NLANR cache registration database to discover other caches near you. Keep in mind that just because a cache is registered in the database does not mean it is willing to be your parent/sibling/child. But it can't hurt to ask...
Occasionally people have trouble understanding accelerators and proxy caches, usually because of mixed-up interpretations of "incoming" and "outgoing" data. I think in terms of requests (i.e., an outgoing request is from the local site out to the big bad Internet). The data received in reply is incoming, of course. Others think in the opposite sense of "a request for incoming data".
An accelerator caches incoming requests for outgoing data (i.e., that which you publish to the world). It takes load away from your HTTP server and internal network. You move the server away from port 80 (or whatever your published port is), and substitute the accelerator, which then pulls the HTTP data from the "real" HTTP server (only the accelerator needs to know where the real server is). The outside world sees no difference (apart from an increase in speed, with luck).
Quite apart from taking the load off a site's normal web server, accelerators can also sit outside firewalls or other network bottlenecks and talk to HTTP servers inside, reducing traffic across the bottleneck and simplifying the configuration. Two or more accelerators communicating via ICP can increase the speed of a web service and its resilience to any single failure.
The Squid redirector can make one accelerator act as a single front-end for multiple servers. If you need to move parts of your filesystem from one server to another, or if separately administered HTTP servers should logically appear under a single URL hierarchy, the accelerator makes the right thing happen.
If you wish only to cache the "rest of the world" to improve local users' browsing performance, then accelerator mode is irrelevant. Sites which own and publish a URL hierarchy use an accelerator to improve other sites' access to it. Sites wishing to improve their local users' access to other sites' URLs use proxy caches. Many sites, like us, do both and hence run both.
Measurements of the Squid cache and its Harvest counterpart suggest an order of magnitude performance improvement over CERN or other widely available caching software. This improvement on hits suggests that the cache can serve as an httpd accelerator: a cache configured to act as a site's primary httpd server (on port 80), forwarding references that miss to the site's real httpd (on port 81).
In such a configuration, the web administrator renames all non-cacheable URLs to the httpd's port (81). The cache serves references to cacheable objects, such as HTML pages and GIFs, and the true httpd (on port 81) serves references to non-cacheable objects, such as queries and cgi-bin programs. If a site's usage characteristics tend toward cacheable objects, this configuration can dramatically reduce the site's web workload.
Note that it is best not to run a single squid process as
both an httpd-accelerator and a proxy cache, since these two modes
will have different working sets. You will get better performance
by running two separate caches on separate machines. However, for
compatibility with how administrators are accustomed to running
other servers that provide both proxy and Web serving capability
(e.g., CERN), Squid supports operation as both a proxy and
an accelerator if you set the httpd_accel_with_proxy
variable to on
inside your squid.conf
configuration file.
If you are behind a firewall then you can't make direct connections to the outside world, so you must use a parent cache. Squid doesn't use ICP queries for a request if it's behind a firewall or if there is only one parent.
You can use the inside_firewall
directive in
squid.conf to specify a list of domains internal to your
Internet firewall. For example:
inside_firewall example.com
You can also specify multiple domains:
inside_firewall example.com example.org example.net
The use of inside_firewall
affects the server selection
algorithm in two ways. Objects not matching any of the listed
domains will be considered beyond the firewall. For these:
As a special case you may specify the domain as none
to
force all requests to be fetched from siblings and parents.
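The inside/beyond decision can be sketched as follows. This is a hedged illustration, not Squid's code: the function name beyond_firewall is invented here, and the matching rule (the URL host equals a listed domain or ends with a dot followed by it) is an assumption about how the domains are compared.

```python
def beyond_firewall(url_host, inside_domains):
    """Return True if url_host should be treated as beyond the firewall.

    inside_domains is the list given to inside_firewall in squid.conf.
    """
    # Special case from the text: "none" means nothing is inside,
    # so every request goes to siblings and parents.
    if inside_domains == ["none"]:
        return True
    host = url_host.lower()
    for domain in inside_domains:
        d = domain.lower().lstrip(".")
        # Assumed rule: exact match or a subdomain of a listed domain.
        if host == d or host.endswith("." + d):
            return False
    return True
```

For example, with inside_firewall example.com, the sketch treats www.example.com as inside and www.nlanr.net as beyond the firewall.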
The dnsserver processes are used by squid because the gethostbyname(3)
library routine used to
convert web site names to their internet addresses
blocks until it returns (i.e., the process that calls
it has to wait for a reply). Since there is only one squid
process, everyone who uses the cache would have to wait each
time the routine was called. This is why the dnsserver is
a separate process, so that these processes can block,
without causing blocking in squid.
It's very important that there are enough dnsserver processes to cope with every access you will need, otherwise squid will stop occasionally. A good rule of thumb is to make sure you have at least the maximum number of dnsservers squid has ever needed on your system, and probably add two to be on the safe side. In other words, if you have only ever seen at most three dnsserver processes in use, make at least five. Remember that a dnsserver is small and, if unused, will be swapped out.
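The idea behind the dnsserver helpers can be illustrated with a small Python sketch. It is not how Squid is implemented: Squid uses separate helper processes, while this sketch swaps in a thread pool to keep the example short. The class name ResolverPool and its parameters are made up for the illustration. The point is only that the blocking lookup runs in a helper, so the caller gets control back immediately.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


class ResolverPool:
    """Run blocking name lookups in a fixed number of helpers."""

    def __init__(self, workers=5, resolve=socket.gethostbyname):
        # workers plays the role of the number of dnsserver processes.
        # If all helpers are busy, further lookups queue up and wait.
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._resolve = resolve

    def lookup(self, hostname):
        """Start a lookup and return at once with a Future for the result."""
        return self._pool.submit(self._resolve, hostname)

    def close(self):
        """Wait for outstanding lookups and release the helpers."""
        self._pool.shutdown(wait=True)
```

A caller would create the pool once, call lookup() for each host name, carry on serving other requests, and collect the address later with the returned future's result() method. Too few helpers means lookups wait in a queue, which mirrors the advice above about running enough dnsserver processes.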
We would like to use Squid, but we need it to use socks to connect to the world outside our firewall.
No changes are necessary to use Squid with socks5.
Simply add the usual -Dbind=SOCKSbind
etc., to the compile line and
-lsocks
to the link line.
--- Carson Gaspar (carson@cugc.org)
Kolics Bertold has made an excellent flow chart diagram showing this process.
Before you run the configure script, simply set the CACHE_HTTP_PORT
environment variable.
setenv CACHE_HTTP_PORT 8080
./configure
make
make install
With Squid-1.1 it is NOT possible. Each cache_dir is assumed to be the same size. The cache_swap setting defines the size of all cache_dir's taken together. If you have N cache_dir's then each one will hold cache_swap ÷ N Megabytes.
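As a worked example of that arithmetic, here is a minimal Python sketch. The function name and the directory paths are hypothetical, chosen only for illustration.

```python
def per_dir_megabytes(cache_swap_mb, cache_dirs):
    """Space each cache_dir holds when cache_swap is split evenly."""
    if not cache_dirs:
        raise ValueError("at least one cache_dir is required")
    return cache_swap_mb / len(cache_dirs)


# Hypothetical setup: cache_swap 300 with three cache_dir lines.
share = per_dir_megabytes(300, ["/cache1", "/cache2", "/cache3"])
```

Here share is 100.0, so each of the three directories would hold 100 Megabytes regardless of how large the underlying disks are.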
Several people on both the fwtk-users and the squid-users mailing lists asked about using Squid in combination with http-gw from the TIS toolkit. The most elegant way in my opinion is to run an internal Squid caching proxy server which handles client requests and let this server forward its requests to the http-gw running on the firewall. Cache hits won't need to be handled by the firewall.
In this example Squid runs on the same server as the http-gw. Squid uses port 8000 and http-gw uses port 8080 (web). The local domain is home.nl.
Either run http-gw as a daemon from the /etc/rc.d/rc.local (Linux
Slackware):
exec /usr/local/fwtk/http-gw -daemon 8080
or run it from inetd like this:
web stream tcp nowait.100 root /usr/local/fwtk/http-gw http-gw
I increased the watermark to 100 because a lot of people run into
problems with the default value.
Make sure you have at least the following line in
/usr/local/etc/netperm-table:
http-gw: hosts 127.0.0.1
You could add the IP address of your own workstation to this rule to
test that http-gw works by itself, like:
http-gw: hosts 127.0.0.1 10.0.0.1
The following settings are important:
http_port 8000
icp_port 0
cache_host localhost.home.nl parent 8080 0 default
inside_firewall home.nl
This tells Squid to use the parent for all domains other than home.nl.
Below, access.log entries show what happens if you do a reload on the
Squid-homepage:
872739961.631 1566 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://squid.nlanr.net/ - DEFAULT_PARENT/localhost.home.nl -
872739962.976 1266 10.0.0.21 TCP_CLIENT_REFRESH/304 88 GET http://www.nlanr.net/Images/cache_now.gif - DEFAULT_PARENT/localhost.home.nl -
872739963.007 1299 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://squid.nlanr.net/Squid/squidnow.gif - DEFAULT_PARENT/localhost.home.nl -
872739963.061 1354 10.0.0.21 TCP_CLIENT_REFRESH/304 83 GET http://squid.nlanr.net/Squid/Squidlogo2.gif - DEFAULT_PARENT/localhost.home.nl
http-gw entries in syslog:
Aug 28 02:46:00 memo http-gw[2052]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:00 memo http-gw[2052]: log host=localhost/127.0.0.1 protocol=HTTP cmd=dir dest=squid.nlanr.net path=/
Aug 28 02:46:01 memo http-gw[2052]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
Aug 28 02:46:01 memo http-gw[2053]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2053]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=squid.nlanr.net path=/Squid/Squidlogo2.gif
Aug 28 02:46:01 memo http-gw[2054]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2054]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=squid.nlanr.net path=/Squid/squidnow.gif
Aug 28 02:46:01 memo http-gw[2055]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2055]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.nlanr.net path=/Images/cache_now.gif
Aug 28 02:46:02 memo http-gw[2055]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
Aug 28 02:46:03 memo http-gw[2053]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=2
Aug 28 02:46:04 memo http-gw[2054]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=3
To summarize:
Advantages:
Disadvantages:
-- Rodney van den Oever
When a proxy-cache is used, a server does not see the connection
coming from the originating client. Many people like to implement
access controls based on the client address.
To accommodate these people, Squid adds its own request header
called "X-Forwarded-For" which looks like this:
X-Forwarded-For: 128.138.243.150, unknown, 192.52.106.30
Entries are always IP addresses, or the word unknown if the address
could not be determined or if it has been disabled with the
forwarded_for configuration option.
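How such a header grows as a request passes through a chain of caches can be sketched in Python. This is an illustration of the format described above, not Squid's code: the function name add_forwarded_for and its parameters are invented for the example, and the headers are modelled as a plain dictionary with an exact-case key.

```python
def add_forwarded_for(headers, client_ip, forwarded_for=True):
    """Return a copy of headers with this hop's client appended.

    client_ip is the address the request arrived from, or None if it
    could not be determined. forwarded_for=False stands in for the
    option being disabled, in which case "unknown" is recorded.
    """
    entry = client_ip if (forwarded_for and client_ip) else "unknown"
    existing = headers.get("X-Forwarded-For")
    updated = dict(headers)
    # Earlier hops stay on the left; this hop is appended on the right.
    updated["X-Forwarded-For"] = f"{existing}, {entry}" if existing else entry
    return updated
```

Passing a request through three hops that see 128.138.243.150, an undetermined address, and 192.52.106.30 in turn would produce the example header shown above.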
We must note that access controls based on this header are extremely weak and simple to fake. Anyone may hand-enter a request with any IP address whatsoever. This is perhaps the reason why client IP addresses have been omitted from the HTTP/1.1 specification.
Normally, the redirector feature is used to rewrite requested URLs. Squid then transparently requests the new URL. However, in some situations, it may be desirable to return an HTTP "301" or "302" redirect message to the client. This is now possible with Squid version 1.1.19.
Simply modify your redirector program to prepend either "301:" or "302:"
to the new URL. For example, the following script might be used
to direct external clients to a secure Web server for internal documents:
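#!/usr/local/bin/perl
# Redirector: send requests for internal documents to the secure server.
$|=1;                       # unbuffered output, so each reply is sent at once
while (<>) {
    @X = split;             # the first field of each input line is the URL
    $url = $X[0];
    if ($url =~ /^http:\/\/internal\.foo\.com/) {
        $url =~ s/^http/https/;
        $url =~ s/internal/secure/;
        print "302:$url\n"; # "302:" prefix asks Squid to return a redirect
    } else {
        print "$url\n";     # otherwise pass the URL through unchanged
    }
}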
Please see sections 10.3.2 and 10.3.3 of RFC 2068 for an explanation of the 301 and 302 HTTP reply codes.