-
Notifications
You must be signed in to change notification settings - Fork 4
/
README
340 lines (259 loc) · 14 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
History
=======
This util was previously at:
http://www.wa.apana.org.au/~dean/squidpurge/
but the site was taken offline and appears that the code was abandoned. I
managed to grab the sources via archive.org and change them to compile. The
resulting binary works for me on centos 6 x86_64. YMMV.
purge
=====
The purge tool is a kind of magnifying glass into your squid-2 cache. You
can use purge to have a look at what URLs are stored in which file within
your cache. The purge tool can also be used to release objects which URLs
match user specified regular expressions. A more troublesome feature is the
ability to remove files squid does not seem to know about any longer.
USE AT YOUR OWN RISK! NO GUARANTEES, WHATSOEVER! DON'T BLAME US!
YOU HAVE BEEN WARNED!
compilation
===========
Purge has been successfully compiled under the following OSes:
SYSTEM g++ native
------ --- ------
Solaris 2.7 yes CC
IRIX 6.5 yes CC -n32
Linux 2.0.36 yes (g++ IS native)
FreeBSD 4.x yes gmake port must be installed
(g++ IS supported)
The recent move of the Linux community towards glibc2 may cause some
troubles, though. The compilation requires GNU make, no other make will work
correctly. The source distribution contains all files checked into the
revision control repository. Therefore, you will need to install GNU RCS
first (which in turn needs the GNU diffutils).
The repository also contains the prototypical Perl implementation. The user
interface in the C++ implementation changed a little when compared to the
Perl one. You will have to state at least one regular expression for purge
to start working. Also, printing the complete cache URLs, you will need to
specify the "-e ." regular expression.
In order to compile the purge tool, untar the source distribution and
change into the purge directory. With RCS and GNU make installed, just say
"make". GNU make will automagically retrieve all necessary files from the
repository and create the binary.
Systems not stated above will need to retrieve the makefile (use "co -l
Makefile" for this) and add their own platform specific definitions to
section [2] in the makefile.
squid preparation
=================
In order to use purge for real PURGEs, you will have to enable this feature
in squid. By default, PURGE is disabled. You should watch closely for whom
you enable the PURGE ability, otherwise total stranger just might wipe your
cache content. The following lines will need to be added to your squid.conf
(you may want to add further networks to the src_local ACL):
acl purge method PURGE
acl src_local src 127.0.0.0/8
http_access allow purge src_local
http_access deny purge
Reconfigure or restart (preferred) your squid after changing the
configuration file.
modes of operation
==================
$Id: purge.cc,v 1.15 2000/09/21 09:05:56 cached Exp $
Usage: purge [-a] [-c cf] [-d l] [-(f|F) fn | -(e|E) re] [-p h[:p]]
[-P #] [-s] [-v] [-C dir [-H]] [-n]
-a display a little rotating thingy to indicate that I am alive (tty only).
-c c squid.conf location, default "/usr/local/squid/etc/squid.conf".
-C dir base directory for content extraction (copy-out mode).
-d l debug level, an OR of different debug options.
-e re single regular expression per -e instance (use quotes!).
-E re single case sensitive regular expression like -e.
-f fn name of textfile containing one regular expression per line.
-F fn name of textfile like -f containing case sensitive REs.
-H prepend HTTP reply header to destination files in copy-out mode.
-n do not fork() when using more than one cache_dir.
-p h:p cache runs on host h and optional port p, default is localhost:3128.
-P # if 0, just print matches; otherwise OR the following purge modes:
0x01 really send PURGE to the cache.
0x02 remove all caches files reported as 404 (not found).
0x04 remove all weird (inaccessible or too small) cache files.
0 and 1 are recommended - slow rebuild your cache with other modes.
-s show all options after option parsing, but before really starting.
-v show more information about the file, e.g. MD5, timestamps and flags.
--- &< snip, snip ---
-a is a kind of "i am alive" flag. It can only be activated, if
your stdout is a tty. If active, it will display a little
rotating line to indicate that there is actually something
happening. You should not use this switch, if you capture
your stdout in a file, or if your expression list produces
many matches. The -a flag is also incompatible with the
(default) multi cache_dir mode.
default: off
See also: -n
-c cd CHANGED!
this option lets you specify the location of the squid.conf file.
Purge now understands about more than one cache_dir, and does so
by parsing Squid's configuration file. It knows about both ways
of Squid-2 cache_dir specifications, and will automatically try
to use the correct one.
default: /usr/local/squid/etc/squid.conf
-C cd if you want to rescue files from your cache, you need to specify
the directory into which the files will be copied. Please note
that purge will try to establish the original server's directory
structure. This switch also activates copy-out mode. Please do
not use copy-out mode with any purge mode (-P) other than 0.
For instance, if you specified "-C /tmp", Purge will try to
recreate /tmp/www.server.1/url/path/file, and so forth.
default: off
See also: -H, -P
-d l lets you specify a debug level. Differents bits are reserved for
different output.
default: 0
-e re the "-e" options let you specify one regular expression at the
-E re commandline. This is useful, if there is only a handful you
want to check. Please remember to escape the shell metachars
used in your regular expression. The use of single quotes
around your expression is recommended. The capital letter
version works case sensitive, the lower caps version does not.
default: (no default)
-f fn if you have more than a handful of expression, or want to check
-F fn the same set at regular intervals, the file option might be more
useful to you. Each line in the text file will be regarded as
one regular expression. Again, the capital letter version works
case sensitive, the lower caps version does not.
default: (no default)
-H if in copy-out mode (see: -C), you can specify to keep the
HTTP Header in the recreated file.
default: off
See also: -C
-n by specifying the "-n" switch, you will tell Purge to process
one cache_dir after another, instead of doing things in parallel.
If you have more than one cache_dir in your configuration,
Purge will fork off a worker process for each cache_dir to
do the checks for optimum speed - assuming a decently designed
cache. Since parallel execution will put quite some load on the
system and its controllers, it is sometimes preferred to use
less resources, though it will take longer.
default: parallel mode for more than one cache_dir
-p h[:p] Some cache admins (i.e. me) use a different port than 3128. The
purge tool will need to connect to your cache in order to send
the PURGE request (see -P). This option lets you specify the
host and port to connect to. The port is optional. The port
can be a name (check your /etc/services) or number. It is
separated from the host name portion by a single colon, no
spaces allowed.
default: localhost:3128
-P # If you want to do more than just print your cache content, you
will need to specify this option. Each bit is reserved for a
different action. Only the use of the LSB is recommended, the
rest should be considered experimental.
no bit set: just print
bit#0 set: send PURGE for matches
bit#1 set: unlink object file for 404 not found PURGEs
bit#2 set: unlink weird object files
If you use a value other than 0 or 1, you will need to slow
rebuild your cache content. A warning message will remind you
of that. If you use bit#1, all unsuccessful PURGEs will result
in the object file in your cache directory to be removed, because
squid does not seem to know about it any longer. Beware that the
asyncio might try to remove it after the purge tool, and thus
complains bitterly. Bit#1 only makes sense, if Bit#0 is also
set, otherwise it has no effect (since the HTTP status 404 is
never returned).
Bit#2 is reserved for strange files which do not even contain
a URL. Beware that these files may indicate a new object squid
currently intends to swap onto disk. If the file suddenly went
away, or is removed when squid tries to fetch the object, it
will complain bitterly. You must slow rebuild your cache, if
you use this option.
It is recommended that if you dare to use bit#1 or bit#2, you
should only grant the purge tool access to your squid, e.g.
move the HTTP and ICP listening port of squid to a different
non-standard location during the purge.
default: 0 (just print)
-s If you specify this switch, all commandline parameters will be
shown after they were parsed.
default: off
-v be verbose in the things reported about the file. See the output
section below.
output
======
In regular mode, the output of purge consists of four columns. If the
URL contains not encoded whitespaces, it may look as if there are more
columns, but the last one is the URI.
# name meaning
- ------ -----------------------------------------------------------
1 file name of cache file eximed which matches the re.
2 status return result of purge request, " 0" in print mode.
3 size object size including stored headers, not file size.
4 uri perceived uri
Example for non-verbose output in print-mode:
/cache3/00/00/0000004A 0 5682 http://graphics.userfriendly.org/images/slovenia.gif
In verbose mode, additional columns are inserted before the uri. Time
stamps are reported using hexadecimal notation, and Squid's standard
for reporting "no such timestamp" == -1, and "unparsable timestamp" == -2.
# name meaning
- ------ -----------------------------------------------------------
1 file name of cache file eximed which matches the re.
2 status return result of purge request, " 0" in print mode "-P 0".
3 size object size including stored headers, not file size.
4 md5 MD5 of URI from file, or "(no_md5_data_available)" string.
5 ts UTC of Value of Date: header in hex notation
6 lr UTC of last time the object was referenced
7 ex UTC of Expires: header
8 lr UTC of Last-Modified: header
9 flags Value of objects flags field in hex, see: Programmers Guide
10 refcnt number of times the object was referenced.
11 uri STORE_META_URL uri or "strange_file"
Example for verbose output in print-mode:
/cache1/00/00/000000B7 0 406 7CFCB1D319F158ADC9CFD991BB8F6DCE 397d449b 39bf677b ffffffff 3820abfc 0460 1 http://www.netscape.com/images/nc_vera_tile.gif
hexd
====
The hexd tool let's you conveniently hex dump a file both, in hex char and
display char columns. Hexd only assumes that characters 0-31,127-159,255
are not printable.
$ ./hexd /cache1/00/00/000000B7 | less -r
00000000: 03 00 00 00 6D 03 00 00-00 10 7C FC B1 D3 19 F1 ....m.....|ü±Ó.ñ
00000010: 58 AD C9 CF D9 91 BB 8F-6D CE 05 00 00 00 18 39 XÉÏÙ.».mÎ.....9
00000020: 7D 44 9B 39 BF 67 7B FF-FF FF FF 38 20 AB FC 00 }D.9¿g{....8 «ü.
00000030: 00 00 00 00 01 04 60 04-00 00 00 30 68 74 74 70 ......`....0http
00000040: 3A 2F 2F 77 77 77 2E 6E-65 74 73 63 61 70 65 2E ://www.netscape.
00000050: 63 6F 6D 2F 69 6D 61 67-65 73 2F 6E 63 5F 76 65 com/images/nc_ve
00000060: 72 61 5F 74 69 6C 65 2E-67 69 66 00 08 48 54 54 ra_tile.gif..HTT
00000070: 50 2F 31 2E 30 20 32 30-30 20 4F 4B 0D 0A 53 65 P/1.0 200 OK..Se
00000080: 72 76 65 72 3A 20 4E 65-74 73 63 61 70 65 2D 45 rver: Netscape-E
00000090: 6E 74 65 72 70 72 69 73-65 2F 33 2E 36 0D 0A 44 nterprise/3.6..D
000000A0: 61 74 65 3A 20 54 75 65-2C 20 32 35 20 4A 75 6C ate: Tue, 25 Jul
000000B0: 20 32 30 30 30 20 30 37-3A 34 31 3A 31 35 20 47 2000 07:41:15 G
000000C0: 4D 54 0D 0A 43 6F 6E 74-65 6E 74 2D 54 79 70 65 MT..Content-Type
000000D0: 3A 20 69 6D 61 67 65 2F-67 69 66 0D 0A 4C 61 73 : image/gif..Las
000000E0: 74 2D 4D 6F 64 69 66 69-65 64 3A 20 57 65 64 2C t-Modified: Wed,
000000F0: 20 30 33 20 4E 6F 76 20-31 39 39 39 20 32 31 3A 03 Nov 1999 21:
00000100: 34 31 3A 31 36 20 47 4D-54 0D 0A 43 6F 6E 74 65 41:16 GMT..Conte
00000110: 6E 74 2D 4C 65 6E 67 74-68 3A 20 36 37 0D 0A 41 nt-Length: 67..A
00000120: 63 63 65 70 74 2D 52 61-6E 67 65 73 3A 20 62 79 ccept-Ranges: by
00000130: 74 65 73 0D 0A 41 67 65-3A 20 31 38 32 37 31 33 tes..Age: 182713
00000140: 0D 0A 58 2D 43 61 63 68-65 3A 20 48 49 54 20 66 ..X-Cache: HIT f
00000150: 72 6F 6D 20 63 73 2D 68-61 6E 34 2E 77 69 6E 2D rom cs-han4.win-
00000160: 69 70 2E 64 66 6E 2E 64-65 0D 0A 58 2D 43 61 63 ip.dfn.de..X-Cac
00000170: 68 65 2D 4C 6F 6F 6B 75-70 3A 20 48 49 54 20 66 he-Lookup: HIT f
00000180: 72 6F 6D 20 63 73 2D 68-61 6E 34 2E 77 69 6E 2D rom cs-han4.win-
00000190: 69 70 2E 64 66 6E 2E 64-65 3A 38 30 38 31 0D 0A ip.dfn.de:8081..
000001A0: 50 72 6F 78 79 2D 43 6F-6E 6E 65 63 74 69 6F 6E Proxy-Connection
000001B0: 3A 20 6B 65 65 70 2D 61-6C 69 76 65 0D 0A 0D 0A : keep-alive....
000001C0: 47 49 46 38 39 61 01 00-26 00 A2 00 00 00 00 00 GIF89a..&.¢.....
000001D0: FF FF FF 00 33 66 33 66-99 FF FF FF 00 00 00 00 ....3f3f........
000001E0: 00 00 00 00 00 21 F9 04-01 00 00 04 00 2C 00 00 .....!ù......,..
000001F0: 00 00 01 00 26 00 00 03-08 38 A2 BC DE F0 C9 A8 ....&....8¢¼Þðɨ
00000200: 12 00 3B ..;
limitations
===========
o Purge does not slow rebuild the cache for you.
o It is still relatively slow, especially if your machine is low on memory
and/or unable to hold all OS directory cache entries in main memory.
o should never be used on "busy" caches with purge modes higher than 1.
TODO
====
1) use the stat() result on weird files to have a look at their ctime and
mtime. If they are younger than, lets say 30 seconds, they were just
created by squid and should not be removed.
2) Add a query before purging objects or removing files, and add another
option to remove nagging for the experienced user.
3) The reported object size may be off by one.