FTPWebLog Version: 1.0.2
Apache/Ftpd Graphic/Text Based Log Analyzer
The project home pages are no longer in existence
Mac OS X package built by Chris Roberts
FTPWebLog 1.0.2 (Last update 2 July 1996)
What is FTPWebLog?
FTPWebLog 1.0.2 is a freeware integrated WWW and FTP log reporting tool. Its
primary inspiration was the wwwstat program written by Roy Fielding.
While wwwstat is a good program, it has some design flaws that make it
unsuited, as released, for use by large sites: notably difficult
reconfiguration of reports, bad handling of characters that should be
escaped, difficulty in making it support additional log formats, poor
support for multiple servers, and the rather 'after the fact'
retrofitting of graphic reports to it.
My experience using and heavily customizing wwwstat led me to
conclude that I needed a new program written from the ground up for
flexibility: FTPWebLog was the result.
wwwstat still does some things that FTPWebLog does not, most notably
filtering of reports by date. On the flip side, FTPWebLog does several
things that wwwstat does not and is much easier to
customize to match a site's particular needs.
Differences between 1.0.2 and 1.0.1
I have added 'archive section' reporting to the main text report,
a CGI script for getting 'extract' reports on the fly, and a hostname
lookup function that can convert raw IP addresses to hostnames as a log
is being scanned. The documentation on all changes from 1.0.1 is still
quite thin.
What does an FTPWebLog report look like?
OSXGNU has an active example of a report online. This is a full report
with all report sections and graphs activated. The text section is about
230-500 Kbytes. Each major section can be selectively disabled, and
re-ordering the sections is simply a matter of changing the order of a
half dozen calling lines.
How much does it cost?
Absolutely nothing.
If you like it - just download it and set it up.
Setting it up
First, download the current alpha distribution.
If you want to do graphical reports, you will also need some additional
support:
- The gd.pm GIF Perl library. It is available at <URL:http://www.osxgnu.org/downloads/GD-1.19X.pkg.sit.bin>
- The gd graphics library is now part of the Perl distribution.
- You also *MUST* have Perl 5.001 or
later to use the graph generating Perl script. Perl 4.036 is sufficient
for the text based report, but Perl 5.001 is necessary for the graphic
based report. Perl 5 is available at CPAN and many other places, and it
comes with Mac OS X.
Follow the directions given with each of those packages to install them.
Once the required graphics support is in place, configuration of
'ftpweblog' is easy.
Configuration
Almost all the options are explained directly in the source for
'ftpweblog' and 'graphftpweblog'. Here is a short general guide that
should get you up and running.
- Identify where your access_log is stored. Change $LogFile in the
'ftpweblog' program to point to it.
- If using 'graphftpweblog', set $GraphFTPWebLogURL in the 'ftpweblog'
program to point to the URL where you intend to put the graphic report
HTML file generated by 'graphftpweblog'.
- Make any directories that 'graphftpweblog' will use to store the
GIF files it generates.
- Run 'ftpweblog', directing its output to a file:
  ftpweblog > stats.html
- If using 'graphftpweblog', run it, also directing its output to a file:
  graphftpweblog > graphs.html
You should now have a report. It is that easy. By fine-tuning the report
options, you can make it as short or as in-depth as you like.
The Command Line Options for FTPWebLog
Nearly every report option that can be set from inside the script can be
set using command line options:
ftpweblog [-h] [-i pathname] [-t www|ftp] [-g URL]
[-x perlregex] [-X perlregex] [-r perlregex]
[-R perlregex] [-A 0|1] [-H 0|1] [-f N]
[-d N] [-S 0|1] [-D 0|1] [-F 0|1] [-L 0|1]
[-N systemname] [-T perlregex] [-B perlregex]
[-Q quota] [-q quotarate]
[logfile ...] [logfile.gz ...] [logfile.Z ...]
Display Options
- -h
- Just display the usage help message and quit.
Input Options
- -i pathname
- Include the 'pathname' file (assumed to be a prior ftpweblog
output) in the report. Only one preexisting report can
be included per run right now.
- [logfile ...] [logfile.gz ...] [logfile.Z ...]
- Process the listed sequence of logfiles.
- -t www|ftp
- Select whether the log files to be processed are in
FTP log or NCSA Common Log format
- -g URL
- The URL of the GraphFTPWebLog output HTML file (if using
GraphFTPWebLog)
Log Search Options
- -x regex
- Only include domain names matching the perl regex in the report
- -X regex
- Do not include any domain name matching the perl regex
- -r regex
- Only include refs to files matching the perl regex
- -R regex
- Do not include refs to files matching the perl regex
- -A 0|1
- Print Daily stats (0=do not, 1=do)
- -H 0|1
- Print Hourly stats (0=do not, 1=do)
- -f N
- Print Top N Files (0=do not)
- -d N
- Print Top N Domains (0=do not)
- -S 0|1
- Print summary report (0=do not, 1=do)
- -F 0|1
- Print full file listing (0=do not, 1=do)
- -D 0|1
- Print full domain listing (0=do not, 1=do)
- -L 0|1
- Print top level domain report (0=do not, 1=do)
- -N name
- Name for report
- -T regex
- Filter top N file list to exclude files matching the regex
- -B regex
- Blank this pattern in filenames. Useful for stripping extra
path from cache defeating CGI scripts.
- -Q quota
- Volume quota in bytes (0=no quota). An extremely basic
accounting feature that lets you automatically charge for excessive volume.
- -q quotarate
- Quota Rate in meg/day over volume quota. Assumed to be in
dollars.
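As a rough illustration of the two quota options, the sketch below
estimates an overage charge. The documentation does not spell out the
formula FTPWebLog uses, so this is only one plausible reading: it assumes
the charge is the volume over quota, converted to megabytes (taken here as
1048576 bytes), multiplied by the rate. The function name quota_charge is
hypothetical and is not part of FTPWebLog.

```shell
# Hypothetical helper, NOT part of FTPWebLog: estimate an overage charge
# under one assumed reading of the -Q (quota) and -q (quotarate) options.
#   $1 = bytes transferred in a day
#   $2 = volume quota in bytes (0 = no quota)
#   $3 = rate in dollars per megabyte over quota
quota_charge() {
    LC_ALL=C awk -v used="$1" -v quota="$2" -v rate="$3" 'BEGIN {
        over = used - quota
        # Nothing is charged when there is no quota or usage is within it.
        if (quota == 0 || over < 0) over = 0
        # Assumption: 1 meg = 1048576 bytes.
        printf "%.2f\n", (over / 1048576) * rate
    }'
}

# 150 megs transferred against a 100 meg quota at 0.05 dollars per meg.
quota_charge 157286400 104857600 0.05
```

Under these assumptions the call above charges for 50 megabytes over
quota. If your reading of the quota rate differs, only the arithmetic
inside the awk block needs to change.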
The Command Line Options for GraphFTPWebLog
graphftpweblog [-h] [-A 0|1] [-B regex] [-D 0|1] [-d N]
[-f N] [-H 0|1] [-N name] [-P directory]
[-U URL] [-R regex] [-r regex] [-X regex]
[-x regex] [filename]
GraphFTPWebLog processes an FTPWebLog report and produces graphs of the
information in it. An HTML web page connecting them together is sent to
STDOUT.
Display Options
- -h
- Just display the usage help message and quit.
Common Options
- -P directory
- Directory where the graph files are to be stored.
- -U URL
- Base URL where the graph files can be accessed
- -A 0|1
- Graph Daily stats (0=do not, 1=do)
- -B regex
- Blank out partial URLs matching the regex. This can be used to
'defragment' URLs that use extended paths (such as cache defeating
CGI programs).
- -D 0|1
- Graph top level domains (0=do not, 1=do)
- -d N
- Graph Top N Domains (0=do not)
- -f N
- Graph Top N Files (0=do not)
- -H 0|1
- Graph Hourly stats (0=do not, 1=do)
- -N name
- System name for report. It is inserted into the title and an
h1 header for the report.
- -R regex
- Filter out URLs matching regex from the top N files graph
- -r regex
- Include only files matching the perl regex
- -X regex
- Filter domains matching regex in top N domains graph
- -x regex
- Include only domains matching the perl regex
- filename
- The file where an already generated
FTPWebLog report has been stored.
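To get a feel for what the -B option does, here is a small stand-alone
sketch. It assumes that 'blanking' simply deletes the matched text from
each file name, and it uses awk's extended regular expressions rather than
Perl's, so only simple patterns carry over unchanged. The function name
blank_pattern and the sample paths are made up for illustration.

```shell
# Hypothetical helper, NOT part of FTPWebLog: delete every match of an
# extended regex from each input line, mimicking the assumed effect of -B.
#   $1 = regex to blank out
blank_pattern() {
    awk -v re="$1" '{ gsub(re, ""); print }'
}

# Two requests for the same page through a made-up cache defeating path.
printf '%s\n' \
    '/cgi-bin/nocache/12345/page.html' \
    '/cgi-bin/nocache/67890/page.html' |
    blank_pattern '/nocache/[0-9]+'
```

Both sample lines collapse to the same path once the changing part is
removed, which is the 'defragmenting' effect described for -B.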
Putting it all together
Here is an example of a script to analyze a log and generate both a full
report, and a 'lite' report - both linked to a graphic report.
#!/bin/bash
cd /usr/local/logs/bin/stats # Where I keep the FTPWebLog scripts
# Directory where I am going to keep all my stats
basestatsdir="/Library/WebServer/html/stats"
# Location of my access_log
sourcelog="/var/log/httpd/access_log"
# Name of my server
name="www.someplace.com"
# Type of log I am processing (www or ftp)
type="www"
#Name of the full stats report
statsfile="$basestatsdir/index.html"
# Generate a FULL stats report, all reports.
./ftpweblog -t "$type" -N "Web Log Report for $name" \
-d 40 -D 1 -L 1 -f 40 -F 1 -S 1 -A 1 -H 1 \
-g "/statistics/graph.html" \
$sourcelog > ${statsfile}.$$
mv ${statsfile}.$$ ${statsfile} # Doing the two step to keep the time
# when there are NO stats to a minimum
# Generate a stats lite
# Only the Summary, Daily, Hourly and Top Level domains.
litestatsfile="$basestatsdir/httpstats-lite.html"
./ftpweblog -t "$type" \
-N "Lite Web Log Report for $name" -i $statsfile \
-d 0 -D 0 -L 1 -F 0 -f 0 -S 1 -A 1 -H 1 \
-g "/statistics/graph.html" \
/dev/null > ${litestatsfile}.$$
mv ${litestatsfile}.$$ ${litestatsfile} # Doing the two step to keep the time
# when there are NO stats to a minimum
# Make the graphical log report.
./graphftpweblog -N "Graphical Web Log Report for $name" \
-U "/statistics" \
-P "$basestatsdir" \
-A 1 -D 1 -d 40 -f 40 -H 1 \
$statsfile > $basestatsdir/graph.html
# Just to be sure file permissions are correct
chmod 644 $litestatsfile $statsfile $basestatsdir/graph.html
chmod 644 $basestatsdir/*Stats.gif
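The 'two step' used in the script above (write the report to a temporary
file, then mv it over the old one) can be wrapped in a small function. The
following is only a sketch: the name publish_report is hypothetical, and
unlike the script above it keeps the old report in place when the command
fails.

```shell
# Hypothetical helper, NOT part of FTPWebLog: run a command, capture its
# standard output in a temporary file next to the target, and rename the
# temporary file into place only if the command succeeded.
#   $1 = target file, remaining arguments = command to run
publish_report() {
    target="$1"
    shift
    tmp="${target}.$$"          # same directory, so mv is a plain rename
    if "$@" > "$tmp"; then
        mv "$tmp" "$target"
    else
        rm -f "$tmp"            # leave any existing report untouched
        return 1
    fi
}
```

With it, the full report step could be written as
publish_report "$statsfile" ./ftpweblog -t "$type" "$sourcelog"
followed by whatever other options you normally pass.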
Getting sophisticated
A number of sites are running multiple servers. By taking advantage
of the command line options you can tailor the reports for each server.
In fact, you can even make separate reports for different sections of a
single server. When doing that, I recommend making one 'with the works'
report with all reports turned on, and then using the ability to read old
reports to efficiently extract special interest reports. This is
much faster than generating new reports from the
original access_log.
Note: You can't extract domains and meaningfully associate them
with file sections from an old log report. You have to do that
particular trick using the original access_log. From an old report you
can extract domains for analysis OR extract file names and have the
result mean something, but not both.
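As a sketch of the 'one full report, many extracts' approach, the helper
below builds a single extract from an existing full report. It uses only
options documented above. The function name extract_section and the
FTPWEBLOG variable are assumptions made for this example, and the domain
reports are switched off because, as noted, they are not meaningful in a
file-based extract.

```shell
# Path to the ftpweblog script; override FTPWEBLOG if yours lives elsewhere.
FTPWEBLOG="${FTPWEBLOG:-./ftpweblog}"

# Hypothetical helper, NOT part of FTPWebLog: build one extract report
# from an already generated full report.
#   $1 = report title
#   $2 = regex passed to -r (files to include)
#   $3 = full report to read with -i
#   $4 = output file for the extract
extract_section() {
    "$FTPWEBLOG" -t www -N "$1" -D 0 -d 0 -L 0 \
        -i "$3" -r "$2" /dev/null > "$4"
}
```

A run for several sections is then one call per section, for example
extract_section 'Graphics' '\.gif$' fullreport.html graphics.html
where the title, pattern and file names are placeholders for your own.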
An example
Let's say you have a user named 'johndoe' on your
server. You could get a report on *just* his pages by using:
ftpweblog -t www -N 'Web Pages for John Doe' -D 0 -d 0 -L 0 \
-i fullreport.html -r '^/~johndoe' /dev/null > johndoe.html
Breaking it down:
- -t www
- Specifies this report as being about a WWW server. Not strictly
needed since we aren't actually reading a log file.
- -N 'Web Pages for John Doe'
- This sets the title of the report to 'Web Pages for John Doe'
- -D 0
- Suppress the full domain report because it would be meaningless
- -d 0
- Suppress the top N domains report because it would be meaningless
- -L 0
- Suppress the top level domains report, again because it would be
meaningless
- -i fullreport.html
- Specifies to read the file 'fullreport.html' for an already created
FTPWebLog report
- -r '^/~johndoe'
- Only include files that have paths that start with /~johndoe
This is an extremely powerful feature - you can use it to extract reports
on graphic files, individual users, and archive sections.
- /dev/null
- Read the current 'log' from '/dev/null'. Just an easy trick to let
you focus on the preprocessed report you already made without having to
process a real access_log.
- > johndoe.html
- Put this extracted report in the file 'johndoe.html'
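Before running an extract, it can help to check which paths a pattern
would keep or drop. The sketch below imitates the include (-r) and
exclude (-R) filters with grep -E. This is only an approximation: grep
uses POSIX extended regular expressions, not Perl's, though simple
anchored patterns such as '^/~johndoe' behave the same in both. The
function names and sample paths are made up for illustration.

```shell
# Hypothetical helpers, NOT part of FTPWebLog: imitate the -r (include)
# and -R (exclude) filters on a list of paths read from standard input.
keep_matching() { grep -E -- "$1"; }
drop_matching() { grep -Ev -- "$1"; }

# Made-up sample paths standing in for file names from a real report.
sample_paths() {
    printf '%s\n' \
        '/~johndoe/index.html' \
        '/~johndoe/pics/cat.gif' \
        '/~janedoe/index.html' \
        '/index.html'
}

# Show what -r '^/~johndoe' would keep.
sample_paths | keep_matching '^/~johndoe'
```

The last line prints only the two paths under /~johndoe. Swapping in
drop_matching shows what the matching -R pattern would leave behind.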
You will also find in this distribution a 'ftpweblog-103a1' file.
This is an experimental version of FTPWebLog that supports Apache's
mod_log_config module and improves FTPWebLog's memory management (you
should save TONS of memory now if you turn off the domain related
reports). You should be able to directly copy your 'LogFormat' directive
value into the appropriate line and have the program parse your custom
log format. It is nowhere near complete, but it does work.
Benjamin "Snowhare" Franz