webcp.hostinghacks.net/slackware | webalizer

SYNOPSIS:

The Webalizer is a fast, free web server log file analysis program. It produces highly detailed, easily configurable usage reports in HTML format, for viewing with a standard web browser.

PREREQUISITES: http://webcp.hostinghacks.net/slackware/graphics.libraries/ , slc, zlib

INSTALLS: webalizer

The installation commands can be pasted directly into a PuTTY window or copied into a script. Notes on PuTTY best practices can be found here.

INSTALLATION:
last updated: Mar. 2005

Compile webalizer:

cd /usr/src
wget ftp://ftp.mrunix.net/pub/webalizer/webalizer-2.01-10-src.tgz
# wget http://hostinghacks.net/dist/webalizer-2.01-10-src.tgz
tar -zxf webalizer-2.01-10-src.tgz
cd /usr/src/webalizer-2.01-10

./configure --prefix=/usr 

make
make install

Create the webalizer configuration file:

cat > /etc/webalizer.conf << "EOF"

# HTMLBody <body bgcolor="#FFFFFF" link="#0000FF" vlink="#FF0000">
# HTMLTail <font size=-1><center>Site Statistics</center></font>

Incremental yes
# DNSCache
# DNSChildren 10
IgnoreHist no

PageType htm*
PageType cgi
PageType php
PageType pl

CountryGraph no
TopSites 0
TopKSites 0
TopURLs 30
TopKURLs 30
TopReferrers 30
TopAgents 30
TopCountries 0
TopEntry 30
TopExit 30
TopSearch 30
TopUsers 0

IndexAlias index.cgi
IndexAlias index.php
#HideReferrer Direct Request

HideURL *.gif
HideURL *.GIF
HideURL *.jpg
HideURL *.JPG
HideURL *.png
HideURL *.PNG
HideURL /watch-info

GroupURL /cgi-bin/* CGI Scripts
GroupReferrer yahoo.com/ Yahoo!
GroupReferrer excite.com/ Excite
GroupReferrer infoseek.com/ InfoSeek
GroupReferrer webcrawler.com/ WebCrawler
GroupReferrer google.com/ Google
GroupReferrer lycos.com/ Lycos
GroupReferrer metacrawler.com/ Metacrawler
GroupAgent MSIE Internet Exploder
GroupAgent Mozilla Netscape
GroupShading yes
MangleAgents 3

EOF
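
To confirm what the heredoc actually wrote, it can help to print only the active directives and skip comments and blank lines. This is a minimal sketch; `active_directives` is a made-up helper name, not something Webalizer ships.

```shell
#!/bin/sh
# active_directives FILE: print the non-comment, non-blank lines of a
# Webalizer config file. Hypothetical helper, not part of Webalizer.
active_directives() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Example:
#   active_directives /etc/webalizer.conf
```

If a directive such as "Incremental" shows up twice, the heredoc was probably appended to an existing file.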

NOTES:

READING WEBALIZER STATISTICS:

Hits - total number of requests made to the server.

Files - the number of hits (requests) that actually resulted in something being sent back to the user. This excludes 404 Not Found responses and requests for content already in the browser's cache. The difference between hits and files gives an indication of repeat visitors.

Sites - the number of unique IP addresses/hostnames that made requests.

Visits - a visit starts when a remote site requests a page for the first time. Further requests from the same site count toward the same visit until the timeout period (default is 30 minutes) passes without a request. Only pages trigger a visit, so remote sites that link to graphics and other non-page URLs are not counted in the visit totals, which reduces the number of false visits.

Pages - URLs that would be considered the actual page being requested; defaults to any URL that has an extension of .htm, .html or .cgi. The PageType entries in the configuration above extend this to .php and .pl.

KBytes - the total amount of data transferred between the server and the remote machine, in kilobytes (1 KByte = 1024 bytes).
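
The definitions above can be approximated straight from a log file. The sketch below assumes the Apache common or combined log format (status code in field 9, byte count in field 10) and treats any 200 response as a "file". `log_totals` is a made-up helper name, and the numbers are only a rough cross-check, not a reproduction of Webalizer's own counters.

```shell
#!/bin/sh
# log_totals FILE: rough Hits / Files / Sites / KBytes totals for an
# access log. Hypothetical helper; an approximation only.
log_totals() {
    awk '
        {
            hits++                                  # every request line is a hit
            if ($9 == 200) files++                  # assumption: a file is a 200 response
            if (!($1 in seen)) { seen[$1] = 1; sites++ }   # unique client hosts
            if ($10 ~ /^[0-9]+$/) bytes += $10      # "-" means no body was sent
        }
        END {
            printf "hits=%d files=%d sites=%d kbytes=%d\n", hits, files, sites, bytes / 1024
        }
    ' "$1"
}

# Example:
#   log_totals /home/hostinghacks.net/www/logs/access
```

Visits and Pages are left out because they depend on the timeout and PageType settings.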

Run webalizer manually to check for errors:

mkdir /home/hostinghacks.net/www/web/stats

webalizer -c /etc/webalizer.conf -o /home/hostinghacks.net/www/web/stats \
/home/hostinghacks.net/www/logs/access
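
When the manual run fails, the cause is often an unreadable config or log path. A small wrapper can check those first. This is only a sketch: `run_stats` and the `WEBALIZER` override variable are hypothetical names introduced here, and the wrapper uses nothing beyond the `-c` and `-o` switches shown above.

```shell
#!/bin/sh
# run_stats CONF OUTDIR LOG: check the inputs, then run Webalizer.
# Hypothetical wrapper. WEBALIZER can be pointed at another command
# (for example WEBALIZER=echo) to preview the call without running it.
WEBALIZER=${WEBALIZER:-webalizer}

run_stats() {
    conf=$1 outdir=$2 log=$3
    [ -r "$conf" ] || { echo "cannot read config: $conf" >&2; return 1; }
    [ -r "$log" ]  || { echo "cannot read log: $log" >&2; return 1; }
    mkdir -p "$outdir" || return 1          # create the output directory if missing
    "$WEBALIZER" -c "$conf" -o "$outdir" "$log"
}

# Example:
#   run_stats /etc/webalizer.conf /home/hostinghacks.net/www/web/stats \
#       /home/hostinghacks.net/www/logs/access
```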

03.24.2005

I spent some time trying to make webalizer do reverse DNS lookups.
To date I haven't had any luck with this.  Theoretically it
should work with something like this:

./configure --prefix=/usr --enable-dns

DNS lookups are made against a DNS cache file containing IP addresses
and resolved names.  If the IP address is not found in the cache file,
it will be left as an IP address.  In order for this to happen, a
cache file MUST be specified when the Webalizer is run, either using
the '-D' command line switch, or a "DNSCache" configuration file
keyword.  If no cache file is specified, no attempts to perform DNS
lookups will be done. The cache file can be made in two different ways:

1) You can have the Webalizer pre-process the specified log file at
   run-time, creating the cache file before processing the log file
   normally.  This is done by setting the number of DNS Children
   processes to run, either by using the '-N' command line switch or
   the "DNSChildren" configuration keyword.  This will cause the
   Webalizer to spawn the specified number of processes which will
   be used to do reverse DNS lookups.  Generally, a larger number
   of processes will result in faster resolution of the log; however,
   a setting that is too high may cause overall system degradation.  A setting
   of between 5 and 20 should be acceptable, and there is a maximum
   limit of 100.   If used, a cache filename MUST be specified also,
   using either the '-D' command line switch, or the "DNSCache"
   configuration keyword.  Using this method, normal processing will
   continue only after all IP addresses have been processed, and the
   cache file is created/updated.

2) You can pre-process the log file as a standalone process, creating
   the cache file that will be used later by the Webalizer.  This is
   done by running the Webalizer with a name of 'webazolver' (i.e. the
   name 'webazolver' is a symbolic link to 'webalizer') and specifying
   the cache filename (either with '-D' or DNSCache).   If the number
   of child processes is not given, the default of 5 will be used. In
   this mode, the log will be read and processed, creating a DNS cache
   file or updating an existing one, and the program will then exit
   without any further processing.
   
Preprocess the DNS cache file:

touch /var/lib/dns_cache.db
webazolver -N 10 -D /var/lib/dns_cache.db /home/domain/www/logs/access
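
Before pre-processing a large log, it can be useful to know how many distinct client addresses it contains, since that is an upper bound on the lookups a first pass would attempt. This is a minimal sketch assuming the client address is the first field of each log line; `unique_hosts` is a made-up helper name.

```shell
#!/bin/sh
# unique_hosts LOG: count the distinct client addresses (first field)
# in an access log. Hypothetical helper, not part of Webalizer.
unique_hosts() {
    awk '!seen[$1]++ { n++ } END { print n + 0 }' "$1"
}

# Possible sequence, assuming a DNS-enabled build and the paths used above:
#   unique_hosts /home/domain/www/logs/access
#   webazolver -N 10 -D /var/lib/dns_cache.db /home/domain/www/logs/access
```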

HOW WEBALIZER WORKS:

  1. A default configuration file is scanned for. A file named webalizer.conf is searched for in the current directory, and if found, its configuration data is parsed. If the file is not present in the current directory, the file /etc/webalizer.conf is searched for and, if found, is used instead.
  2. Any command line arguments given to the program are parsed. This may include the specification of a configuration file, which is processed at the time it is encountered.
  3. If a log file was specified, it is opened and made ready for processing.
  4. If an output directory was specified, the program does a chdir(2) to that directory in preparation for generating output. If no output directory was given, the current directory is used.
  5. If a non-zero number of DNS Children processes were specified, they will be started, and the specified log file will be processed, creating or updating the specified DNS cache file.
  6. If no hostname was given, the program attempts to get the hostname using a uname(2) system call. If that fails, localhost is used.
  7. A history file is searched for in the current directory (output directory) and read if found. This file keeps totals for previous months, which are used in the main index.html HTML document. Note: The file location can now be specified with the HistoryName configuration option.
  8. If incremental processing was specified, a data file is searched for and loaded if found, containing the 'internal state' data of the program at the end of a previous run. Note: The file location can now be specified with the IncrementalName configuration option.
  9. Main processing begins on the log file. If the log spans multiple months, a separate HTML document is created for each month.
  10. After main processing, the main index.html page is created, which has totals by month and links to each month's HTML document.
  11. A new history file is saved to disk, which includes totals generated by The Webalizer during the current run.
  12. If incremental processing was specified, a data file is written that contains the 'internal state' data at the end of this run.
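
The lookup order in step 1 explains why a stray webalizer.conf in the current directory can silently override /etc/webalizer.conf. The sketch below mirrors that order in shell. `default_conf` is a hypothetical helper, and its two optional arguments exist only so the paths can be swapped for testing.

```shell
#!/bin/sh
# default_conf [DIR] [SYSTEM_CONF]: print the config file that would be
# picked up by default per step 1, or return 1 if neither file exists.
# Hypothetical helper, not part of Webalizer.
default_conf() {
    dir=${1:-.}
    system_conf=${2:-/etc/webalizer.conf}
    if [ -f "$dir/webalizer.conf" ]; then
        echo "$dir/webalizer.conf"      # current directory wins
    elif [ -f "$system_conf" ]; then
        echo "$system_conf"             # fall back to the system-wide file
    else
        return 1
    fi
}

# Example:
#   default_conf || echo "no default config found"
```

Passing `-c /etc/webalizer.conf` explicitly, as in the manual run earlier, names the file outright instead of relying on this search.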

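Berkeley DB is an open source embedded database library that provides scalable, high-performance, transaction-protected data management services to applications. Berkeley DB provides a simple function-call API for data access and management. Webalizer stores its DNS cache in a Berkeley DB file, so a build with --enable-dns needs the DB headers and library. If they are installed in a non-standard location, these configure options point to them:

--with-db=DIR              Alternate location for db headers
--with-dblib=DIR           Alternate location for db library

REFERENCES: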

http://linux.cudeso.be/linuxdoc/webalizer.php

http://www.mrunix.net/webalizer/
