webcp.hostinghacks.net/slackware | webalizer
The Webalizer is a fast, free web server log file analysis program. It produces highly detailed, easily configurable usage reports in HTML format, for viewing with a standard web browser.
PREREQUISITES: http://webcp.hostinghacks.net/slackware/graphics.libraries/ , slc, zlib
INSTALLS: webalizer
The installation commands can be run from a PuTTY window in a "cut-and-paste" style layout or copied to a script.
Notes on PuTTY best practices can be found here.
Compile webalizer:
cd /usr/src
wget ftp://ftp.mrunix.net/pub/webalizer/webalizer-2.01-10-src.tgz
# wget http://hostinghacks.net/dist/webalizer-2.01-10-src.tgz
tar -zxf webalizer-2.01-10-src.tgz
cd /usr/src/webalizer-2.01-10
./configure --prefix=/usr
make
make install
Create the webalizer configuration file (the >> appends, so run this only once):
cat >> /etc/webalizer.conf << "EOF"
# HTMLBody <body bgcolor="#FFFFFF" link="#0000FF" vlink="#FF0000">
# HTMLTail <font size=-1><center>Site Statistics</center></font>
Incremental yes
# DNSCache
# DNSChildren 10
IgnoreHist no
PageType htm*
PageType cgi
PageType php
PageType pl
CountryGraph no
TopSites 0
TopKSites 0
TopURLs 30
TopKURLs 30
TopReferrers 30
TopAgents 30
TopCountries 0
TopEntry 30
TopExit 30
TopSearch 30
TopUsers 0
IndexAlias index.cgi
IndexAlias index.php
#HideReferrer Direct Request
HideURL *.gif
HideURL *.GIF
HideURL *.jpg
HideURL *.JPG
HideURL *.png
HideURL *.PNG
HideURL /watch-info
GroupURL /cgi-bin/* CGI Scripts
GroupReferrer yahoo.com/ Yahoo!
GroupReferrer excite.com/ Excite
GroupReferrer infoseek.com/ InfoSeek
GroupReferrer webcrawler.com/ WebCrawler
GroupReferrer google.com/ Google
GroupReferrer lycos.com/ Lycos
GroupReferrer metacrawler.com/ Metacrawler
GroupAgent MSIE Internet Exploder
GroupAgent Mozilla Netscape
GroupShading yes
MangleAgents 3
EOF
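
Because the file mixes commented-out and active keywords, it can help to list only the lines Webalizer will actually act on. A minimal sketch, assuming a POSIX shell and grep; active_directives is a hypothetical helper name, not something shipped with Webalizer:

```shell
#!/bin/sh
# active_directives FILE
# Print the non-comment, non-blank lines of a Webalizer-style config file.
# (Hypothetical helper for illustration only.)
active_directives() {
    # Drop lines whose first non-blank character is '#', and blank lines.
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

# Example use (not run automatically):
#   active_directives /etc/webalizer.conf
```

If the heredoc was accidentally run twice, single-valued keywords such as Incremental will show up twice in this listing.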
READING WEBALIZER STATISTICS:
Hits - total number of requests made to the server.
Files - the total number of hits (requests) that actually resulted in something being sent back to the user (this excludes 404 Not Found responses and requests for content already in the browser's cache). The difference between hits and files gives an indication of repeat visitors.
Sites - number of unique IP addresses/hostnames that made requests.
Visits - number of first requests from a remote site within a specified timeout period (default is 30 minutes). A request that arrives after the timeout has expired starts a new visit. Only pages trigger a visit; remote sites that link to graphics and other non-page URLs are not counted in the visit totals, which reduces the number of false visits.
Pages - URLs that would be considered the actual page being requested; defaults to any URL that has an extension of .htm, .html or .cgi.
KBytes - total data that was transferred between the server and the remote machine (1 KByte is 1024 bytes).
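
To get a feel for how these totals relate, they can be approximated straight from an access log. The sketch below assumes Common Log Format (host ident user [date tz] "METHOD URL PROTO" status bytes) and treats a 200 status as "something was sent back"; log_totals is a hypothetical helper, and its numbers are only an approximation of Webalizer's own accounting:

```shell
#!/bin/sh
# log_totals FILE
# Rough Hits / Files / Sites / KBytes totals for a Common Log Format file.
# Field positions assume the standard whitespace-separated CLF layout.
log_totals() {
    awk '
        { hits++ }                               # every request is a hit
        $9 == 200 { files++ }                    # 200 OK taken as "a file was sent"
        !($1 in seen) { seen[$1] = 1; sites++ }  # unique client addresses
        $10 ~ /^[0-9]+$/ { bytes += $10 }        # "-" means no body was sent
        END {
            printf "hits=%d files=%d sites=%d kbytes=%d\n",
                   hits, files, sites, bytes / 1024
        }
    ' "$1"
}

# Example use (not run automatically):
#   log_totals /home/hostinghacks.net/www/logs/access
```

Visits and Pages are left out here because they depend on the timeout and PageType settings.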
Run webalizer manually to check for errors:
mkdir /home/hostinghacks.net/www/web/stats
webalizer -c /etc/webalizer.conf -o /home/hostinghacks.net/www/web/stats \
/home/hostinghacks.net/www/logs/access
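
A plain mkdir fails if the stats directory already exists, so for repeated manual runs a small wrapper may be more convenient. A minimal sketch, assuming a POSIX shell; run_webalizer is a hypothetical helper name, and only the -c and -o switches already used above are passed to webalizer:

```shell
#!/bin/sh
# run_webalizer CONF OUTDIR LOG
# Check the log is readable, make sure OUTDIR exists, then run webalizer.
# (Hypothetical wrapper for illustration only.)
run_webalizer() {
    conf=$1
    out=$2
    log=$3
    if [ ! -r "$log" ]; then
        echo "cannot read log file: $log" >&2
        return 1
    fi
    mkdir -p "$out" || return 1      # -p: no error if the directory exists
    webalizer -c "$conf" -o "$out" "$log"
}

# Example use (not run automatically):
#   run_webalizer /etc/webalizer.conf \
#       /home/hostinghacks.net/www/web/stats \
#       /home/hostinghacks.net/www/logs/access
```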
03.24.2005
I spent some time trying to make webalizer do reverse DNS lookups. To date I haven't had any luck with this. Theoretically it should work with something like this:
./configure --prefix=/usr --enable-dns
DNS lookups are made against a DNS cache file containing IP addresses and resolved names. If the IP address is not found in the cache file, it will be left as an IP address. In order for this to happen, a cache file MUST be specified when the Webalizer is run, either using the '-D' command line switch, or a "DNSCache" configuration file keyword. If no cache file is specified, no attempts to perform DNS lookups will be done.
The cache file can be made in two different ways:
1) You can have the Webalizer pre-process the specified log file at run-time, creating the cache file before processing the log file normally. This is done by setting the number of DNS Children processes to run, either by using the '-N' command line switch or the "DNSChildren" configuration keyword. This will cause the Webalizer to spawn the specified number of processes which will be used to do reverse DNS lookups. Generally, a larger number of processes will result in faster resolution of the log; however, if set too high it may cause overall system degradation. A setting of between 5 and 20 should be acceptable, and there is a maximum limit of 100. If used, a cache filename MUST be specified also, using either the '-D' command line switch, or the "DNSCache" configuration keyword. Using this method, normal processing will continue only after all IP addresses have been processed, and the cache file is created/updated.
2) You can pre-process the log file as a standalone process, creating the cache file that will be used later by the Webalizer. This is done by running the Webalizer with a name of 'webazolver' (i.e. the name 'webazolver' is a symbolic link to 'webalizer') and specifying the cache filename (either with '-D' or DNSCache). If the number of child processes is not given, the default of 5 will be used. In this mode, the log will be read and processed, creating a DNS cache file or updating an existing one, and the program will then exit without any further processing.
Preprocess the DNS cache file:
touch /var/lib/dns_cache.db
webazolver -N 10 -D /var/lib/dns_cache.db /home/domain/www/logs/access
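
The standalone pre-processing step could be wrapped so that the child count is checked against the documented limit of 100 before anything runs. This is only a sketch under the description above (and, as noted, DNS lookups have not been made to work here yet); dns_children_ok and resolve_log are hypothetical helper names:

```shell
#!/bin/sh
# dns_children_ok N
# Succeed if N is a whole number in the range 1..100 (the documented maximum).
dns_children_ok() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;     # empty or not purely digits
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 100 ]
}

# resolve_log CACHE CHILDREN LOG
# Create CACHE if it is missing, then pre-process LOG with webazolver.
resolve_log() {
    if ! dns_children_ok "$2"; then
        echo "number of DNS children must be between 1 and 100" >&2
        return 1
    fi
    [ -e "$1" ] || touch "$1" || return 1
    webazolver -N "$2" -D "$1" "$3"
}

# Example use (not run automatically):
#   resolve_log /var/lib/dns_cache.db 10 /home/domain/www/logs/access
```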
HOW WEBALIZER WORKS:
- A default configuration file is scanned for. A file named webalizer.conf is searched for in the current directory, and if found, its configuration data is parsed. If the file is not present in the current directory, the file /etc/webalizer.conf is searched for and, if found, is used instead.
- Any command line arguments given to the program are parsed. This may include the specification of a configuration file, which is processed at the time it is encountered.
- If a log file was specified, it is opened and made ready for processing.
- If an output directory was specified, the program does a chdir(2) to that directory in preparation for generating output. If no output directory was given, the current directory is used.
- If a non-zero number of DNS Children processes were specified, they will be started, and the specified log file will be processed, creating or updating the specified DNS cache file.
- If no hostname was given, the program attempts to get the hostname using a uname(2) system call. If that fails, localhost is used.
- A history file is searched for in the current directory (output directory) and read if found. This file keeps totals for previous months, which is used in the main index.html HTML document. Note: The file location can now be specified with the HistoryName configuration option.
- If incremental processing was specified, a data file is searched for and loaded if found, containing the 'internal state' data of the program at the end of a previous run. Note: The file location can now be specified with the IncrementalName configuration option.
- Main processing begins on the log file. If the log spans multiple months, a separate HTML document is created for each month.
- After main processing, the main index.html page is created, which has totals by month and links to each month's HTML document.
- A new history file is saved to disk, which includes totals generated by The Webalizer during the current run.
- If incremental processing was specified, a data file is written that contains the 'internal state' data at the end of this run.
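
The first step in the list, the default configuration file lookup, can be mimicked in the shell to see which file would be picked up from a given directory. A minimal sketch of that lookup order only; find_default_conf is a hypothetical helper, and the optional argument exists just so the system-wide path can be overridden:

```shell
#!/bin/sh
# find_default_conf [SYSTEM_CONF]
# Print the config file that the default lookup would find:
# ./webalizer.conf if present, otherwise SYSTEM_CONF
# (which defaults to /etc/webalizer.conf).
find_default_conf() {
    system_conf=${1:-/etc/webalizer.conf}
    if [ -f ./webalizer.conf ]; then
        echo ./webalizer.conf
    elif [ -f "$system_conf" ]; then
        echo "$system_conf"
    else
        return 1                     # no default config file found
    fi
}

# Example use (not run automatically):
#   cd /home/hostinghacks.net/www/web/stats && find_default_conf
```

A file named with -c on the command line is processed in addition to this, when the argument is encountered.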
Berkeley DB is an open source embedded database library that provides scalable, high-performance, transaction-protected data management services to applications. Berkeley DB provides a simple function-call API for data access and management.
--with-db=DIR Alternate location for db headers
--with-dblib=DIR
http://linux.cudeso.be/linuxdoc/webalizer.php
http://www.mrunix.net/webalizer/