Using The DLF log module with Log Analysis Tools

A number of very good third-party tools allow you to perform sophisticated analysis of your server log files. These tools expect the log files to be formatted in a standard manner. However, as no formal standard for web server log files is widely supported, checking compatibility between server log files and log tools can be difficult.

You can choose one of the predefined log file formats, which are widely recognised by log analysis tools. Alternatively, you can provide your own DLF log format string. This allows you to define a format compatible with any log tool. More information on the DLF log format can be found here.

Predefined Log File Formats

NCSA Common Log Format (CLF)

The Common Log Format (CLF) was originally used by the NCSA server. It has become the de facto standard for log files and is the format expected by a number of log analysis tools. CLF logs contain most of the information you'll need to perform a comprehensive analysis of your site.

This is the most basic log file format and should be understood by all log analysis tools. The CLF records each request as a single line, containing several tokens which are separated by spaces.

     host ident authuser date request status bytes
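As an illustration of how a log tool might read this format, the following is a minimal Python sketch that splits one CLF line into its seven fields. It assumes the date is enclosed in square brackets and the request in double quotes, as shown under Field Contents, and that the bytes field may be a hyphen when nothing was sent. The names CLF_PATTERN and parse_clf_line, and the sample entry, are hypothetical and are not part of the server.

```python
import re

# One CLF line: host ident authuser [date] "request" status bytes.
# Assumption: the bytes field may be "-" when no body was sent.
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_clf_line(line):
    """Return a dict of the seven CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line.strip())
    return match.groupdict() if match else None


# Hypothetical sample entry assembled from the field examples in this document.
sample = ('194.234.126.31 - lee [08/May/1997:16:27:54 +0100] '
          '"GET /index.html HTTP/1.0" 200 1043')
fields = parse_clf_line(sample)
print(fields["host"], fields["authuser"], fields["status"], fields["bytes"])
```

Returning None for lines that do not match lets a caller count or skip malformed entries rather than stop at the first one.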

NCSA Combined Log Format

The Combined Log Format is an extension of the CLF. It originally combined a number of separate log files into a single file; the additional files contained the referer (last page visited) and browser (user-agent) information.

     host ident authuser date request status bytes referer user-agent
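The two extra fields can contain spaces, so a reader cannot simply split the line on spaces. The sketch below, in Python, tokenises a line by treating a bracketed run, a double-quoted run, or a run of non-space characters as one token. It assumes the referer and user-agent are written in double quotes, which is common for Combined logs but is not stated here. The names TOKEN, COMBINED_FIELDS, tokenize and parse_combined_line are hypothetical, as is the sample entry.

```python
import re

# A token is a [bracketed] run, a "quoted" run, or a run of non-space characters.
TOKEN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')

COMBINED_FIELDS = ["host", "ident", "authuser", "date", "request",
                   "status", "bytes", "referer", "user_agent"]


def tokenize(line):
    """Split a log line into tokens, dropping the brackets and quotes."""
    tokens = []
    for match in TOKEN.finditer(line):
        # Exactly one of the three groups takes part in each match.
        tokens.append(next(g for g in match.groups() if g is not None))
    return tokens


def parse_combined_line(line):
    """Return a dict of the nine Combined fields, or None on a field-count mismatch."""
    tokens = tokenize(line)
    if len(tokens) != len(COMBINED_FIELDS):
        return None
    return dict(zip(COMBINED_FIELDS, tokens))


# Hypothetical sample entry; the user-agent string is made up for illustration.
sample = ('194.234.126.31 - - [08/May/1997:16:27:54 +0100] '
          '"GET /index.html HTTP/1.0" 200 1043 "-" "ExampleBrowser/1.0 (test build)"')
entry = parse_combined_line(sample)
print(entry["referer"], "|", entry["user_agent"])
```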

Shared NCSA CLF

This is an extension of the CLF to cope with multiple virtual servers logging to the same file. Each server prefixes its hostname to the start of each log entry.

     v-host host ident authuser date request status bytes

Shared NCSA Combined Log Format

This is an extension of the Combined format to cope with multiple virtual servers logging to the same file. Each server prefixes its hostname to the start of each log entry.

     v-host host ident authuser date request status bytes referer user-agent
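Many analysis tools expect one log per site, so a shared log may need to be split by virtual server before analysis. The following Python sketch groups entries by the leading v-host token and strips it, leaving plain CLF or Combined entries. It assumes the v-host contains no spaces; the function name split_by_vhost and the sample lines are hypothetical.

```python
from collections import defaultdict


def split_by_vhost(lines):
    """Group shared-log entries by their leading v-host token.

    Returns a dict mapping each v-host to its entries with the prefix removed.
    """
    per_host = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        vhost, _, rest = line.partition(" ")
        per_host[vhost].append(rest)
    return dict(per_host)


# Hypothetical shared-log entries for two virtual servers.
shared = [
    'www.name.co.uk 194.234.126.31 - - [08/May/1997:16:27:54 +0100] "GET /index.html HTTP/1.0" 200 1043',
    'www.example.com 194.234.126.31 - - [08/May/1997:16:27:55 +0100] "GET /about.html HTTP/1.0" 200 512',
    'www.name.co.uk 194.234.126.31 - - [08/May/1997:16:27:56 +0100] "GET /logo.gif HTTP/1.0" 200 2048',
]
grouped = split_by_vhost(shared)
for vhost, entries in grouped.items():
    print(vhost, len(entries))
```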

Netscape NSAPI Extended Format

This is essentially the same as the Combined Log Format, except that the additional referer and user-agent information is split onto a separate line.

     host ident authuser date request status bytes
     referer user-agent
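A tool that only understands the Combined format could be fed this log by joining each pair of lines back into one. The Python sketch below does this under the assumption that every entry occupies exactly two consecutive non-blank lines, which should be checked against a real log before relying on it. The function name join_extended_entries and the sample lines are hypothetical.

```python
def join_extended_entries(lines):
    """Join each (CLF line, referer/user-agent line) pair into a single line.

    Assumption: entries strictly alternate between the two line types.
    """
    non_empty = [line.strip() for line in lines if line.strip()]
    if len(non_empty) % 2 != 0:
        raise ValueError("expected an even number of non-blank lines")
    pairs = zip(non_empty[0::2], non_empty[1::2])
    return [f"{first} {second}" for first, second in pairs]


# Hypothetical two-line entry.
extended = [
    '194.234.126.31 - - [08/May/1997:16:27:54 +0100] "GET /index.html HTTP/1.0" 200 1043',
    '"-" "ExampleBrowser/1.0"',
]
joined = join_extended_entries(extended)
print(joined[0])
```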

Field Contents

host The fully qualified domain name or IP address of the connecting machine. The server records the name if reverse DNS lookup is turned on; otherwise it records just the IP address. Reverse DNS lookups place extra load on the server and consume extra bandwidth, so you can disable them.

e.g. www.name.co.uk or 194.234.126.31
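Because the host field can hold either form, an analysis script may need to tell them apart, for example to decide which entries still need a DNS lookup. A minimal Python sketch using the standard ipaddress module follows; the function name is_ip_address is hypothetical.

```python
import ipaddress


def is_ip_address(host):
    """Return True if the host field is an IP address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


for host in ("www.name.co.uk", "194.234.126.31"):
    print(host, is_ip_address(host))
```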

ident Never set. Always appears as a hyphen (-). The original CLF logs would store any information returned by identd for the connection. Identd is a UNIX-only service and is rarely used. The ident lookup also consumes extra network bandwidth and adds server overhead, so it is not recorded.

authuser The userid used in the request if the requested document was password protected. For more details on authenticated pages and access control, see the appropriate section.

e.g. lee or zeus or fred

date The date and time of the request, in the format:
[day/month/year:hh:mm:ss zone]
day = Two digit day of the month : "01" - "31"
month = Three letter month abbreviation : "Jan" … "Dec"
year = Four digit full format year (1997) : "0000" - "9999"
hour = Two digit hour : "00" - "23"
minute = Two digit minute : "00" - "59"
second = Two digit seconds : "00" - "59"
zone = Four digit signed timezone adjustment : "(+/-)0000" - "(+/-)9999"

e.g. [08/May/1997:16:27:54 +0100]
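The date layout above maps directly onto a strptime format string. The following Python sketch converts the field into a timezone-aware datetime; it assumes an English locale, since the month abbreviation is matched against locale month names. The function name parse_log_date is hypothetical.

```python
from datetime import datetime

# day/month/year:hour:minute:second zone, e.g. 08/May/1997:16:27:54 +0100
LOG_DATE_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_log_date(field):
    """Parse a log date field, with or without its square brackets."""
    return datetime.strptime(field.strip("[]"), LOG_DATE_FORMAT)


stamp = parse_log_date("[08/May/1997:16:27:54 +0100]")
print(stamp.isoformat())  # 1997-05-08T16:27:54+01:00
```

Keeping the zone offset in the parsed value lets entries from servers in different timezones be compared or sorted correctly.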

request The first line of the request received from the client. This contains information about which file the client wants and exactly how it should be sent. It is of the format:

"method url protocol/version"

method = HTTP method, such as GET or POST
url = The URL requested by the client
protocol = Always HTTP
version = The version number of the protocol.

e.g. "GET /index.html HTTP/1.0"
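The request field can be broken into its parts with a simple split, as in this Python sketch. It returns None when the field does not have exactly three space-separated parts, which can happen with malformed client requests. The function name parse_request is hypothetical.

```python
def parse_request(request):
    """Split a request field into method, url, protocol and version.

    Returns None if the field is not of the form: method url protocol/version.
    """
    parts = request.strip('"').split()
    if len(parts) != 3:
        return None
    method, url, protocol_version = parts
    protocol, _, version = protocol_version.partition("/")
    return {"method": method, "url": url,
            "protocol": protocol, "version": version}


parsed = parse_request('"GET /index.html HTTP/1.0"')
print(parsed["method"], parsed["url"], parsed["protocol"], parsed["version"])
```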

status The three-digit status code returned by the server in response to this request. For a full list of status codes, see the appendix.

e.g. 200 or 303 or 406
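For reporting, status codes are often grouped by their first digit, which in HTTP identifies the class of response. A short Python sketch of that grouping follows; the names STATUS_CLASSES and status_class are hypothetical.

```python
# HTTP response classes, keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(status):
    """Return the response class for a status code given as a string or int."""
    return STATUS_CLASSES.get(int(status) // 100, "unknown")


for code in ("200", "303", "406"):
    print(code, status_class(code))
```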

bytes The number of bytes sent by the server, not including the HTTP header.
v-host The host name of the virtual server which made the log entry.
Referer The URL of the document which refers (links) to the current document. This may not be sent by the client to the server for every request, in which case a "-" is entered in the log file.
User-Agent A text string which identifies the client (browser) making the request to the server. Analysing this information allows you to optimise your site for the largest number of visitors.
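
As a rough illustration of the kind of analysis the User-Agent field supports, the Python sketch below counts how often each client string appears, skipping entries where nothing was recorded. The function name count_user_agents and the sample values are hypothetical.

```python
from collections import Counter


def count_user_agents(agents):
    """Count occurrences of each user-agent string, ignoring empty or "-" values."""
    return Counter(agent for agent in agents if agent and agent != "-")


# Hypothetical user-agent values taken from parsed log entries.
seen = ["ExampleBrowser/1.0", "OtherBrowser/2.1", "-", "ExampleBrowser/1.0"]
counts = count_user_agents(seen)
for agent, hits in counts.most_common():
    print(hits, agent)
```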