Tuesday, April 18, 2017

GoAccess: Visualize Your Web Traffic

If you want a visual representation of your web traffic, try GoAccess (https://goaccess.io/). You will never let it go.

Here I am only generating a report from static log files. GoAccess has many other features (real-time updates, console access, JSON/CSV export, etc.) that I haven't needed so far.

Tools Used:

- macOS 10.12
- GoAccess v1.2


Installation

brew install goaccess

Configuration

Retrieve the access logs from the web servers (Apache 2.4 in my case). Sample lines:
10.230.33.71 - - [26/Mar/2017:02:41:09 +0800] "GET /retail/mce/img/icons/icon-close.png HTTP/1.1" 200 196 314
10.230.33.71 - - [26/Mar/2017:02:36:26 +0800] "GET /retail/mce/img/icons/icon-close.png HTTP/1.1" 304 - 314
10.230.33.71 - - [25/Mar/2017:22:47:19 +0800] "GET /retail/mce/css/icons.data.svg.css HTTP/1.1" 304 - 314
10.230.33.71 - - [25/Mar/2017:17:46:35 +0800] "GET /retail/mce/css/icons.fallback.css HTTP/1.1" 304 - 314

Tip #1: If your web server is shared by multiple apps, remove the 'noise' by using 'grep' to keep only the lines matching the pattern you are interested in:
grep -i 'mypattern' access_log-20170328 > filtered_access_log-20170328

Tip #2: Although GoAccess has the ability to get input from multiple files, you can also combine all your log files via 'cat':
cat a.log b.log c.log > d.log
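
Both tips can also be folded into a single step, since grep accepts several input files. The sketch below is only an illustration: the function name merge_filtered and its argument order are my own invention, and it assumes the output file is not one of the input logs.

```shell
# Hypothetical helper: merge several access logs, keeping only matching lines.
# Usage: merge_filtered PATTERN OUTPUT LOG [LOG...]
merge_filtered() {
  pattern=$1
  output=$2
  shift 2
  # -i: case-insensitive match
  # -h: do not prefix each line with the name of the file it came from
  grep -ih -- "$pattern" "$@" > "$output"
}
```

For example, merge_filtered 'mypattern' d.log a.log b.log c.log writes the matching lines from all three logs into d.log. Note that grep returns a non-zero exit status when nothing matches, which leaves d.log empty.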

Log format settings from httpd.conf:
LogFormat "%h %l %u %t \"%r\" %>s %b %D" common

Note: I initially assumed I could reuse the Apache LogFormat string verbatim, since GoAccess supports the Apache log file format. That did not work, because GoAccess uses its own set of format specifiers. After much trial and error, I arrived at the GoAccess log format below. The most frustrating issue was the response times being rendered as 0.00us, which was caused by the ">" in %>s. GoAccess only recognises %s.

Open file: /usr/local/Cellar/goaccess/1.2/etc/goaccess.conf, and set the following (adjust based on your own log patterns):
time-format %H:%M:%S
date-format %d/%b/%Y
log-format %h %l %u [%d:%t %^] "%m %U %H" %s %b %D
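
Before pointing GoAccess at a large file, it can help to confirm that every line really has the layout the log-format above expects. The following is a rough sketch, not part of GoAccess: check_layout is a hypothetical helper, and it assumes the exact Apache LogFormat from this post (11 whitespace-separated fields, no spaces inside the URL).

```shell
# Hypothetical helper: list log lines that do not match the layout
#   %h %l %u %t "%r" %>s %b %D
# Exits with status 1 if any such line is found, 0 otherwise.
check_layout() {
  awk '
    # Field 4 opens the timestamp, field 9 is the status code,
    # and the last field is the %D response time in microseconds.
    NF != 11 || $4 !~ /^\[/ || $9 !~ /^[0-9][0-9][0-9]$/ || $NF !~ /^[0-9]+$/ {
      print FILENAME ":" FNR ": " $0
      bad = 1
    }
    END { exit bad }
  ' "$@"
}
```

Running check_layout access_log prints nothing when all lines fit. Any line it does print is a candidate for the kind of parsing trouble described in the note above.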

Save the file after editing, and run the following (assuming log filename is access_log):
goaccess access_log -o report.html -q

where
-o: Output (report) file. For an HTML report, specify a filename with the .html extension
-q: Ignore the query string, e.g. www.google.com/page.htm?query is counted as www.google.com/page.htm
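
Given the 0.00us problem mentioned earlier, it is worth cross-checking the response times in the report against the raw log. The sketch below averages the last field of each line, which is %D (microseconds) in the Apache LogFormat from this post. The function name avg_response_us is hypothetical, and lines whose last field is not a number are simply skipped.

```shell
# Hypothetical helper: average the last field (%D, in microseconds)
# over all lines of the given log files.
avg_response_us() {
  awk '
    $NF ~ /^[0-9]+$/ { sum += $NF; n++ }
    END {
      if (n) printf "%d requests, average %.2f us\n", n, sum / n
      else print "no timed requests"
    }
  ' "$@"
}
```

For the four sample lines shown earlier, avg_response_us access_log prints "4 requests, average 314.00 us". If the report shows 0.00us while this prints a non-zero average, the log-format setting is the first thing to re-check.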

My output/report: [screenshot of the generated HTML report]

The man page (https://goaccess.io/man) has the full options available.

Side note: Why not AWStats (http://www.awstats.org/)? It was my first choice, but I was working on a MacBook, and the easiest route was GoAccess (just use brew to install). AWStats needs Perl, and its download page only offered ready-made RPM packages. Plus, GoAccess has a more impressive dashboard UI, and I needed to show some fancy stats to the business folks...
