Development Seed Blog
Faceoff: Google Analytics Vs. Server Logs
Determining the Best Website Tracking Tool
Recently several of our clients have asked which is the better tool to track website traffic, Google Analytics or server logs. I love to see our clients becoming passionate about their statistics, so this is a fun question for me to tackle. Before you start going back and forth between which tracking system is superior, it’s important to understand that many of your options are fundamentally different (kind of like apples and oranges or cats and dogs).
The Short Answer: Google Analytics
For most smaller organizations, Google Analytics is the solution to pick. It's comprehensive, easy to use, and free. But it's not perfect, and it won't meet every organization's needs.
The Long Answer: It Depends on You
There are pros and cons to all tracking methods, but ultimately the decision should come down to your organization's makeup and specific needs. Unfortunately, the truth (exactly how many people see your site in any given time period) is essentially unknowable because of the way the internet works. Computers share and change IP addresses, and people use multiple computers and other devices to get online, so no web statistic will ever give you a metaphysically true answer. Once you accept this, you can understand how the two main types of solutions try to compensate for it, and decide which better meets your needs.
Ready?
Server Logs
Server log files tell you, very literally and accurately, how many (and which) pages or files were viewed. You can get reports on this information from log file analysis software like AWStats, Sawmill, and Webalizer. Unfortunately, and this is in the spirit of the uncertainty principle, the more precisely you know this data, the harder it is to know how many people really viewed your pages and files.
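To make "very literally" concrete, here is a minimal Python sketch of the raw material a log analyzer starts from. It assumes the common Apache/NCSA Combined Log Format (real packages like AWStats handle many more formats), and the sample lines are invented. Notice that the PDF download and the search-engine crawler's request are tallied exactly like a person reading a page.

```python
import re
from collections import Counter

# Apache/NCSA "Combined Log Format":
# ip ident user [timestamp] "METHOD path protocol" status bytes "referrer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_requests(lines):
    """Count successful (2xx) requests per path, the way a log analyzer tallies files served."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines written in some other format
        if match.group("status").startswith("2"):
            counts[match.group("path")] += 1
    return counts


# Hypothetical sample lines: a browser, a PDF download, and a crawler (one hit is a 404).
SAMPLE = [
    '203.0.113.5 - - [10/Oct/2007:13:55:36 -0700] "GET /about HTTP/1.1" 200 2326 "http://example.com/" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Oct/2007:13:56:02 -0700] "GET /files/report.pdf HTTP/1.1" 200 98304 "http://example.com/about" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2007:14:01:11 -0700] "GET /about HTTP/1.1" 200 2326 "-" "Googlebot/2.1"',
    '198.51.100.7 - - [10/Oct/2007:14:01:12 -0700] "GET /missing HTTP/1.1" 404 512 "-" "Googlebot/2.1"',
]

for path, hits in count_requests(SAMPLE).most_common():
    print(path, hits)
```

The counts are exact, but nothing in them says how many of those hits were people.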
That's because the information in log files (i.e. time stamp, filename, requesting IP, and browser) is limited. To counteract this, log analysis software defines rules that say what constitutes a visit and a unique visitor, but those rules are never fully accurate, and they aren't standardized across different analysis packages.
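Here is a rough sketch of the kind of rule such software applies. It assumes a visitor is identified by IP address plus browser (user agent), and that a visit ends after 30 minutes of inactivity. Both choices are illustrative assumptions, not the rule of any particular package.

```python
from datetime import datetime, timedelta

# Assumed inactivity timeout; each analysis package chooses its own rule.
SESSION_TIMEOUT = timedelta(minutes=30)


def count_visits(hits):
    """Return (visits, unique_visitors) for (ip, user_agent, timestamp) hits.

    A "visitor" is approximated as one IP + user-agent pair, and a new visit
    starts whenever that visitor has been idle longer than SESSION_TIMEOUT.
    """
    last_seen = {}  # (ip, user_agent) -> timestamp of that visitor's latest hit
    visits = 0
    for ip, agent, ts in sorted(hits, key=lambda hit: hit[2]):
        key = (ip, agent)
        previous = last_seen.get(key)
        if previous is None or ts - previous > SESSION_TIMEOUT:
            visits += 1
        last_seen[key] = ts
    return visits, len(last_seen)


start = datetime(2007, 10, 10, 13, 0)
hits = [
    ("203.0.113.5", "Firefox", start),
    ("203.0.113.5", "Firefox", start + timedelta(minutes=10)),  # same visit
    ("203.0.113.5", "Firefox", start + timedelta(minutes=55)),  # idle 45 min: new visit
    ("203.0.113.5", "Safari", start + timedelta(minutes=12)),   # same IP, other browser: new visitor
    ("198.51.100.7", "Firefox", start + timedelta(minutes=5)),
]
print(count_visits(hits))  # (4, 3)
```

The rule can err in either direction. One person switching browsers counts as a second visitor, while two people sharing an IP address and the same browser collapse into one.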
Bottom Line: Log files will always give you higher numbers of page views, visits, and unique visitors than a web-based JavaScript solution like Google Analytics, and in most cases those numbers are inflated.
Web-Based Solutions
Web-based solutions like Google Analytics and Site Meter have you place a JavaScript tag in your site templates. The tag records each page view on the provider's servers, which in effect builds a log file there (they don't have access to your server logs). Most also set a cookie to track referring pages, browsing history, and visitor history (first time or repeat). Because they identify individual visitors (anonymously), web-based tracking solutions track visits more accurately and don't overcount them the way log analysis does. However, their numbers still won't be entirely truthful.
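The cookie part works roughly like this. The sketch below is a simplified, hypothetical server-side version of the idea. The cookie name `visitor_id` is made up, and real trackers set their cookies from JavaScript using their own names and formats. A browser that presents an ID is a repeat visitor, and one that doesn't gets a fresh random ID.

```python
import uuid

# Hypothetical cookie name; real trackers use their own names and formats.
COOKIE_NAME = "visitor_id"


def identify_visitor(cookies):
    """Classify one page view as coming from a new or a repeat visitor.

    cookies: dict of cookies the browser sent with the request.
    Returns (visitor_id, is_new, cookies_to_set).
    """
    existing = cookies.get(COOKIE_NAME)
    if existing:
        return existing, False, {}  # browser already carries an ID: repeat visitor
    new_id = uuid.uuid4().hex  # anonymous random ID, no personal data
    return new_id, True, {COOKIE_NAME: new_id}


# First visit: the browser's cookie jar is empty.
jar = {}
first_id, first_is_new, to_set = identify_visitor(jar)
jar.update(to_set)  # the browser stores the cookie

# Later visit from the same browser: recognized as a repeat visitor.
second_id, second_is_new, _ = identify_visitor(jar)
print(first_is_new, second_is_new, first_id == second_id)  # True False True
```

This also explains the weakness covered below. A browser that refuses or deletes the cookie returns with an empty jar and looks exactly like a first-time visitor.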
Since these tracking systems rely on JavaScript and cookies, security-conscious visitors won't be accurately tracked if they have cookies disabled (about 10% of people) or clear their cookies monthly (40% according to Jupiter Research, but that seems high). Also, it looks like Site Meter, at least, has started to install ad-tracking cookies (or spyware) on the sites it tracks, so be sure to read up on a web-based solution before you install it. Another drawback to web-based tracking systems is that they often don't track non-HTML file downloads well. If your website's performance is measured by file downloads, this may not be the best option for you.
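To get a feel for how much cookie behavior can skew the unique-visitor count, here is a toy back-of-the-envelope model. Its assumptions are hypothetical, not measured behavior of any tracker. Visitors with cookies disabled look new on every visit, visitors who clear cookies once during the period are counted twice, everyone else is counted once, and the groups don't overlap. A tracker that simply drops cookieless visitors would undercount them instead.

```python
def reported_unique_visitors(people, visits_each, no_cookie_share, cleared_share):
    """Toy estimate of the unique-visitor count a cookie-based tracker reports.

    Hypothetical assumptions, for illustration only:
    - people with cookies disabled look like a new visitor on every visit;
    - people who clear cookies once during the period are counted twice;
    - everyone else is counted exactly once;
    - the two groups do not overlap.
    """
    if no_cookie_share < 0 or cleared_share < 0 or no_cookie_share + cleared_share > 1:
        raise ValueError("shares must be non-negative and sum to at most 1")
    normal_share = 1 - no_cookie_share - cleared_share
    per_person = no_cookie_share * visits_each + cleared_share * 2 + normal_share
    return people * per_person


# 1,000 hypothetical people visiting 4 times each, using the shares quoted above.
estimate = reported_unique_visitors(1000, 4, 0.10, 0.40)
print(round(estimate))  # 1700
```

Under these made-up inputs, 1,000 real people would show up as roughly 1,700 "unique" visitors. The point is the direction of the error, not the exact figure.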
Bottom Line: Web-based solutions give you a better picture of the visits your website receives and report numbers usually a lot lower (but presumably more accurate) than log files. The tradeoff is that users with high security settings may only be partially tracked or not tracked at all and that file downloads are difficult to monitor.
Enterprise Solutions
If neither of the above solutions sounds like a good fit for your organization, there are a number of enterprise solutions like WebTrends and Omniture that blend the two approaches to give you an even more accurate picture of your web traffic. However, these can be quite expensive, so I won't describe them in detail here.
At the end of the day, your communications success will never depend on knowing exactly how many people are on your website. More important and more telling are the increases and decreases in your traffic, the paths users take through your site, and the actions they take while they are there.
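Trends hold up even when the tools disagree. In this hypothetical example (the numbers are invented), server logs and Google Analytics report very different absolute traffic for the same site but the same month-over-month growth.

```python
def percent_change(before, after):
    """Percentage change in traffic from one period to the next."""
    if before == 0:
        raise ValueError("no traffic in the earlier period to compare against")
    return (after - before) / before * 100


# Invented monthly visit counts for the same site from two tools.
log_visits = {"September": 12000, "October": 13200}
ga_visits = {"September": 8000, "October": 8800}

for name, visits in (("Server logs", log_visits), ("Google Analytics", ga_visits)):
    change = percent_change(visits["September"], visits["October"])
    print(f"{name}: {change:+.1f}%")  # both print +10.0%
```

As long as you stick with one tool, its systematic over- or undercounting largely cancels out when you compare one period with the next.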
In almost all cases, I would recommend using Google Analytics. It's very easy to install, its reporting is stellar (and covers all of the above metrics), and it doesn't add any ad-tracking cookies that can be a major turn-off for your visitors. There were some initial problems with the system (but that's what Beta is for, no?) and occasional concerns about its accuracy, but these reports are few and far between.
In the coming week or so, I'll be pulling together some sample data to compare Google Analytics reports with server log analysis. I hope to have a more precise answer on which is more accurate and by how much, so check back soon for a report on the Faceoff.
UPDATE
The ever-vigilant sorcerer-in-chief Justin Miller pointed out in the comments that I wasn't being very clear at all (I blame society) about the fact that it isn't the server logs that vary in the info. While there are some varying formats in the data, what they record isn't that different. The log-file analysis SOFTWARE, however, doesn't always make the same assumptions about what constitutes a visit, a unique visitor, and other things. The distinctions are fairly minor, but they can be maddening if you switch packages and your boss really cares about the exact number of visits or visitors instead of the trends. He also caught me accidentally mislabelling AWStats, Webalizer, and Sawmill as JavaScript-based, which I have corrected in the post. (You've won this round, Miller...)
Comments
logs
Sawmill, Webalizer, and AWStats run off of your web server logs, not JavaScript inserts. Also, I would clarify that the log data is usually standard across servers, but how it gets turned into visit counts depends on whatever methods the analysis tool uses.
Nope.
JavaScript solutions will always be 10-15% lower, because the surfer either has JavaScript turned off or hits the back key/navigates through your site too quickly for the remotely hosted stats solution to process.
If the provider is overloaded or down, no stats are collected. Worse, if you have the JavaScript include at the top of your pages, it will delay the loading of your pages and cause your surfers to leave.
I use Google Analytics on certain sites but leave the include at the bottom of the page and only use it over and above my server logs because it offers a few features that the others don't.
Drupal multisites
Do you know which system (analytics or logs) would work better with a Drupal multisite? By multisite I mean a setup where Drupal is the content management system hosting many sites.
Thanks!
Thanks
Thanks, this is exactly what I was looking for. Well written.
Nice tips.
Nice tips.
Not clear
Hey Steve,
Very interesting post, but can you explain exactly why log-based stats generate higher page views, sessions, etc. than cookie-based ones?
Thanks!
Load Balanced Sites Question
When you have load-balanced web servers, how do GA, AWStats, or other tools handle it?
thanks.
load balancing web servers and Google analytics
When you have load balancing set up on a site, you need to make sure the JavaScript is on all the pages for that site. Google has a special ID for each site, so no matter which server the page is served from, the hit is reported for that site. You can get yourself in trouble if you have several sites and put the same code, with the same site ID, on more than one of them: every page carrying that JavaScript is tracked as a hit for that one site. For example, if I had site A running, created the JavaScript with Google, and put the same code on both site A and site B, it would look as if site A had all these hits that could really have come from site B.
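On the server-log side of the question, each machine behind a load balancer typically writes its own access log. A log analyzer only sees complete visits once those files are combined in time order. Here is a minimal Python sketch, assuming each file is already sorted and uses the Combined Log Format timestamp (the sample lines are invented).

```python
import heapq
from datetime import datetime


def log_time(line):
    """Parse the [10/Oct/2007:13:55:36 -0700] timestamp of a Combined Log Format line."""
    start = line.index("[") + 1
    end = line.index("]", start)
    return datetime.strptime(line[start:end], "%d/%b/%Y:%H:%M:%S %z")


def merge_server_logs(*logs):
    """Merge per-server logs (each already in time order) into one time-ordered list."""
    return list(heapq.merge(*logs, key=log_time))


# Invented lines: one visitor's requests were split across two servers.
server_a = [
    '203.0.113.5 - - [10/Oct/2007:13:55:36 -0700] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Oct/2007:13:57:10 -0700] "GET /contact HTTP/1.1" 200 734 "-" "Mozilla/5.0"',
]
server_b = [
    '203.0.113.5 - - [10/Oct/2007:13:56:02 -0700] "GET /about HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
]

for line in merge_server_logs(server_a, server_b):
    print(line.split('"')[1])  # request lines in time order: /, then /about, then /contact
```

If each server's log were analyzed separately, this single visit would be split into two partial ones.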