Project

log_sense

0.0
The project is in a healthy, maintained state
Generate analytics in HTML, txt, and SQLite format for Apache and Rails log files.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Development

Runtime

 Project Readme

README

Introduction

LogSense generates reports and statistics from Apache and Ruby on Rails logs. Written in Ruby, it runs from the command line, it is fast, and it can be installed on any system with a relatively recent version of Ruby. We tested on Ruby 2.6.9, Ruby 3.0.x and later.

When generating reports, LogSense reports the following data:

  • Visitors, hits, unique visitors, bandwidth used
  • Most accessed HTML pages
  • Most accessed resources
  • Missed resources (also by IP) which helps highlight potential attacks
  • Response statuses
  • Referers
  • OS, browsers, and devices
  • IP Country location, thanks to the DP-IP lite country DB
  • Streaks: resources accessed by a given IP over time
  • Performance of Rails requests

A special output format ufw generates rules for the Uncomplicated Firewall to blacklist IPs requesting URLs matching a specific pattern.

Filters from the command line allow to analyze specific periods and distinguish traffic generated by self polls and crawlers.

LogSense generates HTML, txt, ufw, and SQLite outputs.

Apache Report Structure

./screenshots/apache-screenshot.png

Rails Report Structure

./screenshots/rails-screenshot.png

UFW Report

The output format ufw generates directives for Uncomplicated Firewall blacklisting IPs requesting URLs matching a given pattern.

We use it to blacklist IPs requesting WordPress login pages on our websites… since we don’t use WordPress for our websites.

Example

$ log_sense -f apache -t ufw -i apache.log
# /users/sign_in/xmlrpc.php?rsd
ufw deny from 20.212.3.206

# /wp-login.php /wordpress/wp-login.php /blog/wp-login.php /wp/wp-login.php
ufw deny from 185.255.134.18

...

An important word of warning

Log poisoning is a technique whereby attackers send requests with invalidated user input to forge log entries or inject malicious content into the logs.

log_sense sanitizes entries of HTML reports, to try and protect from log poisoning. Log entries and URLs in SQLite3, however, are not sanitized: they are stored and read from the log. This is not, in general, an issue, unless you use the data from SQLite in environments in which URLs can be opened or code executed.

Motivation

LogSense moves along the lines of tools such as GoAccess (which strongly inspired the development of Log Sense) and Umami, both focusing on privacy and data-ownership: the data generated by LogSense is stored on your computer and owned by you (like it should be)[fn:1].

LogSense is also inspired by static websites generators: statistics are generated from the command line and accessed as static HTML files. LogSense thus significantly reduces the attack surface of your web server and installation headaches. We have, for instance, a cron job running on our servers, generating statistics at night. The generated files are then made available on a private area on the web.

Installation

gem install log_sense

Usage

log_sense --help

Examples:

log_sense -f apache -i access.log -t txt > access-data.txt
log_sense -f rails -i production.log -t html -o performance.html

Code Structure

The code implements a pipeline, with the following steps:

  1. Parser: parses a log to a SQLite3 database. The database contains a table with a list of events, and, in the case of Rails report, a table with the errors.
  2. Aggregator: takes as input a SQLite DB and aggregates data, typically performing “group by”, which are simpler to generate in Ruby, rather than in SQL. The module outputs a Hash, with different reporting data.
  3. GeoLocator: add country information to all the reporting data which has an IP as one the fields.
  4. Shaper: makes (geolocated) aggregated data (e.g. Hashes and such), into Array of Arrays, simplifying the structure of the code building the reports.
  5. Emitter generates reports from shaped data using ERB.

The architecture and the structure of the code is far from being nice, for historical reason and for a bunch of small differences existing between the input and the outputs to be generated. This usually ends up with modifications to the code that have to be replicated in different parts of the code and in interferences.

Among the points I would like to address:

  • The execution pipeline in the main script has a few exceptions to manage SQLite reading/dumping and ufw report. A linear structure would be a lot nicer.
  • Two different classes are defined for steps 1, 2, and 4, to manage, respectively, Apache and Rails logs. These classes inherit from a common ancestor (e.g. ApacheParser and RailsParser both inherit from Parser), but there is still too little code shared. A nicer approach would be that of identifying a common DB structure and unify the pipeline up to (or including) the generation of reports. There are a bunch of small different things to highlight in reports, which still make this difficult. For instance, the country report for Apache reports size of TX data, which is not available for Rail reports.
  • Geolocation could become a lot more efficient if performed in SQLite, rather than in Ruby
  • The distinction between Aggregation, Shaping, and Emission is a too fine-grained and it would be nice to be able to cleanly remove one of the steps.

Change Log

See the CHANGELOG file.

Compatibility

LogSense should run on any system on which a recent version of Ruby runs. We tested it with Ruby 2.6.9 and Ruby 3.x.x.

Concerning the outputs:

  • HTML reports use Zurb Foundation, Data Tables, and Vega Light, which are all downloaded from a CDN
  • The textual format is compatible with Org Mode and can be further processed to any format Org Mode can be exported to, including HTML and PDF, with the word of warning in the section above.

Author and Contributors

Shair.Tech

Known Bugs

No known bugs; an unknown number of unknown bugs. (See the open issues for the known bugs.)

License

Source code distributed under the terms of the MIT License.

Geolocation is made possible by the DB-IP.com IP to City database, released under a CC license.

[fn:1] There is a small catch: CSS and JavaScript for layout and plots are downloaded from a CDN. Technically, thus, if you generate HTML reports and open them, a request is performed and the CDN might keep a track (see CDN Security and Privacy on Wikipedia for more details). Textual reports don’t have this issue.