Project

deckard

0.02
No commit activity in last 3 years
No release in over 3 years
a monitoring system built on couchdb

 Project Readme
deckard : http monitoring system

deckard is an HTTP check monitoring system built on top of CouchDB.

license: apache 2

Features:

* Email and SMS based alerts (through email)
* Designated on-call sms email address
* Basic CouchDB replication latency alerts
* Basic content check alerts
* Content check alerts with EC2 elastic IP failover
* All checks are defined in CouchDB (CRUD checks with ReST)
* Alert priorities (log, email, SMS and notifo)
* Simple setup via cron
* Basic scheduling to silence alerts
* Adjustable delay before firing check content requests
* Basic Chef "tag" lookup support
* Alert stats database for trending and analysis

Usage:

$ deckard --all ./deckard.yml

You can run with --all, or with --failover, --content, or --replication if you only want to run a subset of checks.

Setup:

* Set up and configure all appropriate databases and alert documents.
* Create a crontab entry

$ crontab -e
*/5 * * * * deckard --all /path/deckard.yml > /dev/null 2>&1


Example documents:

On Call document format:

{
   "_id": "on_call_person",
   "sms_email": "8675309@jenny.net",
   "notifo_usernames" : ["jenny"]
}

For sms_email, use your phone number at your phone provider's SMS-to-email host (for example, 8675309@jenny.net). If you provide both an SMS email and notifo username(s), the SMS address is used only as a backup if something goes wrong with notifo, which saves money on your text message bill. You need the notifo application on your phone to use notifo support. Note that notifo_usernames is an array of usernames, so multiple people can receive notifications.
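
The notifo-first, SMS-as-backup behaviour described above can be sketched roughly as follows. This is only an illustration: `alertOnCall`, `sendNotifo` and `sendSmsEmail` are hypothetical names, and the two senders are supplied by the caller rather than taken from deckard.

```javascript
// Hypothetical sketch of the notifo-first, SMS-as-backup behaviour.
// sendNotifo and sendSmsEmail are caller-supplied functions, not deckard APIs.
function alertOnCall(onCall, message, sendNotifo, sendSmsEmail) {
  const usernames = onCall.notifo_usernames ?? [];
  if (usernames.length > 0) {
    try {
      // Every listed username gets the notification.
      usernames.forEach((name) => sendNotifo(name, message));
      return "notifo";
    } catch (err) {
      // Something went wrong with notifo: fall through to the SMS backup.
    }
  }
  if (onCall.sms_email) {
    sendSmsEmail(onCall.sms_email, message);
    return "sms";
  }
  return "none";
}

const onCall = {
  _id: "on_call_person",
  sms_email: "8675309@jenny.net",
  notifo_usernames: ["jenny"],
};

const usedChannel = alertOnCall(
  onCall,
  "content check failed",
  (name, msg) => console.log(`notifo -> ${name}: ${msg}`),
  (address, msg) => console.log(`sms email -> ${address}: ${msg}`)
);
console.log(usedChannel);
```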


Failover check document format:

{
   "_id": "lb01",
   "url": "http://somecheck.com/check.html",
   "secondary_instance_id": "i-1234",
   "priority": 2,
   "region": "us-east-1",
   "elastic_ip": "127.0.0.1",
   "content": "sometext",
   "failover": true,
   "primary_instance_id": "i-4321"
}

This document must contain every detail needed to switch the elastic IP when the expected content is not found at the url.
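
As a rough illustration of the decision this document drives, the sketch below checks whether a response body contains the expected content and, if not, reports where the elastic IP would move. The function name `planFailover` and the returned object are hypothetical, and the idea that the IP moves to `secondary_instance_id` is an assumption inferred from the field names, not deckard's confirmed behaviour.

```javascript
// Hypothetical sketch: decide whether a failover check should trigger an
// elastic IP switch. Not deckard's implementation.
function planFailover(check, responseBody) {
  // The check passes when the expected text appears in the response body.
  const contentFound =
    typeof responseBody === "string" && responseBody.includes(check.content);

  if (contentFound || !check.failover) {
    return { switchIp: false };
  }

  // Assumption: on failure the elastic IP is re-associated with the
  // secondary instance in the configured region.
  return {
    switchIp: true,
    elasticIp: check.elastic_ip,
    region: check.region,
    targetInstanceId: check.secondary_instance_id,
  };
}

const lb01 = {
  _id: "lb01",
  url: "http://somecheck.com/check.html",
  secondary_instance_id: "i-1234",
  priority: 2,
  region: "us-east-1",
  elastic_ip: "127.0.0.1",
  content: "sometext",
  failover: true,
  primary_instance_id: "i-4321",
};

console.log(planFailover(lb01, "<p>sometext</p>")); // content found, no switch
console.log(planFailover(lb01, "503 Service Unavailable")); // switch planned
```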


Replication check format:

{
   "_id": "node01_node02",
   "name": "test",
   "master_url": "http://node01/db",
   "slave_url": "http://node02/db",
   "offset": 0,
   "priority": 1,
   "schedule": [
       2,
       3
   ]
}

This check compares the document counts of the two databases. If the difference falls outside the thresholds specified in the config, an alert is triggered.
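
A minimal sketch of such a comparison is shown below. It relies on the `doc_count` field that CouchDB returns for `GET /{db}`; everything else is an assumption: `maxDrift` stands in for the config threshold (its real name is not given here), a single symmetric threshold is used for simplicity, and `offset` is treated as an expected constant difference between the two counts.

```javascript
// Hypothetical sketch of a doc-count comparison. `maxDrift` stands in for the
// threshold from the config file; its real name is not given in this README.
function replicationDrift(masterInfo, slaveInfo, offset = 0) {
  // CouchDB's GET /{db} response includes a numeric `doc_count` field.
  // Assumption: `offset` is an expected, constant difference between the two.
  return masterInfo.doc_count - slaveInfo.doc_count - offset;
}

function isOutOfSync(masterInfo, slaveInfo, offset, maxDrift) {
  // Drift in either direction counts, so compare the absolute value.
  return Math.abs(replicationDrift(masterInfo, slaveInfo, offset)) > maxDrift;
}

// Sample database info objects (made-up numbers).
const master = { db_name: "db", doc_count: 1050 };
const slave = { db_name: "db", doc_count: 1000 };

console.log(replicationDrift(master, slave, 0)); // 50
console.log(isOutOfSync(master, slave, 0, 100)); // false
console.log(isOutOfSync(master, slave, 0, 10)); // true
```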


HTTP content check format:

{
   "_id": "deckard.com:5984/",
   "url": "http://deckard.com:5984/",
   "content": "couchdb",
   "priority": 2
}
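
Because all checks live in CouchDB, a document like the one above can be stored with CouchDB's `PUT /{db}/{docid}` endpoint. The sketch below only builds the request so the URL encoding of the `_id` is visible. The helper name `buildPutRequest`, the database name `content_checks` and the server URL `http://localhost:5984` are assumptions for illustration; use whatever your deckard config points at.

```javascript
// Hypothetical helper: build (but do not send) the HTTP request that would
// store a check document in CouchDB via PUT /{db}/{docid}.
function buildPutRequest(couchUrl, dbName, doc) {
  // Document ids may contain "/" and ":", so they must be percent-encoded.
  const id = encodeURIComponent(doc._id);
  const base = couchUrl.replace(/\/+$/, ""); // drop any trailing slash
  return {
    method: "PUT",
    url: `${base}/${encodeURIComponent(dbName)}/${id}`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc),
  };
}

const request = buildPutRequest("http://localhost:5984", "content_checks", {
  _id: "deckard.com:5984/",
  url: "http://deckard.com:5984/",
  content: "couchdb",
  priority: 2,
});

console.log(request.url);
// http://localhost:5984/content_checks/deckard.com%3A5984%2F
```

The resulting object could then be handed to an HTTP client of your choice, for example `fetch(request.url, request)`.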


In all of these documents, priority and schedule are optional fields. Priority is 0, 1 or 2: 0 is log only, 1 is log and email, and 2 is log, email and SMS. The schedule is an array of integers giving the hours during which the alert should be silent; see the replication check definition above for an example.
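
One way to read the priority and schedule fields is sketched below. The priority-to-channel mapping follows the description above; the function name `channelsFor`, the default of log only when priority is missing, and the choice to suppress every channel during a silent hour are assumptions, not confirmed deckard behaviour.

```javascript
// Hypothetical sketch of how priority and schedule could be interpreted.
const CHANNELS_BY_PRIORITY = {
  0: ["log"],
  1: ["log", "email"],
  2: ["log", "email", "sms"],
};

function channelsFor(check, hour) {
  // Assumption: a check with no priority field is treated as log only.
  const priority = check.priority ?? 0;
  const schedule = check.schedule ?? [];

  // Assumption: during a silent hour nothing is sent at all.
  if (schedule.includes(hour)) return [];
  return CHANNELS_BY_PRIORITY[priority] ?? ["log"];
}

// The replication check from above: priority 1, silent during hours 2 and 3.
const replicationCheck = { _id: "node01_node02", priority: 1, schedule: [2, 3] };

console.log(channelsFor(replicationCheck, 2)); // []
console.log(channelsFor(replicationCheck, 14)); // [ 'log', 'email' ]
```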


For Chef "tag" support, install the following view in your Chef database and set the URL to it in your deckard config file.

function(doc) {
    if (doc.chef_type != 'node') return;
    
    emit(doc.automatic.ec2 ? doc.automatic.ec2.public_hostname : doc.automatic.fqdn, doc.normal.tags[0]);
}
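
To see what this view produces, the map function can be run locally against a few sample documents with a stand-in `emit()`. The sample node documents below are made up for illustration and contain only the fields the function reads; real Chef node documents carry far more.

```javascript
// Collect emitted rows with a stand-in for CouchDB's emit().
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// The map function from above, unchanged.
function map(doc) {
  if (doc.chef_type != 'node') return;

  emit(doc.automatic.ec2 ? doc.automatic.ec2.public_hostname : doc.automatic.fqdn, doc.normal.tags[0]);
}

// Made-up sample documents: one EC2 node, one non-EC2 node, one non-node.
const docs = [
  {
    chef_type: "node",
    automatic: {
      fqdn: "web01.internal",
      ec2: { public_hostname: "ec2-1-2-3-4.compute-1.amazonaws.com" },
    },
    normal: { tags: ["web"] },
  },
  {
    chef_type: "node",
    automatic: { fqdn: "db01.internal" },
    normal: { tags: ["db", "primary"] },
  },
  { chef_type: "role", name: "base" },
];

docs.forEach((doc) => map(doc));
console.log(rows);
```

EC2 nodes are keyed by their public hostname and other nodes by their fqdn; in both cases only the first tag is emitted as the value.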


For alert stats, create a stats database and put its name in the config file. When alerts fire, you should see documents being added. On their own they are not very helpful; analysis is the key. Here is a basic "counts" design document to get you started. It gives you counts of the different kinds of alerts you are having.

{
   "_id": "_design/counts",
   "language": "javascript",
   "views": {
       "by_type": {
           "map": "function(doc) {\n  emit(doc.type, 1);\n}",
           "reduce": "_count"
       },
       "by_url": {
           "map": "function(doc) {\n  emit(doc.url, 1);\n}",
           "reduce": "_count"
       },
       "by_error": {
           "map": "function(doc) {\n  emit(doc.error, 1);\n}",
           "reduce": "_count"
       }
   }
}
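
Once the design document is installed, each view can be queried with grouping enabled, for example `GET /{db}/_design/counts/_view/by_type?group=true`, where `{db}` is your stats database. The sketch below is a local stand-in for that query, useful for checking what to expect. The helper name `countBy` and the sample alert documents are hypothetical; only the `type`, `url` and `error` field names come from the views above.

```javascript
// Local stand-in for a view queried with ?group=true: the map emits
// (doc[field], 1) and the built-in _count reduce counts rows per key.
function countBy(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    const key = doc[field];
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // CouchDB returns rows ordered by key; plain string comparison
  // approximates that ordering here.
  return [...counts.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, value]) => ({ key, value }));
}

// Made-up alert documents for illustration.
const alerts = [
  { type: "replication", url: "http://node02/db", error: "out of sync" },
  { type: "content", url: "http://deckard.com:5984/", error: "content not found" },
  { type: "content", url: "http://somecheck.com/check.html", error: "content not found" },
];

console.log(countBy(alerts, "type"));
// [ { key: 'content', value: 2 }, { key: 'replication', value: 1 } ]
```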