0.0
No commit activity in last 3 years
No release in over 3 years
Graph data from emails.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Development

~> 1.7
~> 10.0
~> 3.1.0

Runtime

~> 0.4.1
~> 0.0.1
 Project Readme

EmailGraph

Build and analyze relationship graph data from email history.

Focus is on identities and interactions between them, for example:

  • Who are my strong contacts, as identified by some two-way interaction threshold function?
  • Which of my contacts can we infer are also strong contacts with one another?
  • What is the distribution of my email interactions with a given person over time?

Graph types

There are two types of graphs to be built from email data.

1. Interaction graph

Class: EmailGraph::InteractionGraph

This is a directed graph where each vertex is an email address and each edge is an instance of EmailGraph::InteractionRelationship - a directed interaction history between two emails.

The graph has these properties:

  • Implements common graph methods (e.g., #vertices, #edges, etc)
  • Efficient fetching of vertices' in-edges (not always a default of graph structures)
  • Loops allowed

Given a message, an interaction is created from the sender to every address in the to, cc, and bcc fields - there is no distinction among the latter.

EmailGraph::InteractionRelationship objects have an interactions attribute, which holds an array of the Time objects of the interactions.

Example:

# Assuming you have an array 'messages' of message-like objects (see below for
# how to use the Gmail fetcher)
g = EmailGraph::InteractionGraph.new
messages.each{ |m| g.add_message(m) }

# ...or...

g = EmailGraph::InteractionGraph.new(messages: messages)

# For example, see a sorted list of your email contacts by emails sent
g.edges_from("your@emailhere.com")
  .sort_by{ |e| -e.interactions.size }
  .map{ |e| [e.to, e.interactions.size] }

2. Mutual relationship graph

Class: EmailGraph::UndirectedGraph (just uses the abstract class)

This is an undirected graph where each vertex is also an email, however, this time, the edges are instances of EmailGraph::MutualRelationship - an undirected edge that similarly includes an interaction history (though an undirected one).

This graph is created from an EmailGraph::InteractionGraph by creating undirected edges from pairs of directed edge inverses. Optionally, a filter can be applied during this process to determine whether an undirected edge is created for a given pair of directed edges.

g = EmailGraph::InteractionGraph.new(messages: messages)

# This creates the graph using the default filter, which is that an edge has to
# have an inverse in order to create a new undirected edge.
mg = g.to_mutual_graph

# Alternatively, you can specify a custom filter. For example, this replicates
# the one used by A. Chapanond et al. in their analysis of emails* from the Enron
# case data set
filter = Proc.new do |e, e_inverse|
  if e && e_inverse
    counts = [e.interactions.size, e_inverse.interactions.size]
    counts.all?{ |c| c >= 6 } && counts.inject(:+) >= 30
  else
    false
  end
end
mg = g.to_mutual_graph(&filter)

*Chapanond, Anurat, Mukkai S. Krishnamoorthy, and Bülent Yener. "Graph theoretic and spectral analysis of Enron email data." Computational & Mathematical Organization Theory 11.3 (2005): 265-281.

Email normalization

You'll likely want to normalize email addresses before adding them to a graph. Otherwise, you'll end up with separate vertices for different capitalizations of the same address - not to mention differences with '.' placement and other issues.

EmailGraph::InteractionGraph will do this by default using SoundCloud's Normailize gem.

You can also pass your own email processing block on instantiation for the entire graph, or when calling #add_message.

Fetching emails

A fetcher for Gmail is included for convenience.

g = EmailGraph::InteractionGraph.new

email = "XXX"
# You'll need an OAuth2 access token with Gmail permissions. One way to get one
# is to use the Google Oauth Playground (https://developers.google.com/oauthplayground/)
# and under "Gmail API v1" authorize "https://mail.google.com/".
access_token = "XXX"

f = EmailGraph::GmailFetcher::Fetcher.new( email: email,
                                           access_token: access_token )

# This should cover all emails from that account. If no mailbox param is
# provided, defaults to Inbox.
mailboxes = ['[Gmail]/All Mail', '[Gmail]/Trash']
f.each_message(mailboxes: mailboxes){ |m| g.add_message(m) }

Installation

Add this line to your application's Gemfile:

gem 'email_graph'

And then execute:

$ bundle

Or install it yourself as:

$ gem install email_graph