Project

forest

0.0
No commit activity in last 3 years
No release in over 3 years
A simple collection class to aggregate tree objects. It takes [Adjacency List](http://sqlsummit.com/AdjacencyList.htm) as input, shows some stats and the top x biggest trees.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Runtime

 Project Readme

Forest

Summary

A simple collection class to aggregate tree objects. It takes Adjacency List as input, shows some stats and the top x biggest trees. This is a simple wrapper on top of rubytree for now.

Why ?

Most database tables have some hierarchy related data (eg: who's your boss, who invited you to join, etc) without even realising it in adjacency list format. Aggregating info from these trees are a bit difficult if the depth of each trees are not even. Also, most trees only have 0 or 1 node attached, which need to be filtered out before rendering. That's why I created a simple wrapper to extract only trees (top x biggest trees) which are interesting enough to render.

Usage

forest [filename]

Input file format

"parent key", "child key", "other attributes"(optional)
  • If "child key" is either nil, NULL, or "", then the row becomes a root tree
  • If "other attributes" are specified, they will be showed at tree diagram as optional information.
  • Example sql to generate the input csv "select parent_id, id, name from foo"

Input file example

examples/input.csv

1,,foo,foo2,foo3
2,1
13,2
3,1,bar
4,3
5,3
6,3
10,9
9,8
8,7
7,
11,
14,12
12,
16,15
15,
17,15
18,15
19,15
20,15

This will form the following tree hierarchy.

1 - 2 - 13
  - 3 - 4
      - 5
      - 6
7 - 8 - 9 - 10
11
12 - 14
15 - 16
   - 17
   - 18
   - 19
   - 20

And here is the output.

forest examples/input.csv

Total 20: Average :4.0 Max size :7 Max height :4 Max width :5, Sandard Deviation 2.28035085019828
Top 3 trees
#1, sum 7, height 2
* 1 foo,foo2,foo3
|---+ 2 
|    +---> 13 
+---+ 3 bar
    |---> 4 
    |---> 5 
    +---> 6 
Node Name: 2 Content:  Parent: 1 Children: 1 Total Nodes: 2
Node Name: 3 Content: bar Parent: 1 Children: 3 Total Nodes: 4
#2, sum 6, height 1
* 15 
|---> 16 
|---> 17 
|---> 18 
|---> 19 
+---> 20 
Node Name: 16 Content:  Parent: 15 Children: 0 Total Nodes: 1
Node Name: 17 Content:  Parent: 15 Children: 0 Total Nodes: 1
Node Name: 18 Content:  Parent: 15 Children: 0 Total Nodes: 1
Node Name: 19 Content:  Parent: 15 Children: 0 Total Nodes: 1
Node Name: 20 Content:  Parent: 15 Children: 0 Total Nodes: 1
#3, sum 4, height 3
* 7 
+---+ 8 
    +---+ 9 
        +---> 10 
Node Name: 8 Content:  Parent: 7 Children: 1 Total Nodes: 3
#4, sum 2, height 1
* 12 
+---> 14 
Node Name: 14 Content:  Parent: 12 Children: 0 Total Nodes: 1
#5, sum 1, height 0
* 11 

Performance.

I tried parsing about 80000 rows. It used to take about 20 min, but now it takes 40 sec with 200mb memory space. The result may very depending on how deep your each tree is.

TODO (or my wish list)

  • Get rid of "Node Name" output (Annoying)
  • Add more aggregate functions
  • Add more filtering functions (eg: show trees which has depth of more than 3)
  • Create a conversion program to draw on Graphviz diagram
  • Create a conversion program to draw histogram on R
  • Create a conversion program to nested set model or materialised path
  • Create an adapter to switch between rdbms, nosql, or file system to avoid storing everything in memory.