Project

wriggler

0.0
No commit activity in last 3 years
No release in over 3 years
A Gem designed to crawl through a local directory of HTML/XML files and pull out content based on pre-specified tags, which will be exported as a manipulatable object
Dependencies

Development

~> 1.10
~> 10.0
>= 0
Project Readme

Wriggler

Wriggler was created to serve as the crawler for a search engine. It moves through HTML and/or XML files, grabs data based on predetermined tags, and exports that data in a manipulatable format. Wriggler behaves much like a spider, but it is designed to work with any number of local files rather than as an actual web scraper.
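The core idea is to pull the content of predetermined tags out of a markup file. You can sketch this with Ruby's standard library alone. This is a simplified illustration, not Wriggler's actual implementation. extract_tag is a hypothetical helper, and its regular expression only handles simple, non-nested tags. A real HTML/XML parser would be more robust.

```ruby
# Hypothetical helper (not part of Wriggler): return the inner text of every
# <tag>...</tag> pair in a markup string. Simplified on purpose: it does not
# handle nested tags of the same name, comments, or CDATA sections.
def extract_tag(markup, tag)
  name = Regexp.escape(tag)
  # \b keeps "<p" from matching "<pre"; [^>]* allows attributes;
  # the non-greedy .*? stops at the first closing tag; /m lets . span newlines.
  pattern = /<#{name}\b[^>]*>(.*?)<\/#{name}>/m
  markup.scan(pattern).map { |match| match.first.strip }
end

html = <<~HTML
  <html>
    <head><title>Example</title></head>
    <body>
      <h1 class="main">First heading</h1>
      <p>Some text</p>
      <h1>Second heading</h1>
    </body>
  </html>
HTML

p extract_tag(html, "h1")    # => ["First heading", "Second heading"]
p extract_tag(html, "title") # => ["Example"]
p extract_tag(html, "table") # => []
```

Each match becomes one element of the returned Array. When a tag never appears, the result is an empty Array rather than nil.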

Installation

Add this line to your application's Gemfile:

gem 'wriggler'

And then execute:

$ bundle

Or install it yourself as:

$ gem install wriggler

Usage

Using Wriggler takes a single call:

Wriggler.crawl(["array", "of", "HTML/XML", "tags"], directory)
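Conceptually, that call has to find every HTML/XML file under the directory, including nested folders, and then collect each tag's matches file by file. The sketch below is a hypothetical stand-in: crawl_sketch is not part of Wriggler's API. It assumes files end in .html, .htm, or .xml and uses a simplified regular-expression match. Wriggler's real file filter and file ordering may differ.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical stand-in for Wriggler.crawl: returns a Hash mapping each tag
# (as a Symbol) to an Array with one entry per file; each entry is the Array
# of matches for that tag in that file.
def crawl_sketch(tags, directory)
  # "**" makes Dir.glob descend into nested subdirectories; sorting gives a
  # stable file order, since raw glob order depends on the filesystem.
  files = Dir.glob(File.join(directory, "**", "*.{html,htm,xml}")).sort
  puts "=" * 15, "Files Found: #{files.size}", "=" * 15

  contents = files.map { |path| File.read(path) }
  tags.each_with_object({}) do |tag, result|
    name = Regexp.escape(tag)
    pattern = /<#{name}\b[^>]*>(.*?)<\/#{name}>/m
    result[tag.to_sym] = contents.map do |markup|
      markup.scan(pattern).map { |match| match.first.strip }
    end
  end
end

# Build a throwaway directory with one nested file to show the traversal.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "nested"))
  File.write(File.join(dir, "a.html"), "<h1>Alpha</h1><p>One</p>")
  File.write(File.join(dir, "nested", "b.xml"), "<item>Beta</item><p>Two</p>")

  content = crawl_sketch(["h1", "p", "item"], dir)
  # content == { h1: [["Alpha"], []], p: [["One"], ["Two"]], item: [[], ["Beta"]] }
  p content
end
```

Keeping one inner Array per file, even when it is empty, means the position of each entry still tells you which file it came from.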

Note: The directory passed to Wriggler.crawl should be the top-level directory that contains your HTML/XML files. Wriggler also searches any nested subdirectories for HTML/XML files. When the crawl finishes, you will have a data structure that resembles this:

===============
Files Found: 2
===============
content = {
	tag1: [["Content", "Found", "in", "the", "First", "Opened", "File"], ["Content", "Found", "in", "the", "Second", "Opened", "File"]],
	tag2: [[], []],
	tag3: [["Content", "Found", "in", "the", "First", "Opened", "File"], []],
	tag4: [[], ["Content", "Found", "in", "the", "Second", "Opened", "File"]]
}

Here, tag2 found no content in either file, tag3 found content only in the first of the two files, tag4 found content only in the second, and tag1 found content in both.
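Because the result is a plain Ruby Hash of nested Arrays, ordinary Hash and Enumerable methods can manipulate it. The snippet below assumes that shape: one Symbol key per tag, with one inner Array per file in the order the files were opened. It works on a hand-written copy of the example structure rather than on a real crawl.

```ruby
# Hand-written copy of the example structure (assumed shape:
# tag => [matches in file 1, matches in file 2]).
content = {
  tag1: [["Content", "Found", "in", "the", "First", "Opened", "File"],
         ["Content", "Found", "in", "the", "Second", "Opened", "File"]],
  tag2: [[], []],
  tag3: [["Content", "Found", "in", "the", "First", "Opened", "File"], []],
  tag4: [[], ["Content", "Found", "in", "the", "Second", "Opened", "File"]]
}

# 0-based indices of the files in which each tag matched something.
found_in = content.transform_values do |per_file|
  per_file.each_index.reject { |i| per_file[i].empty? }
end

# Tags that matched nothing in any file.
missing = content.select { |_tag, per_file| per_file.all?(&:empty?) }.keys

# Everything found for one tag, across all files, as a single flat Array.
all_tag3 = content[:tag3].flatten

p found_in      # tag1 -> [0, 1], tag2 -> [], tag3 -> [0], tag4 -> [1]
p missing       # => [:tag2]
p all_tag3.size # => 7
```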

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/elliottayoung/wriggler. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

On top of that, please contribute! I built this for a very specific purpose, but I would very much like to see it become something bigger, so if you can help with that, please do.

License

The gem is available as open source under the terms of the MIT License.