No commit activity in last 3 years
No release in over 3 years
Gem for using Selenium webdriver simply
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Development

~> 1.12
>= 0
~> 10.0
~> 3.0

Runtime

 Project Readme

Gem Version Build Status

Selenium Standalone DSL

Gem for using Selenium webdriver simply.

Let's scrape sites using JavaScript and iframe!

This project aims to expanding Selenium and Nokogiri.

Installation

Ensure you have already installed Firefox:

sudo apt-get install firefox

If you want to run firefox headlessly, install Xvfb:

sudo apt-get install xvfb

Also on Arch Linux:

sudo pacman -S firefox xorg-server-xvfb

Add this line to your application's Gemfile:

gem 'selenium_standalone_dsl'

And then execute:

bundle

Or install it yourself as:

gem install selenium_standalone_dsl

Usage

require 'selenium_standalone_dsl'

class YahooSearcher < SeleniumStandaloneDSL::Base
  def search_for_wikipedia
    visit 'https://www.yahoo.com/'
    fill_in 'p', with: 'wikipedia'

    # You can declare how to find elements: [class|id|css|xpath]
    click 'IconNavSearch', find_by: :class

    if has_element? :text, 'Next'
      click 'Next'
    end

    # You can use full jQuery CSS selector using 'search'
    puts search('span:contains("results")').inner_text
  end
end

config = {
  log_path: '/tmp/selenium_standalone_dsl.log',
  user_agent: 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 ' +
              '(KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36',
  headless: true,
}

driver = YahooSearcher.new(config)
driver.search_for_wikipedia

# => 141,000,000 results

Supported API

Method Summary Arguments Remarks
click Click a button or link text of link, name, etc.
select Choose an option in select box option text
visit Navigate firefox to a link link url
fill_in Fill in a input box element name
search Search page source using Nokogiri Returns Nokogiri::XML::Element

Options

Name Arguments Defalut
:find_by :link_text, :name, :id, :class, :css, :xpath :link_text

Development

git clone git@github.com:acro5piano/selenium_standalone_dsl.git
cd selenium_standalone_dsl
bundle install --path vendor/bundle
bundle exec rake install
bundle exec rake spec

To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Roadmap

I will create Selenium Spider after completing Selenium Standalone DSL.

Selenium Spider aims to scrape websites using JavaScript.

There are a lot of tools such as mechanize but they are no longer useful for SPA websites.

In my heart, Selenium Spider will have these features:

Full JavaScript support

Based on Selenium Standalone DSL which run Firefox headlessly, it comprehences JavaScript completely.

PMC architecture

PMC = Pagination Model Controller

Generally, scraping is consist of two parts: Listing page and Detail page.

In PMC architecture, Page is for listing items and pagenation.

Model is for extracting information from detail page and store data to database.

Controller is for handling the above two.

Web-based task execution

Scraping tasks are often multiply and difficult to arrange.

Imagine Web-based task execution, definition, csv-export and scheduling like Jenkins.

TODO

  • Fix module name. Dsl => DSL
  • Defalut config
  • Add API document
  • TravisCI
  • CommonLogger
  • Add current_url

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/acro5piano/selenium_standalone_dsl. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.