ImgFetcher

ImgFetcher is a command-line tool that, given a plaintext file containing URLs (one per line), downloads each of them to the local hard disk.

Installation

You can install it from RubyGems with the following command:

    $ gem install img_fetcher

Alternatively, you can clone this repository, then build and install the gem yourself:

  1. Clone the repository.
  2. Install Ruby dependencies:
    $ bundle install
  3. Build and install the gem with the Rake command:
    $ bundle exec rake install

For development

The tests need temporary folders to exist, so run the following command to set up the development environment:

    $ ./bin/setup

This runs bundle install and creates the tmp/ and spec/support/tmp/ directories.

Usage

After installing the gem, you can run it from the command line:

    $ img_fetcher -f plaintext.txt -o output_directory/

You can type img_fetcher --help at the terminal for more information.

Usage: img_fetcher -f <file_path> [options...]
    -f, --file FILE_PATH             [REQUIRED] Fetch and store the images from each line from the given file
    -o, --output OUTPUT_DIRECTORY    Specify the output directory
    -V, --version                    Show version number and quit
    -v, --verbose                    Make the operation more talkative
    -t, --threaded                   Run the command with multiple threads

The OUTPUT_DIRECTORY must already exist. If it doesn't, files are stored in the current directory (./).
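
To illustrate how these options and the output-directory fallback fit together, here is a minimal sketch using Ruby's standard OptionParser. It is not the gem's actual implementation: the `parse_options` method and the shape of the options hash are assumptions made for this example, and the `-V, --version` flag is left out for brevity.

```ruby
require "optparse"

# Hypothetical helper (not part of ImgFetcher): parses the flags listed above
# and applies the "fall back to ./ when the directory is missing" rule.
def parse_options(argv)
  options = { output: "./", verbose: false, threaded: false }

  parser = OptionParser.new do |opts|
    opts.banner = "Usage: img_fetcher -f <file_path> [options...]"
    opts.on("-f", "--file FILE_PATH") { |value| options[:file] = value }
    opts.on("-o", "--output OUTPUT_DIRECTORY") { |value| options[:output] = value }
    opts.on("-v", "--verbose") { options[:verbose] = true }
    opts.on("-t", "--threaded") { options[:threaded] = true }
  end
  parser.parse(argv)

  # --file is the only required option.
  raise OptionParser::MissingArgument, "--file" unless options[:file]

  # A missing output directory silently falls back to the current directory.
  options[:output] = "./" unless File.directory?(options[:output])
  options
end
```

For example, `parse_options(%w[-f urls.txt -o missing_dir/])` would return a hash whose `:output` is `"./"` when `missing_dir/` does not exist.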

Threaded option

The --threaded option is a basic use of Ruby threads. A further improvement would be to limit the number of threads with a thread pool. Only the ImgFetcher::Stats class is synchronized with a Mutex. I don't really know whether puts also needs to be synchronized, given that it constantly writes to stdout.
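
As an illustration of the pattern described above, the following sketch shows a counter class guarded by a Mutex and updated from several threads. The `Stats` class here is a stand-in written for this example, not the real ImgFetcher::Stats, and spawning one thread per input line in `process_all` is an assumption about what "basic thread usage" means.

```ruby
# Hypothetical stand-in for a stats collector; not the gem's actual class.
class Stats
  def initialize
    @mutex = Mutex.new
    @counts = Hash.new(0)
  end

  # Increment the counter for a status; the Mutex serializes concurrent updates.
  def record(status)
    @mutex.synchronize { @counts[status] += 1 }
  end

  # Return a snapshot copy so callers never see the hash mid-update.
  def to_h
    @mutex.synchronize { @counts.dup }
  end
end

# Assumed behavior: start one thread per line, run the given block in it,
# and record the status the block returns. Waits for all threads to finish.
def process_all(lines, stats, &work)
  threads = lines.each_with_index.map do |line, index|
    Thread.new { stats.record(work.call(line, index)) }
  end
  threads.each(&:join)
end
```

If interleaved terminal output turns out to be a concern, one conservative option is to route printing through a Mutex in the same way.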

Output

If the --verbose option is set, the command prints its output to the terminal with the following structure:

FILE ROW INDEX, STATUS, FILE ORIGINAL LINE

Downloaded files keep their original filename (whenever possible) at the end of the stored name, which starts with 6 random characters to avoid collisions with already existing files.
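
The naming scheme can be sketched roughly as follows. This is an illustration only: the `output_filename` method, the underscore separator, and the use of alphanumeric characters for the random prefix are assumptions, since the README does not specify them.

```ruby
require "securerandom"
require "uri"

# Hypothetical helper: builds a collision-resistant name for a downloaded file.
def output_filename(url)
  # Last path segment of the URL, or "" when the path is empty or ends in "/".
  original = URI.parse(url).path.to_s.split("/").last.to_s
  prefix = SecureRandom.alphanumeric(6) # 6 random characters

  # When no original filename can be recovered, fall back to the prefix alone.
  original.empty? ? prefix : "#{prefix}_#{original}"
end
```

Under these assumptions, `output_filename("http://example.com/images/cat.png")` would return something like `Ab3xYz_cat.png`, with a different prefix on every call.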

Downloading file from URL

For downloading files from a URL, the first approach would be to use open-uri. However, since the input is generated by external users, open-uri has some limitations and security issues if it's not handled carefully. After some research, the Down gem turned out to take care of all these issues for you, and it also handles URL validation, file size, timeouts, number of redirects, connectivity problems, and more.

In this case, the maximum number of redirects is limited to 0 and there is no limit on the file size. As an improvement, both could be exposed as command-line options in the future.
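
A minimal sketch of such a download call is shown below, assuming the Down gem is installed. The `download_image` and `default_downloader` methods are hypothetical names introduced for this example and are not taken from the gem's source. The `max_redirects` option is part of Down's documented `Down.download` API; the broad `rescue` relies on Down's errors being StandardError subclasses.

```ruby
# Sketch only: these method names are hypothetical, not part of ImgFetcher.

def default_downloader
  require "down" # third-party gem, loaded lazily
  # max_redirects: 0 makes Down raise instead of following any redirect.
  # No max_size is passed, so the file size is not limited.
  ->(url) { Down.download(url, max_redirects: 0) }
end

# Returns [:ok, tempfile] on success or [:error, message] on failure.
# The downloader is injectable so the method can be tried without network access.
def download_image(url, downloader: default_downloader)
  tempfile = downloader.call(url)
  [:ok, tempfile]
rescue StandardError => e
  # Invalid URLs, timeouts, and redirect errors all end up here.
  [:error, e.message]
end
```

Exposing the redirect and size limits as command-line options would then amount to passing different values into the lambda built by `default_downloader`.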

Possible improvements

  1. If a URL is repeated in the file, don't fetch it again.
  2. Create a pool of threads for further customization.
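
Both improvements could be sketched together as below. This is one possible approach rather than a planned design: the `fetch_with_pool` method, its `pool_size` parameter, and the default of 4 workers are assumptions made for illustration.

```ruby
# Hypothetical sketch: fetch each distinct URL once using a fixed number of
# worker threads. The block receives a URL and returns its result.
def fetch_with_pool(urls, pool_size: 4, &fetch)
  queue = Queue.new
  urls.uniq.each { |url| queue << url } # skip repeated URLs
  queue.close # once empty, pop returns nil and the workers stop

  results = Queue.new # Queue is thread-safe, so no explicit Mutex is needed
  workers = Array.new(pool_size) do
    Thread.new do
      while (url = queue.pop)
        results << [url, fetch.call(url)]
      end
    end
  end
  workers.each(&:join)

  # Drain the results into a plain array of [url, result] pairs.
  Array.new(results.size) { results.pop }
end
```

The pairs come back in completion order, not input order, so callers that need the original row index would have to carry it along with each URL.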

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/francoprud/img_fetcher. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.

License

The ImgFetcher gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the ImgFetcher project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.