
THIS README IS OUTDATED AND THIS REPO IS ONLY KEPT AS A CODE EXAMPLE

bytepump

A small Ruby gem to efficiently splice the contents of one file descriptor to another, using Linux syscalls.

What it does

If you have two IO objects that are backed by a file, a socket or a pipe (basically anything with a file descriptor), this gem defines a method splice_to that uses the Linux splice syscall to copy the contents of one of your IOs to the other. The data stays in kernel memory and never enters the Ruby VM, so the copy allocates no Ruby objects and triggers no GC. In that sense it works a bit like IO::copy_stream, but it always operates in a nonblocking way, applying a timeout where necessary and optionally calling a block whenever some data is written to the downstream socket.
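For comparison, the closest standard-library analogue is IO::copy_stream, which copies through userspace buffers instead of splicing inside the kernel. A minimal sketch (the tempfiles are just placeholders):

```ruby
require 'tempfile'

src = Tempfile.new('src')
src.write('hello from the source file')
src.rewind

dst = Tempfile.new('dst')

# IO.copy_stream plays the same role as splice_to here: it copies
# everything from src to dst, but the bytes pass through userspace.
bytes = IO.copy_stream(src, dst)

dst.rewind
puts bytes      # 26
puts dst.read   # "hello from the source file"
```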

There are also a few helper methods to make "edge includes" simpler.

Limitations

  • It only works on Linux, since it relies on the splice syscall.
  • It only works on Ruby versions above 2.0, due to the GVL-escaping functions used.
  • It only works on IO objects that are actually backed by a Linux file descriptor, so a StringIO won't work.
  • Most Ruby (and C) methods that deal with IO do a lot of buffering, even on reads. Mixing this gem with most IO methods that read from a socket will lead to unexpected results; IO#sysread should be OK, though.
  • There is currently no method for splicing a limited number of bytes; splice_to always reads until EOF is reached. Maybe I will add one in a future release.

Examples

Copying a file:

require 'bytepump'
f1 = File.open 'file1.txt'
f2 = File.open 'file2.txt', 'w' # the destination must be opened for writing
f1.splice_to f2 #=> (however many bytes were in file1.txt)
f1.close
f2.close

Emulating nonblocking sendfile:

require 'bytepump'
require 'socket'
require 'io/nonblock'
s = ... # we'll assume you already got it from somewhere, like a rack hijack or something
s.nonblock = true
f = File.open 'file1.txt'
f.nonblock = true
# every time some bytes are sent, the block is called with the number of bytes.
# if the downstream socket slowlorises and doesn't accept any bytes for 60 seconds, this
# returns :timeout_downstream, otherwise it returns the total number of bytes sent
f.splice_to(s, 60) {|b| report_that_some_bytes_were_sent(b) }
f.close
s.close
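The nonblocking send-with-timeout loop can be sketched in plain Ruby with write_nonblock and IO.select. This is a hypothetical standalone sketch, not bytepump's actual implementation — it copies through a userspace string where the gem would splice in the kernel:

```ruby
require 'socket'

# Push `data` to a nonblocking socket, waiting at most `timeout` seconds
# each time the socket stops accepting bytes, yielding progress as we go.
def pump(data, sock, timeout)
  total = 0
  until data.empty?
    begin
      written = sock.write_nonblock(data)
      data = data.byteslice(written..-1)
      total += written
      yield written if block_given?
    rescue IO::WaitWritable
      # Socket send buffer is full; wait until it drains or we time out.
      return :timeout_downstream unless IO.select(nil, [sock], nil, timeout)
    end
  end
  total
end

a, b = UNIXSocket.pair
sent = pump('hello downstream', a, 5) { |n| n }
a.close
received = b.read
b.close

puts sent       # 16
puts received   # "hello downstream"
```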

Very simple edge include: a picture of Matz

require 'bytepump'
require 'socket'
require 'io/nonblock'
s1 = ... # we'll assume you already got it from somewhere, like a rack hijack or something
# further assume that you have already sent to s1 any response headers etc. that you want
s1.flush # clear any buffered output
s2 = TCPSocket.new "s3.amazonaws.com", 80
s1.nonblock = true
s2.nonblock = true
# request the image; HTTP wants CRLF line endings
s2 << "GET /nlga/uploads/item/image/12267/125.png HTTP/1.0\r\n\r\n"
s2.skip_headers # reads ahead until it encounters a double \r\n, indicating the end of the headers
s2.splice_to(s1, 60) # you can also leave out the block, and it will then not report its progress
s1.close
s2.close
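skip_headers comes from the gem; a rough pure-Ruby equivalent (a sketch, not the gem's implementation) reads one unbuffered byte at a time with sysread until the blank line that terminates the headers, leaving the IO positioned at the first byte of the body:

```ruby
# Consume bytes until "\r\n\r\n" has been seen.
def skip_http_headers(io)
  tail = +""
  until tail.end_with?("\r\n\r\n")
    tail << io.sysread(1)
    tail = tail[-4..] if tail.length > 4   # only the last 4 bytes matter
  end
end

# Demonstrate against a fake HTTP response in a pipe:
r, w = IO.pipe
w.write("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhello")
w.close
skip_http_headers(r)
body = r.read
puts body   # "hello"
```

Reading byte-by-byte is slow but deliberate: it never over-reads into a userspace buffer, so the fd is left exactly where a subsequent splice should start.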

Slightly more involved example: put together a custom zip archive from S3 objects using WeTransfer's ZipTricks library.

require 'bytepump'
require 'socket'
require 'io/nonblock'
require 'zlib'
require 'zip_tricks'
#socket to the user
s = ... # we'll assume you already got it from somewhere, like a rack hijack or something
s.flush
s.nonblock = true
#assume that you have some Enumerable s3_objects that contains the data about the files 
#that should go into the archive. We'll just assume they're all STORED entries for simplicity,
#but allowing for DEFLATEd objects is trivial
ZipTricks::Streamer.open(s) do | zip |
    s3_objects.each do |obj|
        zip.add_stored_entry(filename: obj.filename, size: obj.size, crc32: obj.crc32)
        #you will need to set your bucket permissions right for this
        s.flush #'normal' Ruby IO is heavily buffered and doesn't play well with bytepump
        bytes_written = s.splice_from(host: s3_url, path: obj.s3_path) {|b| report_bytes_sent(b)}
        zip.simulate_write(bytes_written)
    end
end #ending the block will cause the central directory for the archive to be written
s.close
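Each stored entry above needs a precomputed crc32. For data you have at hand, Ruby's bundled Zlib can compute it, either in one shot or incrementally for streamed objects (a side note, not part of bytepump):

```ruby
require 'zlib'

# One-shot CRC32 of the whole payload.
one_shot = Zlib.crc32('hello')

# Incremental CRC32: feed chunks and carry the running checksum forward.
incremental = Zlib.crc32('he')
incremental = Zlib.crc32('llo', incremental)

puts one_shot                  # 907060870
puts one_shot == incremental   # true
```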