Project

webvac

0.0
No commit activity in last 3 years
No release in over 3 years
UGC management/backup using venti
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Runtime

>= 0
>= 0
>= 0
>= 0
 Project Readme
This is a somewhat specialized chunk of code!

Using a lookup table in Redis and venti as the backing storage, webvac
serves static content.  It comes with utility `webvac-sweep`, that
pushes content into venti, optionally removing it from the filesystem.
The `webvac-server` program itself starts up a webserver.  Give these
programs '-h' or '--help' to see helpful(?) information, and see below
for configuration.

Currently I am only using it to serve uploads for UGC in some small- to
moderately high-traffic Pleroma instances.  I have been using venti to
do incremental backups of the data (replication is speedy), and since
the uploads are WORM ("write once, read many") data, I thought it'd be
cool to serve the data directly out of venti.  It worked better than
expected!

I expect to use this more often and thus expect it to become a bit more
general as a result, but for right now, it makes a couple of assumptions
about where it serves files.

= Quick Start

  [Check that your venti and Redis servers are operational.]
  $ sudo ed /etc/webvac.json
  a
  {"server_path_prepend":"/where/uploads/get/put/in/the/filesystem"}
  .
  wq
  $ sudo $EDITOR /etc/nginx/whatever
  [edit your nginx configuration to proxy requests for files to localhost:8891
  if the file doesn't exist]
  $ webvac-server -D
  $ find /where/uploads/get/put/in/the/filesystem -type f -print0 |
   xargs -0 -n30 -P8 webvac-sweep -v -- 
  [At this point, you should be able to rm the files in server_path_prepend
  to start serving them out of venti.  Big ones are slow for now.]

= Mise en place

You will need a venti server and a Redis server.  It'll work fine with
either Plan 9 venti or P9P venti.  You will also need the 'vac' and
'unvac' utilities that come with P9P.

You will need to either have all the stuff in the default places, or
you'll need to configure a handful of things.  Otherwise, you will need
to configure probably one thing.

= Overhead

Just empirically, for about 100GB of files, venti takes 86GB of disk (no
surprise, since it's mostly JPGs and MP4s, so it's already compressed;
the savings are probably from dedup), and Redis takes about 60MB of RAM
for this.  All of the files were put into venti as part of the backup
solution.  CPU overhead is negligible, the server takes about 200MB of
RAM for both workers.

= Installation

`gem install webvac` should do it.

= Configuration

The nginx config I used to test is in doc/examples.

You will need to create a JSON file that specifies the configuration.
The first readable file from the following list is used:

* The filename given in the $WEBVAC_CONFIG environment variable.
* ./config/webvac.json
* $HOME/.webvac.json
* /etc/webvac.json

Here is an annotated list of configuration options, all of which start
with their default values.  You will almost certainly need to change
server_path_prepend.

  {
    // A Redis connection string, one that Redic will take:
    "redis_url": "redis://localhost:6379/0",

    // These two options are used to turn the request path into a filesystem
    // path.  Watch this space, I might turn server_path_strip into an array.

    // The beginning of the path that the entire server sits under; that is,
    // what should be stripped from the front of the path.  For Pleroma, users'
    // uploads are all under /media, so this works if you're serving uploads from
    // https://$domain/media/$file .  You should probably not change this one for
    // now, as Rake routes depend on it.
    "server_path_strip": "/media",
    // The beginning of the path in the *filesystem* that corresponds to where
    // the files are kept.  It is *incredibly* unlikely that you are using the
    // same path I am.
    "server_path_prepend": "/media/block/fse",

    // This is where the venti server is.  (This might be an array at some
    // point.)  This address is in dial(2) syntax, by the way, so $host:$port
    // should be written "tcp!$host!$port".  
    "venti_server": "localhost",

    // Where the vac and unvac binaries are kept:
    "plan9bin": "/opt/plan9/bin",

    // This is to make UGC work safely out of the box.  Hopefully no browser is
    // dumb enough to run stuff inside <script> tags in something we're serving
    // as text/plain:
    "mime_substitutions": {
      "text/html": "text/plain"
    }
  }

= Usage

Afer configuring, you can run the server with `webvac-server`.  This
will actually serve the content from venti, as long as it is present in
the path→score index in Redis (so you can remove content as needed by
just removing items from the index).  In order to add items, you run
`webvac-sweep`.  You can also delete the file (which will only happen if
the sweep is successful) with `-d`.

= TODO

See doc/TODO

= See also:

"Venti: a new approach to archival storage": http://doc.cat-v.org/plan_9/4th_edition/papers/venti/

"Pleroma":  https://pleroma.social/

= I feel dirty

You can throw Bitcoin (BTC) at this address:  1BZz3ndJUoWhEvm1BfW3FzceAjFqKTwqWV .  Proceeds will go to funding the instance hosting thing.