0.0
No commit activity in last 3 years
No release in over 3 years
Extract content from tweets
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Development

Runtime

>= 1.4.2
 Project Readme

tweetparser

Extract content from tweets in the form of an s-expression.

Usage

require "tweetparser"
tweet = "Hey @threedaymonk, here is a tweet with #hashtags and a http://example.com/url"
TweetParser.parse(tweet)

This gives:

[[:text, "Hey"], [:space, " "],
 [:atref, "@threedaymonk"], [:text, ","], [:space, " "],
 [:text, "here"], [:space, " "],
 [:text, "is"], [:space, " "],
 [:text, "a"], [:space, " "],
 [:text, "tweet"], [:space, " "],
 [:text, "with"], [:space, " "],
 [:hashtag, "#hashtags"], [:space, " "],
 [:text, "and"], [:space, " "],
 [:text, "a"], [:space, " "],
 [:url, "http://example.com/url"]]

The full list of tweet parts recognised is as follows:

  • :url (http://example.com/ or www.example.com)
  • :username (@username) This was :atref in version 0.1.0
  • :list (@username/listname)
  • :hashtag (#hashtag)
  • :slash (/via)
  • :text
  • :newline
  • :html (pre-composed HTML)

Dependencies

  • treetop
  • polyglot

After checking out the code via git, you need to fetch the conformance test submodule:

git submodule init
git submodule update

Known bugs

  • The maximum length of a username or list is not checked.
  • A username etc. immediately following punctuation is not recognised.
  • Japanese text is not handled correctly.
  • Hashtags containing accents are not supported.