How to send 420 characters per twitter message

I’m not a big fan of Twitter; I closed my account a year ago. I was sick last night and watching the Colbert Report because I couldn’t sleep, and Stephen Colbert was interviewing Biz Stone, cribbing about the 140-character limit. I had an idea which seemed great when I thought of it: apply some on-the-fly compression by exploiting text encoding. Now the idea doesn’t seem so shiny after all.

Abstract:
To send up to 420 characters per Twitter message. This will be done by processing the large text to fit into 140 characters. The result still counts as 140 characters as far as the server is concerned, but it will look like a jumbled mess of text to any human who reads it. By running an inflation algorithm on that same text, another user can get back the 420-character message.

Background:
Today, we’ll be exploiting the encoding that Twitter uses for its text: UTF-8. UTF-8 is variable length, using from one to four octets for each character. Since most English messages use the ASCII character set, only one octet per character is used, and any ASCII string is valid UTF-8 by default. It’s a fantastic idea that saves space and still allows the full Unicode range, thanks to the enormous number of combinations available when all four octets are used. Most of the regular English content that you read hardly ever uses more than two or three octets per character.
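To make the variable-length point concrete, here is a small Python sketch that prints how many octets a few characters take in UTF-8. The sample characters and the helper name `utf8_length` are my own picks for illustration, nothing Twitter-specific.

```python
def utf8_length(ch: str) -> int:
    """Number of octets the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# One sample per encoded length: 'a', e-acute, euro sign, musical G clef.
samples = ["a", "\u00e9", "\u20ac", "\U0001d11e"]

for ch in samples:
    octets = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} takes {utf8_length(ch)} octet(s): {octets}")
```

The four samples take one, two, three and four octets respectively, yet each of them is a single character to anything that counts characters rather than bytes.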

Core Idea:
Construct a UTF-8 character with 4 bytes. The first byte signals the start of a 4-byte sequence (thus having a value between 240 and 244, that is 0xF0-0xF4).
Use each of the three remaining octets to store a different ASCII character. This way three ASCII characters appear as one UTF-8 character, so only one out of the 140 characters is used up. To inflate, read one UTF-8 character and interpret each of its last three octets as an ASCII character.
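Taken literally, the idea can be tried out in a few lines of Python. This is only a sketch: the lead byte 0xF1 is an arbitrary choice from the allowed range, and `naive_pack` and `is_valid_utf8` are hypothetical helper names.

```python
def naive_pack(three_chars: str) -> bytes:
    """Lead byte 0xF1 followed by the three raw ASCII bytes, exactly as described."""
    return bytes([0xF1]) + three_chars.encode("ascii")


def is_valid_utf8(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts the byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Raw ASCII bytes (all below 0x80) after a four-byte lead byte.
print(is_valid_utf8(naive_pack("abc")))

# The same lead byte followed by three bytes in the 0x80-0xBF range.
print(is_valid_utf8(bytes([0xF1, 0x80, 0x80, 0x80])))
```

The first check prints False and the second prints True: the three trailing octets cannot hold raw ASCII values, they have to be continuation bytes.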

Technical complications:

  1. The first byte can’t be used :( . It has to be a value between 240 and 244.
  2. The second, third and fourth bytes can only hold 0x80-0xBF (that is, 128-191). This means the size of the character set that we can compress is 64. That is sufficient for alphanumeric characters and spaces. UTF-8 is not to blame here, however. A full four-byte encoded character can have 64*64*64 variations (multiply that by roughly 4, considering the different values the descriptor byte in the first slot can take), and you get yourself a fine encoding format.
  3. Any non-alphanumeric character would be left alone because it won’t fit in the space that we have for each octet.
  4. Any string polluted with non-alphanumeric characters would compress really poorly. Consider ‘http://is.gd/1f4’ or something like that. ‘htt’ will be one character, ‘p’ will be one, and ‘is’, ‘gd’ and ‘1f4’ would each be single characters. The code will either have to avoid always filling all 4 octets, or fill the unused octets with blanks and discard them during inflation.
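Putting the points above together, a minimal Python sketch could look like this. It is not a finished implementation, and several things are my own assumptions: the 63-symbol alphabet (letters, digits, space), index 63 as the padding blank, the fixed lead byte 0xF1, and the names `pack`, `deflate` and `inflate`. It also does not check whether a real service would accept the unassigned code points this produces.

```python
import string

# Assumed alphabet: 52 letters + 10 digits + space = 63 symbols.
ALPHABET = string.ascii_letters + string.digits + " "
PAD = 63     # assumed padding index, dropped again during inflation
LEAD = 0xF1  # assumed lead byte; with 0xF1 every 0x80-0xBF continuation byte is valid


def pack(group: str) -> str:
    """Turn 1 to 3 alphabet characters into one four-byte UTF-8 character."""
    indexes = [ALPHABET.index(c) for c in group]
    indexes += [PAD] * (3 - len(indexes))
    return bytes([LEAD] + [0x80 | i for i in indexes]).decode("utf-8")


def deflate(text: str) -> str:
    """Pack runs of alphabet characters three at a time; leave the rest alone."""
    out, run = [], ""
    for ch in text:
        if ch in ALPHABET:
            run += ch
            if len(run) == 3:
                out.append(pack(run))
                run = ""
        else:
            if run:  # flush a short run before the untouched character
                out.append(pack(run))
                run = ""
            out.append(ch)
    if run:
        out.append(pack(run))
    return "".join(out)


def inflate(text: str) -> str:
    """Expand every packed character back into its 1 to 3 original characters."""
    out = []
    for ch in text:
        raw = ch.encode("utf-8")
        if len(raw) == 4 and raw[0] == LEAD:
            out.extend(ALPHABET[b & 0x3F] for b in raw[1:] if b & 0x3F != PAD)
        else:
            out.append(ch)
    return "".join(out)


message = "a" * 420
print(len(message), len(deflate(message)), inflate(deflate(message)) == message)

url = "http://is.gd/1f4"
print(len(url), len(deflate(url)), inflate(deflate(url)) == url)
```

Padding short runs keeps every packed character at exactly four octets, which makes inflation trivial at the cost of wasting space on strings like the URL: its 16 characters only shrink to 10, while the 420 plain letters shrink to 140.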


Implementation:
I’m guessing the implementation is easy enough, though text encoding is one of the hardest things for me to get right. I’m going to try to use the System.Text.UTF8Encoding and System.Text.Encoder classes to read the bytes and reconstruct a new string. I’ll dump the code here when I feel like it.

Ideally, there should be implementations in each major language used to write Twitter apps: JavaScript, Python, C++, C#, ActionScript. Ideally, a Greasemonkey script could also run in the browser, look for compressed UTF-8 text and decode it in the Twitter page itself.

Disadvantages:

  1. All clients would need to have an inflation script running.
  2. Supports only alphanumeric characters and spaces.
  3. Won’t be searchable (unless you re-encode the search query itself).

Closing words:
Die twitter! die!

4 Responses to “How to send 420 characters per twitter message”


  • Nope, you haven’t closed your account, it’s still there. Hit delete.

  • SMS messages are transmitted using 7- or 8-bit encodings, so this would not work. The same effect could be achieved by compressing and decompressing the message, but of course you would need an application on both the sending and receiving device.

  • I’d rather send 4 feeds :)

