Tag Archives: twitter

Node.js tutorial: real-time geolocalized tweets

Recently I've been playing with node.js around real-time visualization of tweets, and I want to share one of my experiments with you.
The idea is to display a heatmap of tweets in real time.
For this, I used node.js. I am a long-time Ruby lover but always open to new technology, even though the same objective could have been achieved with EventMachine.
You can see the result on twittmap.

tweetmap

So here is the code, with two main parts: the server and the client.

schema
The server

There are two parts: the connection to the real-time Twitter stream, and the connection between the client and the server. Each part is around 100 lines, so it's pretty small. See the end of the article if you want to install it yourself.
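For reference, the snippets below assume the usual HTTP + socket.io boilerplate around them. This skeleton is my own reconstruction of it (not a verbatim extract from the repository), using express to serve the client page:

// Assumed server bootstrapping; variable names (io, Twit) match the snippets below
var express = require('express');   // assumption: any static file server would do
var http    = require('http');
var Twit    = require('twit');

var app    = express();
var server = http.createServer(app);
var io     = require('socket.io').listen(server);

app.use(express.static(__dirname + '/public')); // serves the client page
server.listen(process.env.PORT || 8080);        // the demo runs on port 8080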

I didn't want to force the user to log in to Twitter to use the application, so I just use a single key for everybody. To get your own key and access token, you need to go to dev.twitter.com, register an app and generate an access token for your account; everything is explained there.
For Twitter, I use the « Twit » package.
Let's look at the code:

var T = new Twit({  // You need to setup your own twitter configuration here!
  consumer_key:    process.env.CONSUMER_KEY,
  consumer_secret: process.env.CONSUMER_SECRET,
  access_token:    process.env.ACCESS_TOKEN,
  access_token_secret: process.env.ACCESS_TOKEN_SECRET
});

Now, I open a stream with a filter on location, but I filter on [-180,-90,180,90], which is basically all locations. The only risk is to be limited by Twitter (the « limit » event), which occurs if the rate of the stream is higher than 5% of the total tweets.
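The world variable used in the next snippet is simply that whole-planet bounding box:

var world = [-180, -90, 180, 90]; // [west, south, east, north]: the whole planet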

var stream = T.stream('statuses/filter', { locations: world});
stream.on('error',function(error){
  console.log(error);
});
stream.on('limit', function (limitMessage) {
  console.log("Limit:"+JSON.stringify(limitMessage));
});
stream.on('tweet', function (tweet) {
  if(tweet.geo){
    // tweet.geo.coordinates is [lat, lng]; the bounds stored for each socket
    // are [west, south, east, north] (from Leaflet's toBBoxString, see below)
    var coords=tweet.geo.coordinates;
    clients.forEach(function(socket){
      var currentBounds=bounds_for_socket[socket.id];

      if(currentBounds&&(coords[1]>currentBounds[0])   // lng > west
                      &&(coords[0]>currentBounds[1])   // lat > south
                      &&(coords[1]<currentBounds[2])   // lng < east
                      &&(coords[0]<currentBounds[3])){ // lat < north

        totalSent+=1;
        if(totalSent%100==0)console.log("Sent:"+totalSent);
        var smallTweet={
          text:tweet.text,
          user:{   screen_name:       tweet.user.screen_name,
                   profile_image_url: tweet.user.profile_image_url,
                   id_str:            tweet.user.id_str},
          geo: tweet.geo
        };
        socket.emit('stream',smallTweet);
      }
    });
  }
});

When a tweet is received, I first check that there is a geolocation field (this should always be the case since I filter on geolocation, but GeoJSON allows other types of representation). Then I convert the tweet to a « smallTweet », which is basically a subset of all the tweet fields, in order to reduce its size. Finally, I send it (using socket.io) to every connected client that is 'looking' at this area.

This is managed by the second part, the « socket.io » part.

var bounds_for_socket={};  // bounding box currently viewed, per socket id
var clients=[];            // the list of connected clients
var totalSent=0;           // number of tweets sent so far (for logging)
io.sockets.on('connection', function (socket) {

  socket.on('recenter',function(msg){
    bounds_for_socket[this.id]=JSON.parse("["+msg+"]");
  });

  socket.on('disconnect',function(){
    // Remove this socket from the list of connected clients
    for(var i=0;i<clients.length;i++){
      var client=clients[i];
      if(client.client.id==this.id){clients.splice(i,1)}
    }
    delete bounds_for_socket[this.id];
  });

  clients.push(socket); // Update the list of connected clients
  var currentBounds=JSON.parse(socket.handshake.query.bounds);
  bounds_for_socket[socket.id]=currentBounds;
});

The small difficulty is to maintain, for every connected client, the socket id and also the bounding box of its current view. The bounding box is sent either when the connection is opened, or in a message from the client to the server when the user moves the map, for instance.

The client

Now let's look at the client. I use Leaflet, a general-purpose mapping library, together with a heatmap overlay (the HeatmapOverlay plugin) that can display a heatmap on top of the map. As the heatmap is very colorful, I chose black & white tiles so it stands out. The Twitter rate (tens of tweets per second) doesn't allow displaying a full world heatmap (there are other ways to do that), so I limit the minimum zoom factor to 5.

  var baseLayer = L.tileLayer(
  'http://korona.geog.uni-heidelberg.de:8008/tms_rg.ashx?z={z}&x={x}&y={y}',{
    attribution: 'Map data &copy; <a href="http://openstreetmap.org">OpenStreetMap</a> contributors, <a href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a>, Imagery © <a href="http://cloudmade.com">CloudMade</a>',
    maxZoom: 18
  });

  var cfg = {
    "radius": 5,
    "maxOpacity": .8,
    "scaleRadius": false,
    "useLocalExtrema": true,
    latField: 'lat',
    lngField: 'lng',
    valueField: 'count'
  };

  heatmapLayer = new HeatmapOverlay(cfg);
  map = new L.Map('map-canvas', {
    center: new L.LatLng(46.99,2.58),
    zoom:   7,
    minZoom:5,
    layers: [baseLayer, heatmapLayer]
  });
  map.on('zoomend',updateSocket);
  map.on('dragend',updateSocket);
  if(bounds=$.urlParam('bounds')){
    rect=JSON.parse(bounds);
    map.fitBounds(L.latLngBounds([rect[1],rect[0]],
                                 [rect[3],rect[2]]));
  }

bounds is the bounding box of the current view, passed in the URL so that a specific location can be bookmarked and reused later.
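Note that $.urlParam is not part of jQuery itself; it's a small helper that reads a query string parameter. A minimal sketch (my own version, the one in the repository may differ) could be:

// Hypothetical helper: returns the decoded value of a query string
// parameter, or false if it is absent.
$.urlParam = function (name) {
  var match = new RegExp('[?&]' + name + '=([^&#]*)').exec(window.location.href);
  return match ? decodeURIComponent(match[1]) : false;
};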

As you can see, there is a callback « updateSocket » called when the user has finished zooming or moving the map. This callback tells the server, through the socket, to update the location zone for this client, and also updates the URL 'bounds' parameter:

function updateSocket(){
  window.history.pushState("TweetOMap","TweetOMap","/?bounds=["+map.getBounds().toBBoxString()+"]");
  if(socket)socket.emit("recenter",map.getBounds().toBBoxString());
}

The last part is the initialisation of the socket. It connects to the server (and reconnects when the connection is lost and then recovered).

 function startSocket(){
  socket = io.connect('/', {query: "bounds=["+map.getBounds().toBBoxString()+"]"});
  socket.on('stream', function(tweet){
    addPoint(tweet);
  });
  socket.on('reconnect',function(){
    console.log("Reconnect");
    updateSocket();
  });
}

When a new tweet is received from the server (the « stream » event), the addPoint function is called:

function addPoint(tweet)
{
  if(tweet.geo){
    var pt={lng:tweet.geo.coordinates[1],lat:tweet.geo.coordinates[0],count:1};
    if(showTweets){
      var bubble=hover_bubble.shift();  // recycle the oldest popup
      bubble.setContent("<img src="+tweet.user.profile_image_url+" align=left><b>@"+tweet.user.screen_name+"</b><br>"+tweet.text)
      .setLatLng(tweet.geo.coordinates)
      .addTo(map);
      hover_bubble.push(bubble);
    }
    heatmapLayer.addData(pt);
  }
}

We add the point to the heatmap layer (heatmapLayer.addData(pt)), but we also add a bubble to the map with some information about the tweet. This can be switched off using the showTweets flag. Only the last 10 tweets are displayed.
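The hover_bubble variable is assumed to be a small pool of ten pre-created Leaflet popups that addPoint recycles; a possible initialisation (my sketch, the exact popup options are an assumption) looks like this:

var showTweets = true;   // set to false to keep only the heatmap
var hover_bubble = [];   // pool of 10 popups recycled by addPoint()
for (var i = 0; i < 10; i++) {
  hover_bubble.push(L.popup({ autoPan: false, closeButton: false }));
}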

The full source code is available here: https://github.com/tomsoft1/TweetOMap

I've tried a new provider to deploy the app, Nodejitsu. It's free for the first application, with some limitations. It's very close to Heroku and it worked quite well.

To install it:

git clone 'https://github.com/tomsoft1/TweetOMap'
cd TweetOMap
vi tweets.js # put your own credentials here, or set them as environment variables
npm install
node tweets.js

Then, browsing to localhost:8080, you should see the map.

Note that the URL modification allows you to “remember” the state of the map; for instance, to look around New York, use this link: http://twittmap.nodejitsu.com/?bounds=[-78.2940673828125,38.86965182408357,-70.51025390625,41.79998325207397]

Conclusion
As a Ruby lover, this was a nice experience. The Node.js ecosystem is obviously very good, especially in this area. The ability to easily set up a socket connection between the client and the server is a real plus.

I like it as a tool for some tasks, to be used in combination with other tools. I personally use it, for instance, in conjunction with other Rails or EventMachine programs. Rails is much more mature in terms of framework, libraries and ORM support, but Node.js is more advanced for evented I/O (sometimes 'too much' so; see my Evented dictature article).

Note that the same principle has been used in my TweetMap experiment.

Infographic: iPhone vs. Android shows a north vs. south split (and in real time)

This is an interesting visualisation that I created a few days ago. The principle is simple: each geolocalized tweet is displayed on a map, in real time. Tweets sent from an iPhone are displayed in red, while tweets sent from an Android are displayed in blue. Others (unknown source, bots, Foursquare updates, etc.) are displayed in white.

Here is the map:

capture
(click to see it full screen)

The results are quite interesting: they show that the Android/iPhone split happens more at a country/continent level than at a user level. The USA, England and Japan are in their vast majority « iPhone users », while South America, Spain and Indonesia are much more Android focused. France is one of the few balanced countries.
In other words, it looks like another north vs. south split, or rich vs. poor (it seems, for instance, that some big Brazilian cities are iPhone users while the rest of the country is much more Android).

You can see it in action live: tweetworldtom.herokuapp.com
Technically speaking, I've just used a node.js server that streams all the tweets to the client through socket.io, and processing.js to draw them on top of a world map.
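As a rough illustration of the principle (my own sketch, not the code of the actual experiment), the client only needs to look at the tweet's source field to pick a colour before plotting the point:

// Hypothetical client-side handler, assumed to run inside the processing.js
// sketch so that width, height, stroke() and point() are available.
socket.on('stream', function (tweet) {
  var color = [255, 255, 255];                               // white: unknown source or bot
  if (/iphone/i.test(tweet.source))  color = [255, 0, 0];    // red:  iPhone
  if (/android/i.test(tweet.source)) color = [0, 0, 255];    // blue: Android
  // simple equirectangular projection of [lat, lng] onto the canvas
  var x = (tweet.geo.coordinates[1] + 180) / 360 * width;
  var y = (90 - tweet.geo.coordinates[0]) / 180 * height;
  stroke(color[0], color[1], color[2]);
  point(x, y);
});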

In another article I'll show some source code examples of how this was created.

Playing with Arduino: a real-world HashtagBattle Twitter display

I started playing with Arduino just a few days ago.

First, I must say it's a great platform: setup is easy and the samples are great. I've done some embedded development in the past, and I would have dreamed of such a platform!

So here is my first small project: a “HashtagBattle display”.

Real-world HashtagBattle
The idea is to show the result of a hashtag battle on Twitter using something more physical, in this case an arrow indicating the tendency of the results.
So we count the number of tweets containing hashtag A (let's say “#iphone”) and compare it to the number of tweets containing hashtag B (let's say “#android”), and the direction of the arrow depends on the result. If we have the same amount of hashtag A and hashtag B, the arrow points to the center; for example, 30 tweets for A against 10 for B give an angle of 180×30/40 = 135°.

The small difficulty here was to use the Twitter streaming API to do this in real time.

The board:

- The board is quite simple: there is a servo that displays the arrow direction, and two LEDs, one red and one green, each one flashing when a tweet containing hashtag A (red) or hashtag B (green) arrives.

sketch
The software:

As I currently only have an Arduino Uno, there is no direct connection to the internet. Instead, I use the PC for this task and communicate with the board through the serial line.
So every 100 ms the PC sends a line with 3 pieces of information:
- the angle of the servo
- 0 or 1 to indicate whether the first LED should be lit
- 0 or 1 to indicate whether the second LED should be lit

For instance:

32 1 0

will tell the board that the servo needs to point at 32° and that the red LED needs to blink.

So the program is the following:

require 'tweetstream'
require "serialport"

input=ARGV.shift || "#bordeaux,#strasbourg"
#params for serial port
port_str = "/dev/tty.usbmodem1411"  #may be different for you

sp = SerialPort.new(port_str, 9600, 8, 1, SerialPort::NONE)

TweetStream.configure do |config|
  config.consumer_key       = '<YOUR KEY>'
  config.consumer_secret    = '<YOUR CONSUMER SECRET>'
  config.oauth_token        = '<OAUTH TOKEN>'
  config.oauth_token_secret = '<TOKEN SECRET>'
  config.auth_method        = :oauth
end

@client=TweetStream::Client.new;

total=0
last=Time.now
words=input.split(',')
flags=Array.new(words.size)
buffers=words.map{|w| []}
sp.puts "512 0 0" # reset the servo
@client.track(words) do |tweet|
  begin
    search=tweet.text.downcase
    words.each_with_index do |word,i|
      if search.include? word
        flags[i]=1
        buffers[i]<<tweet
      end
    end
    if (Time.now-last)>0.11
      begin
        ratio=(180*buffers[0].size/(buffers[0].size+buffers[1].size)).to_i
        str=ratio.to_s+" "+flags.join(' ')
        puts str
        sp.puts str
        last=Time.now
        flags=Array.new(words.size,0)
      end
    end
  end
end

Note that you can start it with parameters:

ruby notifyTweets.rb “#apple,#orange”
or use words instead of hashtags:
ruby notifyTweets.rb “apple,orange”
The program needs to be run on the PC the Arduino is connected to!

The Arduino part is shown below.
We just set up the serial line, wait for a line, and extract the information from it.

#include <Servo.h> 
int ledPin=13;    // pin of the first LED (the two LEDs are on pins 13 and 12)
int nbLed=2;
Servo myservo;  // create servo object to control a servo 

void setup() {
  // Open serial communications and wait for port to open:
  Serial.begin(9600);
  while (!Serial) {
    ; // wait for serial port to connect. Needed for Leonardo only
  }
  for(int i=0;i<nbLed;i++){
   pinMode(ledPin-i, OUTPUT); 
  }
 myservo.attach(9);  // attaches the servo on pin 9 to the servo object 
}

// Utility function to get a field from a string at a given position
String getValue(String data, char separator, int index)
{
  int found = 0;
  int strIndex[] = {0, -1};
  int maxIndex = data.length()-1;

  for(int i=0; i<=maxIndex && found<=index; i++){
    if(data.charAt(i)==separator || i==maxIndex){
        found++;
        strIndex[0] = strIndex[1]+1;
        strIndex[1] = (i == maxIndex) ? i+1 : i;
    }
  }
  return found>index ? data.substring(strIndex[0], strIndex[1]) : "";
}


void loop() {
  if(Serial.available() >0) {
    String str=Serial.readStringUntil('\n');
    Serial.println("Read:"+str);
    for(int i=0;i<nbLed;i++){
      // fields 2 and 3 of the line are the LED flags: light the LED if not "0"
      if(!getValue(str,' ',i+1).equals("0")){
        digitalWrite(ledPin-i, HIGH);
      }
    }
    myservo.write(getValue(str,' ',0).toInt());  // field 1: the servo angle
  }
  delay(100);  
  for(int i=0;i<nbLed;i++){
    digitalWrite(ledPin-i, LOW); 
  } 
}

That's all for this first project; it was really fast to develop, thanks to Arduino (and Ruby!).

SpaceSaving algorithm (heavy hitters): computing statistics on real-time data streams

I've recently discovered the SpaceSaving algorithm, which is very interesting.

The purpose of this algorithm is to approximate computations over an infinite stream using a data structure of finite size. The typical problem is counting the number of occurrences of each item coming from an infinite stream.

The obvious approach is to keep a map of item → count somewhere (memory, database) and to increment the count (or create the entry if not present) for each incoming item.

Depending on the distribution, that map can grow as big as the number of distinct items in the feed (so, potentially, without bound!).

With this heavy-hitter algorithm, you only maintain a small subset of items, each with an associated error.

If an item is not yet present in the map and the map is full, you evict the item with the minimum count, replace it with the new one, give the new item that minimum count plus one, and record the evicted minimum as the new item's error. For example, if a map of size 2 holds {A: 5, B: 3} and C arrives, B is evicted and C gets a count of 4 with an error of 3.

So basically, an item can appear in the map with a high error count, meaning part of its count may actually belong to items it replaced.

The result depends on the size chosen for the map and on the distribution of the data, but I wanted to try it by myself, so I've written a small implementation in Ruby and a sample using the Twitter real-time stream to count URL occurrences in the feed.

class SpaceSaving
  attr_accessor :counts,:errors
  def initialize in_max_entries=1_000
    @in_max_entries=in_max_entries
    @counts={}
    @errors={}
  end
  #Add a value in the array
  def add_entry value,inc_count=1
     count=@counts[value]
     if count==nil   #newentry
      if @counts.size>=@in_max_entries
        min=counts.values.min
        old=counts.key min
        @counts.delete old
        @errors.delete old
        @errors[value]=min
        count=min
      else
        count=0
      end
     end
     @counts[value]=count+inc_count
  end
  
end

I just wanted to keep it as simple as possible for now, but in a real-world example it would be better to encapsulate access to the @counts field.
By default, the size of the map is set to 1,000, but this can be modified.

And here is sample.rb:

require 'bundler/setup'
require 'tweetstream'
require './SpaceSaving.rb'

TweetStream.configure do |config|
  config.consumer_key       = '<CONSUMER_KEY>'
  config.consumer_secret    = '<CONSUMER_SECRET>'
  config.oauth_token        = '<OAUTH_TOKEN>'
  config.oauth_token_secret = '<OAUTH_SECRET>'
  config.auth_method        = :oauth
end

@client=TweetStream::Client.new;

urls=SpaceSaving.new

total=0

@client.track('http') do |status|
  begin
    # add each URL in the tweet , and count it once
    status.urls.each do |url|
     urls.add_entry(url.expanded_url,1)
    end
    total+=1
    if total%1000==0 then
      puts "Treated:#{total}"
      limit=urls.counts.values.sort[-30]
      urls.counts.each do|u,c|
        if c>limit 
          puts "Counts #{c} #{u} errors:#{urls.errors[u]}"
        end
      end
    end
  end
end

The complete source code is available on GitHub: https://github.com/tomsoft1/SpaceSaving

So I've made a few tests and comparisons with an exact approach, and the result for the top 20 was similar, even after 600,000 tweets parsed, with a “small” map of 1,000 entries compared to around 300,000 entries for the exact algorithm.

TvTweet: real-time analytics around social TV and Twitter

Thanks to TvTweet, you can get real-time information about all TV shows and their Twitter audience, for France and the UK.
Beyond real-time information, you also get analytics and historical data about these TV shows.

You get a cool real-time dashboard:

Dashboard

or more detailed information about a show:

For more information, visit us; there are plenty of features on the way!

Twitter Hacked!

This is a translation of an article originally published in French on the Korben.info website.

(image: the Twitter fail whale)

I was contacted yesterday by the guy who hacked Twitter. His pseudonym is Hacker Croll (here is the initial reference to Hacker Croll, in French), and he explained to me that he was able to access the email boxes of various Twitter employees, including Evan Williams' and his wife's. This gave him access to a number of astonishing pieces of information.

He had access to the PayPal, Amazon, Apple, AT&T, MobileMe and Gmail accounts of Evan Williams, Sara Morishige Williams, Margaret Utgoff and Kevin Thau (Twitter employees).

Here are the snapshots that the hacker sent to me:

(screenshot: Evan Williams' accounts)

He was able to access the registrar information for the Twitter domain name, and he could have redirected the Twitter domain name to any other IP address (or simply stolen the domain name).

(screenshots)

But the most incredible thing was the quantity of internal information that he was able to get about Twitter:

  • the complete list of employees
  • their food preferences
  • their credit card numbers
  • some confidential contracts with Nokia, Samsung, Dell, AOL, Microsoft and others
  • direct emails with web and showbizz personalities
  • phone numbers
  • meeting reports (very informative)
  • internal document templates
  • time sheets
  • applicant resumes
  • the salary grid (time for me to move… lol)

But amongst all this information, you can also see some funny things, like:

  • the “possible” launch of a reality TV show where contestants would travel across the USA and win contests thanks to their followers, with a $100,000 prize at the end (but going to a nonprofit organization)
  • some growth predictions that target 25 million users by the end of 2009, 100 million by the end of 2010, 350 million by the end of 2011… with revenue figures that I will not disclose here…
  • a list of new star accounts like Wyclef Jean, Duran Duran, Cartoon Network, Cisco, UCLA, Guillaume Pepy (CEO of one of the biggest French companies, the SNCF), Nirvana, Toshiba, 50 Cent, etc.

(screenshot)

  • the plan of their new offices, with a wish list from employees who would like a sleeping room, a game room, plants, a chef, a meditation room, a bicycle room, adjustable desks, a gym, a washer/dryer, wifi, lockers, a wine cellar, an aquarium and more… They seem to have imagination…

(screenshot)

  • We also learn their ideas about Twitter monetization… Of course there are certified accounts, but also advertising, with the ability to add an AdSense widget, or sponsored tweets. Twitter also wishes to be the first service to reach a billion users (which is highly probable). They define themselves more as a “nervous system” than an alert system.
  • We also learn that the French president will soon use Twitter (@NicolasSarkozy) and that Nicolas Princen is the one who will run it.
  • And we've also got some “test” t-shirt and cap designs

(screenshots: t-shirt and cap designs)

So Twitter was visited by this hacker. Since then, everything has gone back to normal, thanks to security recommendations:

(screenshot)

Passwords have been changed. The information given by Hacker Croll dates from the beginning of May, but it is still very instructive. In his mail, Hacker Croll explains what should be learned from this misadventure:

What I would like to say is that even the biggest and the strongest do silly things without realizing it, and I hope that my action will help them realize that nobody is safe on the net. If I did this, it is to educate those people who feel more secure than simple Internet novices. And security starts with simple things like secret questions, because many people don't realise the impact of these questions on their lives if somebody is able to crack them.

As for me, I've only published here the information that is not harmful to Twitter, because I am a big fan of Evan and his team's work. I've just relayed some of Hacker Croll's information, and what I can tell the Twitter team is that this hacker seems to have a code of conduct that will not cause any prejudice to the company.

Now, clearly, we see from this hacking demonstration that it is very easy to guess a simple password from a secret question, from there to get into other accounts (Facebook, Gmail and others), and from there to reach the heart of a company, both by accessing confidential data and by paralyzing the business simply by taking over a few domain names or admin accounts.

So, never stop being paranoid. Don't use secret questions, use a different password for each of your accounts, don't put sensitive documents online, etc. In short, be careful.