Node.js tutorial : real time geolocalized tweets

Recently I’ve been playing with node.js around real-time visualization of tweets, and I want to share one of my experiments with you.
The idea is to display a heatmap of tweets in real time.
For this, I’ve used node.js. I am a long-time Ruby lover but always open to new technology, even though the same objective could have been achieved with EventMachine.
You can see the result on twittmap

tweetmap

So here is the code, in two main parts: the server and the client.

schema
The server

There are two parts: the connection to the real-time Twitter stream, and the connection between the client and the server. Each part is around 100 lines, so pretty small. See the end of the article if you want to install it yourself.

I didn’t want to force the user to log in to Twitter to use the application, so I just use a single key for everybody. To get your own key and access token, go to dev.twitter.com, register an app (to get the consumer key) and generate an access token for your own account for this app; everything is explained there.
For Twitter, I use the “Twit” package.
Let’s look at the code:

var Twit = require('twit');

var T = new Twit({  // You need to set up your own Twitter configuration here!
  consumer_key:        process.env.CONSUMER_KEY,
  consumer_secret:     process.env.CONSUMER_SECRET,
  access_token:        process.env.ACCESS_TOKEN,
  access_token_secret: process.env.ACCESS_TOKEN_SECRET
});

Now, I open a stream with a filter on location, but I filter on [-180,-90,180,90], which is basically all locations. The only risk is being limited by Twitter (the “limit” event), which occurs if the rate of the stream is higher than 5% of total tweets.

var world = [-180, -90, 180, 90]; // whole-world bounding box

var stream = T.stream('statuses/filter', { locations: world });
stream.on('error', function (error) {
  console.log(error);
});
stream.on('limit', function (limitMessage) {
  console.log("Limit: " + JSON.stringify(limitMessage));
});
stream.on('tweet', function (tweet) {
  if (tweet.geo) {
    var coords = tweet.geo.coordinates; // [latitude, longitude]
    clients.forEach(function (socket) {
      var currentBounds = bounds_for_socket[socket.id];
      // bounds are [west, south, east, north]
      if (currentBounds && (coords[1] > currentBounds[0])
                        && (coords[0] > currentBounds[1])
                        && (coords[1] < currentBounds[2])
                        && (coords[0] < currentBounds[3])) {

        totalSent += 1;
        if (totalSent % 100 == 0) console.log("Sent: " + totalSent);
        var smallTweet = {
          text: tweet.text,
          user: { screen_name:       tweet.user.screen_name,
                  profile_image_url: tweet.user.profile_image_url,
                  id_str:            tweet.user.id_str },
          geo: tweet.geo
        };
        socket.emit('stream', smallTweet);
      }
    });
  }
});

When a tweet is received, I first check that there is a geo location field (this should always be the case because I filter on geolocation, but GeoJSON allows other types of representation). Then I convert the tweet to a “smallTweet”, which is basically a subset of all the tweet fields, in order to reduce its size. Finally, I send it (using socket.io) to every connected client that is ‘looking’ at this area.
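The index swapping in that bounds test is easy to get wrong, so here it is extracted as a standalone function (a sketch; the sample coordinates are made up for illustration). Twitter’s legacy `geo` field stores coordinates as [latitude, longitude], while the bounds sent by the client come in Leaflet’s toBBoxString() order, [west, south, east, north]:

```javascript
// coords is [latitude, longitude] (Twitter order);
// bounds is [west, south, east, north] (Leaflet toBBoxString order).
function inBounds(coords, bounds) {
  var lat = coords[0], lng = coords[1];
  return (lng > bounds[0]) && (lat > bounds[1]) &&
         (lng < bounds[2]) && (lat < bounds[3]);
}

// Sample values, made up for illustration:
var paris  = [48.85, 2.35];           // [lat, lng]
var france = [-4.8, 42.3, 8.2, 51.1]; // [west, south, east, north]
console.log(inBounds(paris, france)); // true
```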

This is managed by the second part, the socket.io part.

var bounds_for_socket = {};
var clients = [];  // the list of connected clients
io.sockets.on('connection', function (socket) {

  socket.on('recenter', function (msg) {
    bounds_for_socket[socket.id] = JSON.parse("[" + msg + "]");
  });

  socket.on('disconnect', function () {
    // Remove this socket from the client list
    for (var i = 0; i < clients.length; i++) {
      if (clients[i].id == socket.id) {
        clients.splice(i, 1);
        break;
      }
    }
    delete bounds_for_socket[socket.id];
  });

  clients.push(socket); // update the list of connected clients
  var currentBounds = JSON.parse(socket.handshake.query.bounds);
  bounds_for_socket[socket.id] = currentBounds;
});

The small difficulty is to maintain, for every connected client, the socket id and also the bounding box for the connection. The bounding box is either sent when the connection is opened, or in a message from the client to the server, for instance when the user moves the map.
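To make the round trip concrete, here is a sketch of what the ‘recenter’ message looks like (sample values are made up): Leaflet’s toBBoxString() yields a plain “west,south,east,north” string, which the server turns back into an array by wrapping it in brackets and JSON-parsing it.

```javascript
// What the client emits (output of map.getBounds().toBBoxString()):
var msg = "-4.8,42.3,8.2,51.1";

// What the server stores in bounds_for_socket[socket.id]:
var bounds = JSON.parse("[" + msg + "]");
console.log(bounds); // [ -4.8, 42.3, 8.2, 51.1 ]
```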

The client

Now let’s look at the client. I use Leaflet, a general-purpose mapping library, together with a heatmap plugin that can display a heatmap on top of the map. As the heatmap is very colorful, I chose black-and-white tiles to make the difference visible. The Twitter rate (tens of tweets per second) doesn’t allow displaying a full world heatmap (there are other means for that), so I limit the minimum zoom level to 5.

  var baseLayer = L.tileLayer(
  'http://korona.geog.uni-heidelberg.de:8008/tms_rg.ashx?z={z}&x={x}&y={y}',{
    attribution: 'Map data &copy; <a href="http://openstreetmap.org">OpenStreetMap</a> contributors, <a href="http://creativecommons.org/licenses/by-sa/2.0/">CC-BY-SA</a>, Imagery © <a href="http://cloudmade.com">CloudMade</a>',
    maxZoom: 18
  });

  var cfg = {
    radius: 5,
    maxOpacity: 0.8,
    scaleRadius: false,
    useLocalExtrema: true,
    latField: 'lat',
    lngField: 'lng',
    valueField: 'count'
  };

  heatmapLayer = new HeatmapOverlay(cfg);
  map = new L.Map('map-canvas', {
    center: new L.LatLng(46.99,2.58),
    zoom:   7,
    minZoom:5,
    layers: [baseLayer, heatmapLayer]
  });
  map.on('zoomend',updateSocket);
  map.on('dragend',updateSocket);
  if(bounds=$.urlParam('bounds')){
    rect=JSON.parse(bounds);
    map.fitBounds(L.latLngBounds([rect[1],rect[0]],
                                 [rect[3],rect[2]]));
  }

bounds is the current view’s bounding box, passed in the URL so that a specific location can be bookmarked and reused later.
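Note that $.urlParam is not part of jQuery core; here is a hypothetical stand-in showing how the bounds parameter can be pulled out of the URL and parsed (the helper name and the sample URL are mine, not from the original code):

```javascript
// Hypothetical helper: extract one query parameter from a URL string.
function urlParam(url, name) {
  var match = new RegExp('[?&]' + name + '=([^&]*)').exec(url);
  return match && decodeURIComponent(match[1]);
}

var url  = '/?bounds=[-4.8,42.3,8.2,51.1]';
var rect = JSON.parse(urlParam(url, 'bounds'));
// rect is [west, south, east, north]; fitBounds wants
// [[south, west], [north, east]], hence the index swapping.
console.log(rect); // [ -4.8, 42.3, 8.2, 51.1 ]
```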

As you can see, there is a callback, updateSocket, called when the user has finished zooming in/out or moving the map. This callback tells the server, through the socket, to update the bounding box for this client, and also updates the ‘bounds’ URL parameter:

function updateSocket(){
  window.history.pushState("TweetOMap","TweetOMap","/?bounds=["+map.getBounds().toBBoxString()+"]");
  if(socket)socket.emit("recenter",map.getBounds().toBBoxString());
}

The last part is the initialisation of the socket. It connects to the server (and reconnects when the connection is lost and then recovered).

function startSocket(){
  socket = io.connect('/', {query: "bounds=["+map.getBounds().toBBoxString()+"]"});
  socket.on('stream', function(tweet){
    addPoint(tweet);
  });
  socket.on('reconnect',function(){
    console.log("Reconnect");
    updateSocket();
  });
}

When a new tweet is received from the server (the ‘stream’ event), the addPoint procedure is called:

function addPoint(tweet)
{
  if(tweet.geo){
    var pt={lng:tweet.geo.coordinates[1],lat:tweet.geo.coordinates[0],count:1};
    if(showTweets){
      var bubble=hover_bubble.shift(); // recycle the oldest popup
      bubble.setContent("<img src="+tweet.user.profile_image_url+" align=left><b>@"+tweet.user.screen_name+"</b><br>"+tweet.text)
      .setLatLng(tweet.geo.coordinates)
      .addTo(map);
      hover_bubble.push(bubble);
    }
    heatmapLayer.addData(pt);
  }
}

We add the point to the heatmap layer (heatmapLayer.addData(pt)), but we also add a bubble to the map with some information about the tweet. This can be switched off using the showTweets flag. Only the last 10 tweets are displayed.
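The hover_bubble list behaves as a fixed-size ring: each new tweet recycles the oldest popup (shift) and pushes it back as the newest. Here is a plain-JavaScript sketch of the rotation, with bare objects standing in for Leaflet popups (pool size reduced to 3 for readability):

```javascript
var pool = [{}, {}, {}]; // in the real code: 10 pre-created Leaflet popups

function recycle(text) {
  var bubble = pool.shift();  // take the oldest popup off the front
  bubble.content = text;      // real code: setContent().setLatLng().addTo(map)
  pool.push(bubble);          // it becomes the newest
}

recycle('tweet 1');
recycle('tweet 2');
// The two most recent tweets are now at the end of the pool:
console.log(pool[1].content, pool[2].content); // tweet 1 tweet 2
```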

The full source code is available here: https://github.com/tomsoft1/TweetOMap

I’ve tried a new provider to deploy the app, Nodejitsu. It’s free for the first application, with some limitations. It’s very close to Heroku and it worked quite well.

To install it :

git clone 'https://github.com/tomsoft1/TweetOMap'
cd TweetOMap
vi tweet.js # put your own cred. or put them in an env variables
npm install
node tweets.js

Then, using localhost:8080, you should see the map.

Note that the url modification allows you to “remember” the state of the map, for instance, to look around new york, use this link: http://twittmap.nodejitsu.com/?bounds=[-78.2940673828125,38.86965182408357,-70.51025390625,41.79998325207397]

Conclusion
As a Ruby lover, this was a nice experience. The Node.js ecosystem is obviously very good, especially in this area. The ability to easily set up a socket connection between the client and the server is a real plus.

I like it as a tool for some tasks, used in combination with other tools. I personally use it in conjunction with other Rails or EventMachine programs. Rails is much more mature in terms of framework, libraries and ORM support, but Node.js is more advanced for evented I/O (though sometimes ‘too much’ so; see my article on Node.js and asynchronicity below).

Note that the same principle has been used in my TweetMap experiment.

Node.js and the Asynchronicity Dictatorship

I’ve recently made some experiments with Node.js, the new kid on the block. Coming from the Ruby and EventMachine world, the evented approach is not new to me, but some aspects of JavaScript make the approach quite fun.

The first examples are fine, and you enjoy them. But as soon as you try to do more complex things, you face the pyramid of hell: the callback nightmare.

So what’s wrong with it?

The selling point of Node.js is the evented approach. The problem is that too many events kill the evented approach. Everything wants to be an event, and most of the code you write ends up being there just to achieve synchronicity.

Let’s take a first example, a small “pyramid of doom”: open a file, write a line, and close it (I know there are specific functions to do this in one call, but the objective here is to show the issue with “all async”):

var fs = require('fs');

fs.open('toto', 'a', 0666, function (e, fd) {
  fs.write(fd, "Test", function () {
    fs.close(fd, function () {
      console.log('file closed');
    });
  });
});

The Node.js community is aware of the issue, and says that promises will change everything; the promise of promises is to make async things happen sequentially.
Basically, instead of having a pyramid, you chain events.
So, the same example using ‘Q’, a promise library:

var fs = require('fs');
var Q=require('q');

Q.nfcall(fs.open, 'toto', 'a', 0666)
.then(function (fd) {
  return Q.nfcall(fs.write, fd, "Test", null, 'utf8')
    .then(function () {
      return Q.nfcall(fs.close, fd);
    });
});

OK, no more pyramid of doom, but it’s not totally clear that this is much better than before. A lot of code just to ensure sequentiality.

Async vs Sync

Let’s compare with the “classical” synchronous approach:

id=fs.open('toto','a',666);
id.write("Test",null,'utf8');
id.close();

(this won’t work in Node.js currently!)

Or even better in a chainable approach:

fs.open('toto','a',666).write("TEST").close();

The code is 10x easier to read than the previous version.

Yes, but evented I/O is faster, it’s the future!

Evented I/O is great, no doubt. The point is: do we really need to raise this to the level of the developer? 95% of tasks are sequential, even those that require I/O. So instead of exposing this to the developer, the language/framework should be able to hide it, using its ability to do other things during these waits.

This is the idea behind fibers, or more generally behind cooperative multitasking. This won’t make your program slower, but will just hide some of the complexity for you. You can still have concurrency, joins, etc.
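Node.js itself doesn’t ship fibers, but the cooperative style can be sketched with ES6 generators: a tiny driver resumes the generator each time an async step completes, so the body reads top-to-bottom. Everything below (the run driver and the fake I/O helpers) is illustrative, not a real fiber library:

```javascript
// Minimal driver: each yielded value is a thunk taking a callback;
// the driver resumes the generator with the callback's result.
function run(genFn) {
  var gen = genFn();
  (function step(value) {
    var next = gen.next(value);
    if (!next.done) next.value(step);
  })();
}

// Fake async I/O helpers (stand-ins for fs.open / fs.write):
function fakeOpen(name)     { return function (cb) { cb('fd:' + name); }; }
function fakeWrite(fd, str) { return function (cb) { cb(str.length); }; }

var result;
run(function* () {
  var fd = yield fakeOpen('toto');      // reads like blocking code...
  var n  = yield fakeWrite(fd, 'Test'); // ...but each step is a callback
  result = fd + ' wrote ' + n + ' bytes';
});
console.log(result); // fd:toto wrote 4 bytes
```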

Events are good when:

  • You really have ‘unexpected events’, or that part of your application is push-based: somebody pressing a button in a UI, a web request, a tweet coming in a stream, etc.

These are real events.

Events are not really needed when:

  • You want to read/write a file, access a database, etc. This does not mean that you must be blocked; it just means that, in this case, the programmer wants to be sequential.

Golang and others chose this path, and it makes the code much simpler to read.

I predict that in a couple of years, the whole Node.js community will suddenly discover that “sequentiality is not so bad” and will introduce fibers or other ways to write synchronous-style code.