
Systems Stuff: node.js is totally boss


bruce springsteen is actually The Boss. this is irrefutable. but node.js is catching up.

I'm kind of infatuated with node.js right now, honestly. i've made some cool projects with it.

I've been toying with node.js for about three months, I think. before I even talk about it and how excited I am about it, you need to be comfortable with the following:

if you feel good about those things, we're in business. otherwise, you need to read the linux guide. developing stuff with node.js is command line work. it's low-level. it's fun!

node.js & systems development

first of all, what the hell is node.js to begin with? put simply, it's a highly scalable platform for running javascript on the server, used primarily to build asynchronous programs.

that's crazy. let me explain each piece a little better.

it's server-side javascript! that means instead of javascript running in your browser, it's running on the server. that may seem strange - and it is - but remember, javascript is just a programming language, there isn't anything saying that it can only run in a browser!

specifically, the javascript in node.js is run by Google's super-powerful V8 Javascript Engine. V8 is so cool, it basically compiles the javascript to machine code to make it obscenely fast. if you don't understand what that means, that's okay. just understand that it's fast.

(javascript was traditionally run by being interpreted by the browser, line by line, the same way a server interprets PHP before handing the result off to you. the V8 engine, which also powers Google Chrome, instead compiles the javascript down to machine code. it's goddamn magic!)

and node.js is merely a platform; javascript is the actual language you use to write scripts for it. node.js bundles the V8 engine with an event loop and a set of libraries for server-side jobs that browsers can't do, like reading files and opening network connections.

like Ruby, Perl, Bash, or Java, node.js can be a tool for systems development as well as for web development. the difference is who or what is going to use what you write. if a person in a browser will use it, then it's web development, no matter what the language. but if you're writing something for a computer to talk to, then you're developing systems. node.js can do both, but it's mostly for systems development. i'll explain this more in a systems architecture guide.

and what does asynchronous mean? well, i'd like to dedicate an entire section to it. because it's a big deal.

the asynchronous world

hopefully you know a programming language or two by now, namely stuff like PHP and Javascript and Ruby and maybe Python or Perl or whatever. In common practice, these are all procedural, interpreted languages.

procedural means that you write the program like a step-by-step guide. from line 1 to line 100, the computer goes step by step until it reaches the end, and takes as long as it has to.

if we were to write a script in PHP that connects to a database, grabs some rows, and displays them, the computer would execute that script the following way:

  1. initialize database connection. wait til that's ready.
  2. run a query for some rows. wait til it actually comes back with some.
  3. with each row, show it. wait til each is done before moving on to the next.
  4. end of script, spit out the result to the user!

this typically happens so damn fast that we don't really notice that the computer is just reading a list of steps and executing them. but for every step, it waits for every function in the line to be done before going to the next one.

this is good! because it means we can write something on line 2 that depends on line 1 working. easy! simple! effective!

but slow.

slow in terms of a computer, anyway. a PHP script may take 100 milliseconds, but during that 100 milliseconds, the script and the PHP process it's using can't be doing anything else. all that waiting around for the database to get back, for it to display the rows, for it to hand the result off to the user at the end, is all time effectively wasted!

that's known as the synchronous or blocking world. and it's great for total integrity, but it doesn't scale well. 100 millisecond requests may seem fine when you have a powerful server and maybe 100 visitors. but what if you suddenly get 1,000,000 visitors? the server will drown itself in all the requests while waiting around for the database to do this or that.

this is where asynchronous or non-blocking platforms come in. they are inherently event-driven as opposed to procedural. in an async world, line 2 can't count on the result of something slow that line 1 started (like a database query), because that might take a while. the program doesn't wait around for it; all it wants to do is get to the end of the file!

so in an async script, the program would idle most of the time, waiting for things to happen, before receiving an event and continuing what it needs to do.
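once node.js is installed (that's the next section), here's a rough sketch of the async idea in node.js terms. fakeQuery() is made up for this example: it pretends to be a database call that takes 100 milliseconds, using setTimeout(). both queries get started right away, the last line of the file runs before either one comes back, and each result shows up later as an event.

```javascript
// fakeQuery() is made up for this sketch: it pretends to be a database call
// that takes 100ms, then hands its rows to a callback instead of returning them.
function fakeQuery(sql, callback) {
  setTimeout(function () {
    callback(null, ['a row for ' + sql]);
  }, 100);
}

var events = [];

// Both "queries" start immediately; neither one waits for the other.
fakeQuery('query A', function (err, rows) {
  events.push('A finished: ' + rows[0]);
  console.log(events[events.length - 1]);
});

fakeQuery('query B', function (err, rows) {
  events.push('B finished: ' + rows[0]);
  console.log(events[events.length - 1]);
});

// This is the last line of the file, but it runs before either callback.
events.push('reached the end of the file');
console.log('reached the end of the file');
```

in a blocking language the two queries would run back to back, about 200 milliseconds total; here the waiting overlaps.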

it's a strange idea, but it's one i'll illustrate, and hopefully you'll understand. we need to actually install it, first.

download + compile it!

node.js is something you should download and compile to get it to run the best way possible. i hinted at this kind of stuff in the linux guide, but now we get to do it for real if you never have! luckily, it's very easy to compile.

so have your linux server ready! we need to install the following dependencies via apt-get:

> apt-get install build-essential libssl-dev python

what are we installing?

build-essential gives you make plus the C and C++ compilers needed to compile programs. libssl-dev allows SSL connections, should we need them. python just makes sure you have python installed, because node.js's build scripts use it. (even though we won't write any python ourselves.)

so once that's all done, change directory to /usr/src which is where we'll download the source files.

> cd /usr/src

now we download the node.js source files. go to the nodejs website and get the link for the Source Code download, it'll probably end in .tar.gz

now use wget in the linux shell to download it, like so:

> wget http://nodejs.org/dist/v0.6.10/node-v0.6.10.tar.gz

(Obviously replace the URL with whatever the latest download link is. In the following instructions, replace the .tar.gz filename accordingly.)

that might take a minute. it's using the wget client to get the stable-version source code of node.js. when it's done, we'll need to unpack it.

> tar -zxf node-v0.6.10.tar.gz

when it's done, it'll have created a node-v0.6.10 directory within /usr/src, so let's go there.

> cd node-v0.6.10/

ok, now we need to configure the build. luckily this is a simple command:

> ./configure

the dot-slash before the word "configure" means run the program in this directory named "configure"... instead of any possible program named configure that might be on our linux box.

that'll take just a few seconds. next, if the configure program checks out, we'll make the program.

> make

oh lawds, this'll take a while. make yourself some tea. welcome to compiling programs. you'll have to feel this pain if you ever get into C or C++ or Objective-C.

when it's done, run the following:

> make install

AND BAM WE GOT OURSELVES SOME NODE.

test it by doing this:

> node -v

that should print out the version number of node.js that you just installed. (at the time of this writing, it should be 0.6.10.) so fresh. that's it, you have node installed.

it'll also have installed npm, the node package manager, which works a lot like apt. we'll get to that later.

better than a hello world

so let's flex some muscles. i can't wait to show you! node.js is actually quite simple because it's just javascript! object-oriented, fast, with lots of capabilities.

to make a node.js program, all you need to do is start a blank text file, like most any programming language. i use the .js extension, because it's javascript!

so let's make omg.js -- because it'll blow your mind. the contents are as follows:

var http = require('http');

var server = http.createServer(function (request, response) {
  response.writeHead(200, {'Content-Type': 'text/html'});
  response.end('Hello World\n');
});

server.listen(1337);

console.log('Server running at http://127.0.0.1:1337/');

So write that in nano or vi or whatever, save it in a folder somewhere, go to that folder, and then type this:

> node omg.js

You should see the line Server running at http://127.0.0.1:1337/ and nothing else. You don't get your shell prompt back, because node.js is still running the file, listening for requests.

If you replace 127.0.0.1 with the server's IP address and open that URL in a browser, you'll see "Hello World" in the browser!

That's because you just built your own web server. Yeah. Like Apache. Or lighttpd. YOUR OWN WEB SERVER FROM SCRATCH.

Anyway, that's pretty awesome, but maybe i'm just easily impressed. Nevertheless, let's break down what you just did.

First of all, you require() the module named http into the variable http... useful, right? Modules are a huge part of node.js, and I'll get into them more specifically in the next section. Needless to say, they act a lot like frameworks do in javascript. You're importing a bunch of helpful functions by requiring a module.

The next thing it does is call the createServer() function on the http object. That returns a new server object, which we assign to the server variable.

What's inside that call to createServer()? Well, that's where things get crazy. The method's only argument is an anonymous function: a function without a name, to be used only when needed and then discarded. In the async world, we call this a callback function, because the program will call back to it whenever a specific event happens. (It doesn't have to be an anonymous function, but why not?)

So the only argument is that function: in this case, the function takes two arguments, the first being an incoming request, and the second being the server's response. The event will be when the server gets an incoming request. It'll then call back on the provided function.
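to make the callback idea concrete, here's a sketch of the same server with the anonymous function swapped for a named one. handleRequest and the fakeResponse object are made up for illustration; fakeResponse only imitates the two response methods omg.js uses. since the callback is just a function, we can even call it ourselves to see what it would send.

```javascript
var http = require('http');

// A named callback that does the same job as the anonymous one in omg.js.
function handleRequest(request, response) {
  response.writeHead(200, { 'Content-Type': 'text/html' });
  response.end('Hello World\n');
}

// Pass the function itself (no parentheses!) so node.js can call it later,
// once per incoming request. Nothing runs yet, since we never call listen().
var server = http.createServer(handleRequest);

// Because it's just a function, we can call it ourselves with a made-up
// fake response object to see what it would send back.
var fakeResponse = {
  status: null,
  headers: null,
  body: '',
  writeHead: function (status, headers) {
    this.status = status;
    this.headers = headers;
  },
  end: function (text) {
    this.body += text;
  }
};

handleRequest({ url: '/' }, fakeResponse);
console.log(fakeResponse.status, fakeResponse.headers, JSON.stringify(fakeResponse.body));
```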

Within the function, we use the writeHead() method of our response to write in the response header that it has a status code of 200 (which, within HTTP, means "Okay! Got it!") and that it has the content-type of "text/html".

(The second argument to writeHead() is an object containing whatever headers you want. Personally, I always include a "lol" header, like so: { 'content-type': 'text/html', 'lol': 'wut' } And only about 0.00001% of internet users would ever notice. If you don't know what this means, forget I said anything.)

Then we end() the response with the text we want it to contain. (A simple way of explaining it.) The end() method also tells the server to send this response back to whoever asked, which would be you in your browser.

Once the server variable is set up, we use its listen() method to tell it to listen on a specific port, in this case 1337.

And then we tell it to write a string to the console using the log() method, which just accepts a string to print out. That's it!

So here's how the process works, as we just set it up:

  1. listen on port 1337 for HTTP requests (those typically come from web browsers, notice the http:// before the URL)
  2. when the server gets a request, accept it and start writing a response!
  3. in the header of the HTTP response, put the status code 200 and the Content-Type header
  4. in the body of the HTTP response, put the text Hello World, by adding it to the end() method
  5. with the end() method, send the server's response to whoever asked!
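if you'd rather check the server without a browser, here's a sketch that starts the same server, sends itself one request with http.get(), prints what came back (including that silly lol header), and shuts down. one change from omg.js: it listens on port 0, which asks the operating system for any free port, so it won't collide with something already on 1337.

```javascript
var http = require('http');

// Same server as omg.js, plus the "lol" header.
var server = http.createServer(function (request, response) {
  response.writeHead(200, { 'Content-Type': 'text/html', 'lol': 'wut' });
  response.end('Hello World\n');
});

var result = null;

// Port 0 asks the operating system for any free port.
server.listen(0, function () {
  var port = server.address().port;

  // Act as our own browser: send one GET request to the server.
  http.get('http://127.0.0.1:' + port + '/', function (res) {
    var body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      // node.js lowercases incoming header names, hence headers['lol'].
      result = { status: res.statusCode, lol: res.headers['lol'], body: body };
      console.log(result);
      server.close(); // stop listening so the program can exit
    });
  });
});
```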

the program will keep running because it'll keep listening on that port. node.js will only stop if it runs out of things to do or it errors out. hit CTRL+C to force-quit the program.

even cooler is that it's not like it can only do one request at a time. node.js runs your javascript on a single thread, but because nothing in it blocks while waiting on I/O, it can juggle lots of requests at once: while one request is waiting on the network or a database, it's busy working on another. it doesn't make a new request wait until the first one is completely done. it just handles as many as it can, as quickly as it can.
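here's a rough way to see that for yourself. in this sketch, each request waits 200 milliseconds on a timer (a made-up stand-in for a slow database) before answering, and we fire five requests at once. if node.js handled them one at a time, it'd take about a second; because the waiting overlaps, it should take roughly 200 milliseconds. exact numbers will vary by machine.

```javascript
var http = require('http');

// Each request "waits" 200ms on a timer, a stand-in for a slow database.
// While that timer is pending, node.js is free to start other requests.
var server = http.createServer(function (request, response) {
  setTimeout(function () {
    response.end('done\n');
  }, 200);
});

var elapsed = null;

server.listen(0, function () {
  var port = server.address().port;
  var started = Date.now();
  var remaining = 5;

  // Fire five requests at once.
  for (var i = 0; i < 5; i++) {
    http.get('http://127.0.0.1:' + port + '/', function (res) {
      res.resume(); // we don't need the body
      res.on('end', function () {
        remaining -= 1;
        if (remaining === 0) {
          elapsed = Date.now() - started;
          // Roughly 200ms total, not 5 x 200ms = 1000ms.
          console.log('5 requests took about ' + elapsed + 'ms');
          server.close();
        }
      });
    });
  }
});
```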

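Now... it's important to note the following: even though the console.log() call is the last thing in the script, it's not really the last thing that happens. the code inside the createServer() callback runs after it; in fact, that callback only runs when a request comes in, long after console.log() has printed its message. remember, node.js will scream through the whole file first and deal with events later. in the next example, you'll see what i mean.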
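here's a sketch that makes the ordering visible by writing down what runs when. the order array is just for illustration; the server listens on port 0 (any free port) and sends itself a single request before closing, so it exits on its own.

```javascript
var http = require('http');
var order = []; // a record of what ran, in the order it ran

var server = http.createServer(function (request, response) {
  order.push('request callback');
  response.end('ok\n');
});

// The function passed to listen() is another callback:
// it runs once the server is actually listening.
server.listen(0, function () {
  order.push('listening');
  http.get('http://127.0.0.1:' + server.address().port + '/', function (res) {
    res.resume(); // throw the body away, we only care about the ordering
    res.on('end', function () {
      console.log(order);
      server.close();
    });
  });
});

// The last line of the file, yet it runs before both callbacks above.
order.push('last line of the script');
```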

and yes, I just pulled a page worth of notes from fewer than 10 lines of code. that's how cool this is. it's very, very powerful.

a simpler example

the above example was very powerful and probably overwhelming. that's okay, but it should also be exciting! that kind of power, with that kind of simplicity? Amazing! that example is a version of the basic demo that Ryan Dahl, the guy who made node.js, commonly uses to show how simple and yet powerful node.js can be. and in simple benchmarks like that, a tiny node.js server can hold its own against a big service like Apache.

but let's try something easier. something simpler. call this funtimes.js

console.log('welp. line one.');

setTimeout(function() {
  console.log('line two?');
}, 1000);

console.log('line three!');

okay, run that. you'll notice that it will print line one and three immediately, then in a second, you'll see line two. and then it'll quit. WHAT IS GOING ON HERE.

really, this is how javascript already works, because even in the browser, javascript is capable of asynchronous behavior. put that on the server-side and things get interesting.

So what is the script really doing? It's trying to get to the end of the file as fast as possible! It prints the first string, then it sees the setTimeout() and acknowledges it, and then it prints the third string.

Then it just hangs out. Does nothing. The program idles and CPU usage drops to about 0% for one second. Then the timer fires, it prints out the second line, and the script exits, having nothing left to do.

If you change the setTimeout to setInterval, the second line will repeat every second (1000 milliseconds), and the program won't quit until you tell it to.
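here's a sketch of that setInterval() version, with one addition so you don't have to CTRL+C it: it counts the repeats and calls clearInterval() after three of them, at which point the program has nothing left to do and exits.

```javascript
var ticks = 0;

// Like the setInterval() version of funtimes.js: repeat every 1000ms...
var timer = setInterval(function () {
  ticks += 1;
  console.log('line two? (#' + ticks + ')');

  // ...except we stop it ourselves after three repeats. With no timers
  // left, node.js has nothing to do, so the program exits on its own.
  if (ticks === 3) {
    clearInterval(timer);
  }
}, 1000);
```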

What is important about this is that nothing was blocked and no CPU cycles were wasted. There wasn't a second-long pause between the first and third printed lines. That's pretty awesome.

but, as I said, it means sometimes you can't count on whatever line 1 is doing before you do something on line 2. instead we have to use callback functions to make sure things go according to a certain sequence, as I'll explain in another example later.

modules

node.js has a lot of optional modules out there, the same way PHP has libraries and javascript has frameworks. (notice a theme? really every programming language has modules and libraries and frameworks.)

it comes with a lot already (the built-in modules are all listed in the node.js API documentation). we used the http module in the first example. node.js does a lot of good stuff already, but it can do even more.

to install new modules made by other people, there's a really awesome program called npm. it stands for Node Package Manager. since version 0.6.3, node.js comes with npm by default.

so you should already have the npm program installed. check it by running this in the shell:

> npm -v

now if we want to install a node.js module, we'd just run it like this (don't actually do this, just look):

> npm install mongolian

that'll install the mongolian module into a node_modules folder in the current directory, where node.js will find it when you require('mongolian').

(just so you know, mongolian is a MongoDB interface for node.js)
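one handy trick while you're getting used to modules: require.resolve() finds a module the same way require() does, without loading it, and throws an error if it can't. here's a sketch of a little isInstalled() helper (the name is made up) that uses it to check whether a module is available from the current directory.

```javascript
// isInstalled() is a made-up helper for this guide. require.resolve() looks a
// module up the same way require() would, but only reports where it found it
// instead of loading it.
function isInstalled(name) {
  try {
    require.resolve(name);
    return true;
  } catch (err) {
    return false; // require.resolve() throws when the module can't be found
  }
}

console.log('http (built in):', isInstalled('http'));
console.log('mongolian (from npm):', isInstalled('mongolian'));
```

the second line prints true only if you actually ran npm install mongolian in that directory.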

what to do, what to do

node.js has a lot of potential because it's a tasty mix of low-level possibilities with high-level concepts built on top of an easy programming language.

that having been said, what can you do with it? here are some things i've done with it. not to boast, but to give you some ideas:

and i'm prototyping more cool stuff all the time, with only about two or three months of experience. really, node.js is great for anything that is event-driven and/or heavy on I/O. (I/O is input/output: taking data in and sending data out. a blocking language takes a while because it waits for every action to finish before proceeding; a non-blocking platform starts the action, moves on, and handles the result when it arrives.)

for example, an IRC server or client made with node.js is perfect, because it'll idle while nothing is going on, and it'll respond when something actually happens, and it can handle lots of simultaneous messages at once if need be.

likewise, anything dealing with real-time data that needs to scale quickly, like webpage analytics, is great with node.js because of its non-blocking I/O. what that means is, it'll quickly hand off info to a database without sitting around waiting for an "OK" in return (the OK shows up later, as an event). and it'll keep going indefinitely, limited only by CPU and memory, or by the database running out of disk space.
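here's a small sketch of that hand-off idea, using a temporary file as a stand-in for the database (the file name and recordPageview() are made up for this example). fs.appendFile() starts the write and returns immediately; the "OK" arrives later in its callback. so right after queueing three writes, none of them have been confirmed yet.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// A throwaway file in the temp directory stands in for "the database".
var logFile = path.join(os.tmpdir(), 'pageviews-' + process.pid + '.log');
var confirmed = 0;

// recordPageview() is made up for this sketch. It hands the write off to
// node.js and returns right away; the "OK" arrives later in the callback.
function recordPageview(url) {
  var line = new Date().toISOString() + ' ' + url + '\n';
  fs.appendFile(logFile, line, function (err) {
    if (err) throw err;
    confirmed += 1;
  });
}

recordPageview('/home');
recordPageview('/about');
recordPageview('/contact');

// This runs immediately: none of the three writes have been confirmed yet.
var confirmedRightAway = confirmed;
console.log('queued 3 writes, confirmed so far: ' + confirmedRightAway);
```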

but due to popular demand, i'll use the rest of this guide to show you an IRC bot as an example.

an IRC bot!?

yeah, every cool kid has written their own IRC bot. well, maybe just me and a few others. but never has it been easier than with node.js. and yeah, for this example to really work, you need an IRC server to test it on. but there are lots.

what will our bot do? we'll start off pretty simple.

first, we'll be using the jerk module to power the actual IRC-ness of it.

jerk will handle pretty much all of the heavy lifting for you. it'll connect to the server, join a channel, and let you set up autoresponders.

first, go to whatever directory will hold our script. i like to use /root or i make a directory in the root of the filesystem called node, so the directory would be /node

once you're in here, use npm to install jerk:

> npm install jerk

that'll install the jerk module to whatever directory you're in. let's start the actual program file. use whatever text editor you want to start bot.js

first thing in this script should be loading the jerk module:

var jerk = require('jerk');

next, let's set up the options that jerk will need to connect to whatever you want:

var options = {
  server: 'irc.whatever.net',
  nick: 'SuperAwesomeBot',
  channels: [ '#acoolchan90210' ]
};

So the options variable is an object with three keys: server, nick, and channels. server is the IRC server it'll connect to. nick is the name of the bot! So name it something good. Finally, the channels key lets you assign an array of channels the bot will join once it's connected to the server. For our purposes, we only need one. Name it whatever, just remember it. (And in case you've never been on IRC before, channel names need a # at the beginning.)

Now, the same way we'd use setInterval() or whatever, we're going to use the jerk() function to set up some responders (also known as watchers or listeners).

var bot = jerk( function(j) {
  // the listeners will go in here
});

That's great. We'll come back to this in a minute. Now add this below that chunk:

bot.connect(options);

That's it! Now if you run that, like so:

> node bot.js

You'll see it connect to the server and join the channel. If you connect to the same server with an IRC client and join the channel yourself, you'll see your bot sitting there, idling. You haven't taught it what to listen for yet, but at least it made it into the channel!

So CTRL+C the bot (it'll leave the channel) and let's go back to that jerk() function and define some listeners within it.

var bot = jerk( function(j) {
  
  j.watch_for('hello', function(message) {
    message.say('WHY HELLO THERE!');
  });
  
});

If you restart the bot and enter the chatroom with it, say hello! It'll respond automatically! Magic.

We just attached a listener to the jerk object using the watch_for() method. The first argument of this method is what to listen for, the second argument is a function containing what to do.

In this case, we want to use the message object's say() method to respond in the chatroom! We pass that method a string. Specifically, "WHY HELLO THERE!" Very loud and obnoxious, I love it.

What else can we do with this? Well, the message object also has a user property. With it, we can print out who sent the message, like so:

var bot = jerk( function(j) {
  
  j.watch_for('hello', function(message) {
    message.say('WHY HELLO THERE ' + message.user + '!');
  });
  
});

See what that does? It inserts the message's user name in the middle of the string. So if a user named frankie said "hello", the bot would respond

WHY HELLO THERE frankie!

That's the basics of an IRC bot. They can be very simple and annoying. You can just keep adding more responders:

var bot = jerk( function(j) {
  
  j.watch_for('hello', function(message) {
    message.say('WHY HELLO THERE ' + message.user + '!');
  });
  
  j.watch_for('goodbye', function(message) {
    message.say('lol bye');
  });
  
});

Now... you can get a lot crazier with it. Instead of a string as the first argument of watch_for(), you could use a regular expression... which makes it 1000 times more powerful. You can also capture data with your regex and use it in the returning message! I hadn't planned on writing a guide on regular expressions, but maybe I will.
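to show what a watch_for()-style bot is roughly doing under the hood, here's a tiny imitation of one. it is NOT jerk's actual code or API, just a sketch under a few assumptions: a string pattern means "the message contains this text", a regex is used as-is, and message.match_data holds the regex match (so match_data[1] is the first captured group). the incoming() function fakes a line arriving from the IRC server.

```javascript
// A tiny, made-up imitation of a watch_for()-style bot, NOT jerk's real code.
// It keeps a list of (pattern, callback) pairs and checks each incoming line
// against every one of them.
var watchers = [];
var said = []; // everything the fake bot "said", kept for inspection

function watch_for(pattern, callback) {
  if (typeof pattern === 'string') {
    // Assumption for this sketch: a plain string matches anywhere in the line.
    // Escape regex special characters so the string is matched literally.
    pattern = new RegExp(pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  }
  watchers.push({ pattern: pattern, callback: callback });
}

// Pretend a chat line from `user` just arrived from the IRC server.
function incoming(user, text) {
  watchers.forEach(function (w) {
    var match = w.pattern.exec(text);
    if (match) {
      w.callback({
        user: user,
        text: text,
        match_data: match, // match[0] = whole match, match[1] = first capture
        say: function (reply) {
          said.push(reply);
          console.log('<bot> ' + reply);
        }
      });
    }
  });
}

watch_for('hello', function (message) {
  message.say('WHY HELLO THERE ' + message.user + '!');
});

// A regex with a capture group: "!roll 20" captures "20".
watch_for(/^!roll (\d+)$/, function (message) {
  var sides = parseInt(message.match_data[1], 10);
  if (sides < 1) return; // "!roll 0" makes no sense, ignore it
  var roll = Math.floor(Math.random() * sides) + 1;
  message.say(message.user + ' rolled a ' + roll + ' on a d' + sides);
});

incoming('frankie', 'hello everybody');
incoming('frankie', '!roll 20');
incoming('frankie', 'nothing to see here'); // matches nothing, bot stays quiet
```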

You can easily build in logic conditions with if statements, so that the bot responds differently to different people. Most everything you can do with regular Javascript, as well as any node.js module, you can combine with this! You could add a MongoDB database to log every line in chat, or keep track of a certain person, or whatever.

For example...

var bot = jerk( function(j) {
  
  j.watch_for(/(.*)/, function(message) {
    if (message.user.toLowerCase() == 'cyle') {
      var what_cyle_said = message.match_data[0];
      // chatlog is a MongoDB collection set up elsewhere with the mongolian module (not shown)
      chatlog.insert( { 'username': 'cyle', 'when': new Date, 'message': what_cyle_said } );
    }
  });
  
});

That tracks anything that is said in chat (thanks to the /(.*)/ regular expression) and stores whatever the user named "cyle" says with a little MongoDB insert, via the Mongolian node.js module. I won't show you how to set that up; you should figure that out!

But that's just a taste... my chat bot, affectionately known as cylebot, can do a lot of neat tricks... someday I may release the source code for him.

life gets nested inside life

you can't really tell very much in these examples, but coding can start to get kind of tedious when you need to start making programs that act halfway between procedurally and asynchronously.

for example, what if you actually do need to make sure that a row gets into a database before you move on? it's simple: add a callback function that runs once the database tells you it's cool.

but then you need to move whatever you want to do when that happens to the inside of that callback! here's a simple example you can try out.

here we want our program to print something, wait a second, print something else, wait another two seconds, print a third line, and then wait one more second and print a fourth.

console.log('this is the first line!');
setTimeout(function() {
  console.log('this is the second line!');
  setTimeout(function() {
    console.log('this is the third line!');
    setTimeout(function() {
      console.log('here is another line!');
      // now whatever else we want to do before the program exits would go here.
    }, 1000);
  }, 2000);
}, 1000);

See how we now have a weird stack of callbacks, one inside another? this is called nesting, and a deep pile of it is often called "callback hell". it's kind of an eyesore. but we could also do it this way:

console.log('this is the first line!');

function lineTwo() {
  console.log('this is the second line!');
  setTimeout(lineThree, 2000);
}

function lineThree() {
  console.log('this is the third line!');
  setTimeout(lineFour, 1000);
}

function lineFour() {
  console.log('here is another line!');
  // now whatever else we want to do before the program exits would go here.
}

setTimeout(lineTwo, 1000);

But that's kind of just as messy... we now have a breadcrumb trail of functions and setTimeout() calls. Really, there's no getting away from callbacks here, because that's just the way asynchronous callback-based programming works!

So yeah... it takes some getting used to. Honestly I don't know how to do it differently yet.
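for what it's worth, newer versions of node.js (well after the 0.6 release this guide installs) have Promises and async/await built in, which let you write this exact sequence top to bottom again. here's a sketch assuming node.js 8 or later; wait() is a small helper made up here, and the delays are shortened to tenths of a second so it runs quickly.

```javascript
// wait() is a small helper made up here: it wraps setTimeout() in a Promise
// so that `await wait(ms)` pauses this function without blocking node.js.
function wait(ms) {
  return new Promise(function (resolve) {
    setTimeout(resolve, ms);
  });
}

var printed = []; // what we printed, in order

function say(text) {
  printed.push(text);
  console.log(text);
}

// Same sequence as the nested version, with the delays shortened 10x.
async function main() {
  say('this is the first line!');
  await wait(100);
  say('this is the second line!');
  await wait(200);
  say('this is the third line!');
  await wait(100);
  say('here is another line!');
  // now whatever else we want to do before the program exits would go here.
}

main().catch(function (err) {
  console.error(err);
});
```

under the hood it's still timers and callbacks; await just hides the nesting.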

what else can you do?

My second project, after a simple chat bot, was a video transcoding farm. I won't show you how to make one here, but I'll describe it and tell you why node.js was so useful for it.

First, node.js can handle other processes on a computer really well. It can monitor a process easily with very little overhead. This is very useful when opening a process like the HandBrake CLI and having to monitor every line it outputs while it transcodes a file. (There are even node.js solutions already that can transcode a file as you're uploading it. That's so goddamn neat.)

For that reason, node.js can automate processes on a linux box quite easily. It has great modules for child processes (child_process) and the file system (fs). The most important part is that they're non-blocking, so one single node.js program can handle lots of simultaneous jobs separately and efficiently. And when it has nothing to do, it'll idle completely, waiting in silence.
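here's a sketch of the monitoring part. HandBrakeCLI itself isn't used here; instead, a made-up stand-in job (a tiny node.js script run through child_process.spawn()) prints fake "Encoding: NN%" progress lines, and the parent reads its output line by line with the readline module as it arrives. with a real transcoder you'd swap in its command and adjust the regex to its actual output format.

```javascript
var spawn = require('child_process').spawn;
var readline = require('readline');

// Made-up stand-in for a long-running tool like HandBrakeCLI: a tiny node.js
// script that prints a fake progress line every 50ms, then exits.
var fakeJob = [
  'var pct = 0;',
  'var timer = setInterval(function () {',
  '  pct += 25;',
  '  console.log("Encoding: " + pct + "%");',
  '  if (pct >= 100) clearInterval(timer);',
  '}, 50);'
].join('\n');

var progress = [];
var exitCode = null;

// process.execPath is the path to the node binary running this script.
var child = spawn(process.execPath, ['-e', fakeJob]);

// readline hands us the child's output one line at a time, as it arrives.
var lines = readline.createInterface({ input: child.stdout });
lines.on('line', function (line) {
  var match = /(\d+)%/.exec(line);
  if (match) {
    progress.push(Number(match[1]));
    console.log('job is ' + match[1] + '% done');
  }
});

child.on('close', function (code) {
  exitCode = code;
  console.log('job finished with exit code ' + code);
});
```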

Admittedly, I needed to make a mothership server to act as the actual queueing system, but node.js handles all the actual heavy lifting of transcoding files on the farmers. (I could've written every single piece in node.js, though.) I'll get into that more in a later guide, specifically how to make individual systems talk to each other and interoperate efficiently.

But really, honestly, the possibilities for node.js are wide open, as Ryan Dahl, the creator of node.js, has described. Node.js is pretty young, but there are already so many modules for it. Honestly, I looked through the list of modules one day and picked out two or three to base projects on, just for kicks. You should do the same.

Possible projects:

there's much more you can do. just remember, it's just Javascript!

one last note

use the forever node.js module. it's priceless. what does it do? it makes sure your node script runs forever!

really though, you probably noticed that when you run a node.js script, it hangs up your prompt. you can't do anything else until you CTRL+C your node program. forever fixes this very elegantly.

install it with npm install forever -g, and read the installation and usage instructions on the forever project page.

what it does, put very simply, is turn this:

> node bot.js

which you'd need to either leave running or CTRL+C to quit out of, into this:

> forever start bot.js

which runs bot.js in the background, restarts it if it crashes, keeps a log of it, and gives you back your prompt. amazing! when you want it to stop, run forever stop bot.js.
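if you're curious what forever is doing conceptually, here's a bare-bones sketch of the restart-on-crash part. it's not forever's actual code, and it's deliberately limited: the "script" is a made-up one-liner that always crashes, and the supervisor gives up after three restarts instead of trying forever.

```javascript
var spawn = require('child_process').spawn;

// A bare-bones sketch of the restart-on-crash idea, NOT forever's actual code.
// The "script" is a made-up one-liner that prints a line and then crashes.
var crashyScript = 'console.log("working..."); process.exit(1);';
var maxRestarts = 3;
var starts = 0;
var stopped = false;

function startChild() {
  starts += 1;
  var child = spawn(process.execPath, ['-e', crashyScript], { stdio: 'inherit' });

  child.on('exit', function (code) {
    console.log('child exited with code ' + code + ' (start #' + starts + ')');
    if (code !== 0 && starts <= maxRestarts) {
      startChild(); // it crashed: start it again
    } else {
      // A real supervisor would keep going; this sketch stops after 3 restarts.
      stopped = true;
      console.log('not restarting again');
    }
  });
}

startChild();
```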

have fun!