Tuesday, December 16, 2014

Lambda, Javascript Micro-Services on AWS

Amazon just released a bunch of new services. My favorite is Lambda. Lambda allows me to deploy simple micro-services without having to set up any servers at all. Everything is hosted in the AWS cloud. Another cool thing about Lambda is that the default runtime is Node.js!

To get access to AWS Lambda, you have to sign in to the AWS Console and select the Lambda service. You have to fill out a form to request access, which may take a while to come through. Once you have access you can edit functions in a web form.

A Lambda service is a Node module that exports an object with one function, the handler. In the AWS examples this function is usually called handler and I'm going to follow their convention.

Here is a simple function that can be edited and invoked in the online Lambda Edit/Test tool.

// hello-event.js
exports.handler = function(event, context) {
  console.log('Hello', event);
  context.done(null, 'Success');
}

The event can be any JSON value, and since a string is valid JSON the function can be invoked with "Tapir", which results in the following output in the Lambda tool.

Logs
----
START RequestId: 3e21d80e-7e31-11e4-912c-2f870de05098
2014-12-07T16:51:47.163Z 3e21d80e-7e31-11e4-912c-2f870de05098 Hello Tapir
END RequestId: 3e21d80e-7e31-11e4-912c-2f870de05098
REPORT RequestId: 3e21d80e-7e31-11e4-912c-2f870de05098 Duration: 3.89 ms Billed Duration: 100 ms  Memory Size: 128 MB Max Memory Used: 9 MB
Message
-------
Success
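
The handler can also be exercised locally before uploading anything, which is handy while experimenting. Here is a minimal harness of my own; the stub context only provides the done callback, which is all this example needs.

// test-hello-event.js -- minimal local harness (a sketch, not part of Lambda)
var lambda = require('./hello-event.js');

// Stub context: only done() is needed for this example.
var context = {
  done: function(err, message) {
    console.log('done:', err, message);
  }
};

lambda.handler('Tapir', context);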

Working in the Lambda online tool is sufficient for simple examples, but it quickly gets annoying. Once you need to add extra modules you have to upload zip archives, which is both error prone and tedious. Here is a simple script to zip the relevant files and upload them to Lambda. Make sure to update the region and the role to your own specific properties.

#!/bin/bash
#
# upload-lambda.sh
# Zip and upload lambda function
#

program=`basename $0`

set -o errexit

function usage() {
  echo "Usage: $program <function.js>"
}

if [ $# -lt 1 ]
then
  echo 'Missing required parameters'
  usage
  exit 1
fi

main=${1%.js}
file="./${main}.js"
zip="./${main}.zip"

role='arn:aws:iam::638281126589:role/lambda_exec_role'
region='eu-west-1'

zip_package() {
  zip -r $zip $file lib node_modules
}

upload_package() {
  aws lambda upload-function \
     --region $region \
     --role $role \
     --function-name $main  \
     --function-zip $zip \
     --mode event \
     --handler $main.handler \
     --runtime nodejs \
     --debug \
     --timeout 10 \
     --memory-size 128
}

# main
zip_package
upload_package

A Larger Example

Now that I know that Lambda works it is time to try out something more elaborate. I have read that I not only get access to npm modules, but also to the operating system when writing my service.

My bigger example consists of something I often have use for, a way to serve media files so that I don't have to check them into git. The way I want to do this is to upload a tarball to S3 and then have Lambda unpack the archive, checksum the files and upload them into another bucket.

Something like this:

  • React to the ObjectCreated:Put event
  • Download the tarball from S3
  • Extract tarball into temp directory
  • Checksum the files and rename them with the checksum
  • Upload the checksummed file to another S3 bucket
  • Upload an index of the files with mapping from old to new filename.

React to ObjectCreated:Put event

An AWS S3 ObjectCreated:Put event looks something like this, in a trimmed-down format:

{
  "Records": [ {
      "eventVersion": "2.0",
      "eventSource": "aws:s3",
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": {
          "name": "anders-source",
        },
        "object": {
          "key": "tapirs.tgz",
          "size": 1024,
          "eTag": "d41d8cd98f00b204e9800998ecf8427e"
        }
      }
    }
  ]
}

To handle this event we need a handler function. All the handler needs to do is extract the relevant properties from the event and then call assetify, which will do the rest of the work. Breaking up the code like this allows me to use assetify locally and not only as a Lambda handler.

assetify.handler = function(event, context) {
    console.log('Received event:');
    console.log(JSON.stringify(event, null, '  '));

    var bucket = event.Records[0].s3.bucket.name;
    var key = event.Records[0].s3.object.key;
    assetify(bucket, key, function(err, result) {
        context.done(err, util.inspect(result));
    });
};

assetify

In order to use assetify as a normal module on a local machine I export the function with module.exports. This code needs to come before the assetify.handler declaration above. When exported this way, it is possible to require the function without involving Lambda.

function assetify(sourceBucket, key, callback) {
    var tgzRegex = new RegExp('\\.tgz');
    if (!key.match(tgzRegex)) return callback('no match');
    var prefix = path.basename(key, '.tgz');

    async.waterfall([
        downloadFile.bind(null, sourceBucket, key),
        extractTarBall,
        checksumFiles,
        uploadFiles.bind(null, prefix),
        uploadIndex.bind(null, prefix)
    ], function(err, result) {
        if (err) return callback(err);
        callback(null, result);
    });
}

module.exports = assetify;

I'm using async.waterfall in combination with bind to get a nice flat structure of the code which clearly resembles the described flow above.
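
The snippets above and below also assume that a handful of modules are required at the top of the file. Here is a rough sketch of that block; the config values are placeholders that simply match the example URLs later in the post, and the extra npm modules must be present in node_modules so that the zip script above packages them.

// Sketch of the require block the snippets assume (not shown in full above).
var util = require('util');
var path = require('path');
var fs = require('fs');
var exec = require('child_process').exec;

var async = require('async');        // npm modules, zipped together with the
var tmp = require('tmp');            // function via the node_modules directory
var glob = require('glob');
var checksum = require('checksum');
var mime = require('mime');

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

// Placeholder destination settings -- adjust to your own bucket and region.
var config = {
  bucket: 'anders-dest',
  url: 'https://s3-eu-west-1.amazonaws.com/'
};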

Download file

The downloadFile function uses a nice feature of s3.getObject: streaming. After creating a temporary file with tmp.file, I create a request and stream the contents from S3 directly into a write stream. Very nice! I also need to hook up a couple of event handlers so that the callback is notified once the streaming is complete.

function downloadFile(sourceBucket, key, callback) {
    console.log('downloadFile', sourceBucket, key)
    tmp.file({postfix: '.tgz'}, function tmpCreated(err, tmpfile) {
        if (err) return callback(err);
        var awsRequest = s3.getObject({Bucket: sourceBucket, Key:key});
        awsRequest.on('success', function() {
            return callback(null, tmpfile);
        });
        awsRequest.on('error', function(response) {
            return callback(response.error);
        });
        var stream = fs.createWriteStream(tmpfile);
        awsRequest.createReadStream().pipe(stream);
    });
}

Extract tarball

In order to extract the tarball I'm using the ordinary tar command instead of relying on a Node module. This works fine, as the Lambda runtime appears to provide a standard Linux environment with the common Unix utilities available. The glob function makes it easy to traverse the full tree structure of the extracted archive, and I use it to pass on (via the callback) a list of the extracted files together with their original names.

function extractTarBall(tarfile, callback) {
    tmp.dir(function(err, dir) {
        if (err) return callback(err);
        var cmd = 'tar -xzf ' + tarfile + ' -C ' + dir;
        exec(cmd, function (err) {
            if (err) return callback(err);
            // recurse into the extracted directory tree
            glob(dir + '/**/*.*', function(err, files) {
                if (err) return callback(err);
                var mapped = files.map(function(file) {
                    return {
                        path: file,
                        originalFile: file.replace(dir, '')
                    };
                });
                return callback(null, mapped);
            });
        });
    });
}

Checksum

checksumFiles uses async.map to call the singular version, checksumFile, which creates a checksum of each file and does some string manipulation to produce a new filename with the checksum in it.

function checksumFiles(files, callback) {
    async.map(files, checksumFile, callback);
}

function checksumFile(file, callback) {
    checksum.file(file.path, { algorithm: 'md5'}, function(err, sum) {
        if (err) return callback(err);
        var filename = file.originalFile;
        var ext = path.extname(filename);
        var base = filename.replace(ext, '');
        var checksumFile = base + '-' + sum + ext;

        callback(null, {
            path: file.path,
            originalFile: file.originalFile,
            checksumFile: checksumFile
        });
    });
}

Upload files to S3

When the new filenames have been created the files can be uploaded to S3 via s3.putObject. Unfortunately, putObject does not support pipe, but I can use a ReadStream as the value of the Body property and this is good enough. I use the mime module to calculate the content type from the filename. After a file is uploaded, an object with a mapping between the original name and the new URL is returned.

function uploadFiles(prefix, files, callback) {
    console.log('uploadFiles', prefix, files)
    async.map(files, uploadFile.bind(null, prefix), callback);
}

function uploadFile(prefix, file, callback) {
    var stream = fs.createReadStream(file.path);
    var s3options = {
        Bucket: config.bucket,
        Key: prefix + file.checksumFile,
        Body: stream,
        ContentType: mime.lookup(file.path)
    };
    s3.putObject(s3options, function(err, data) {
        if (err) return callback(err);
        console.log('Object added', s3options);
        callback(null, {
            originalFile: file.originalFile,
            url: config.url + config.bucket + '/' + prefix + file.checksumFile
        });
    });
}

Upload the index

The last thing to do is to upload the index with the filename-to-URL map as a JSON file. This is done in a similar way to the upload of the images.

function uploadIndex(prefix, files, callback) {
    var s3options = {
        Bucket: config.bucket,
        Key: prefix + '/index.json',
        Body: JSON.stringify(files),
        ContentType: 'application/json'
    };

    s3.putObject(s3options, function(err, data) {
        if (err) return callback(err);
        console.log('Object added', s3options.Key);
        callback(null, {
            files: files,
            url: config.url + config.bucket + '/' + prefix + '/index.json'
        });
    });

}

The final index.json file looks something like this.

[{
  "originalFile": "/Tapir_standing_profile.jpg",
  "url": "https://s3-eu-west-1.amazonaws.com/anders-dest/tapirs/Tapir_standing_profile-624bd0ac55d5140a78a2ea9d1409e2f6.jpg"
},
{
  "originalFile": "/tapir-sticker.png",
  "url": "https://s3-eu-west-1.amazonaws.com/anders-dest/tapirs/tapir-sticker-8522f4228bbc995d73ee1ead9d5e8e4f.png"
},
{
  "originalFile": "/tapir.jpg",
  "url": "https://s3-eu-west-1.amazonaws.com/anders-dest/tapirs/tapir-eb09705a33f6c6896def4e452fa77272.jpg"
}]

Summary

Lambda is very simple to work with and it allows me to create small services that react to events without the need to set up any servers at all.

Apart from the integration with S3, Lambda also integrates with Kinesis and DynamoDB, allowing for very cool applications to be built.

Saturday, September 27, 2014

Fallacies and Biases of our Imperfect Mind

Our mind is the most advanced computer we know about. It can perform tremendous feats. Yet, it is fooling us a lot more than most of us would care to admit. The reason for this is that the mind takes shortcuts to save energy and speed up our thinking.

In this article I will present how science now believes that the brain works, the problems it has, and suggestions about what to do about it.

Our Incredible Mind

Imagine you are riding a bicycle into an intersection. Cars, motorcycles, mopeds and other bikes are coming from all directions. Your brain takes in the whole scene and makes instantaneous decisions about what route to take. You communicate both consciously and unconsciously with the other drivers and you cross the intersection as if it was nothing.

This is an example of what our incredible mind can do. But, in order to do this it takes shortcuts and these shortcuts are not always appropriate. The rest of this article will discuss the problems that occur when the shortcuts are not to our advantage.

Belief

What is belief? Why do we believe the things we do? What do we truly know? When we start to really analyze our beliefs we often realize that we don't know why we believe in something, we just do. And, we may also know that something is not correct but still act as if it is.

Can you get a cold from being cold?

No! The only way to get a cold is by being exposed to the cold virus. If you catch a cold after being cold it is only a coincidence. Yet, many of us tell our children to dress warmly to avoid getting a cold!

Perception

We think that our perception is infallible. We think we see what is real! This is not the case, our senses are easily fooled and also affected by what we expect to experience.

Shadow Illusion

Which one of A and B is lighter?

It is a trick question: they are both the same color, as we can see in this picture. Yet, even when we know this, it is impossible to see!

Pattern Illusion

Can you see anything in this image? Can you see the dalmatian?

If we draw the contour of the dalmatian it becomes obvious. But now, if you look at the picture above, can you not see the dalmatian? Our perception is influenced by what we expect to see.

Attention Test

Watch this film and try to count the passes made by the white-dressed basketball players.

Did you get the count right? Did you see the gorilla? In the original study about half of the people that were shown this film didn't see the gorilla! Being focused on one thing can make us completely miss another.

This happens to us all the time in real life. People look at the same situation and interpret it completely differently.

Memory

We believe that we remember things as they actually were, but in reality our memories are reconstructed every time we remember something. We fill in new details.

Source and Truth Amnesia

We have a tendency to forget the source and the truthfulness of facts that we know. We remember the facts, but we don't know where they come from or whether they are true or not!

We may have heard about a correlation between vaccines and autism. But we forgot the minor detail that there is not even a weak correlation between them. Hence, we refuse to vaccinate our kids since we don't want them to become autistic.

Vivid Memories

Vivid memories, memories involving strong feelings, make us remember things more strongly. They make us more confident that our memories are correct. But just because the memories are stronger does not mean that they are more correct. We simply believe in them more.

Memory Fusion

Memories also fuse together to form new composite memories, that may not resemble what really happened at all. Do you remember your tenth birthday or do you remember what your mom told you or what you have seen in pictures?

Fake Memories

We cannot tell if our memories are fake or if they really happened. Everything we remember seems real to us!

Pattern Recognition

Humans are also very good at pattern recognition. This allows us to detect and categorize people, animals, and things. But our pattern recognition also shows us things that are not there. Was there really a dalmatian in the spotted picture above?

Agent Detection

Agent detection is an inclination for humans and animals to detect an intelligent agent in situations that may or may not involve one. We see and hear things that aren't there.

We detect a bush blowing in the wind as a person hiding. We see a rope lying on the trail as a dangerous snake.

Confabulated Consciousness

Our mind processes our perceptions and memories and weaves them into a coherent story. The story need not be correct, it only needs to be consistent. In order to keep the story consistent our mind makes up the details it needs.

In a study of split-brain patients, the patients were shown images, one image to each visual field (and thus to each hemisphere). The split-brain condition prevents the two halves of the brain from communicating with each other.

In the depicted example, the patient was shown two images: one side was shown a chicken foot, the other side a snowy landscape. The patient then had to pick a related image from a number of other pictures. The patient in the study picked a hen and a snow-shovel, one with each hand.

When asked why he picked the images, his verbal side of the brain answered. "I picked the hen because I saw a chicken's foot and I picked the shovel because I need a shovel to clean out the hen house."

His mind made up a story that was consistent with the shovel he was holding in his other hand.

Our mind can make things up to make our life story consistent!

Bias

A bias is a prejudice. A cognitive bias is a type of error in thinking that occurs when we are processing and interpreting information in the world around us.

Cognitive biases are often a result of our attempt to simplify information processing. They are rules of thumb that help us make sense of the world and reach decisions with relative speed.

Unfortunately, these biases sometimes trip us up, leading to poor decisions and bad judgments.

Anchoring

The anchoring effect describes the human tendency to rely too heavily on the first piece of information offered, the anchor, when making decisions.

If I ask a group of people "Do more or less than 20 percent of mammals have four legs?" and then ask the same group to guess the specific percentage of mammals that have four legs, I commonly get a lower percentage than if I had initially asked "Do more or less than 80 percent of mammals have four legs?".

We anchor to the number presented to us. This is the same technique that is used by salesmen when they offer you a good deal of only 20 thousand dollars for the second-hand Volvo.

Availability Heuristic

What percentage of the population do you think is allergic to gluten? How do you go about making such an estimate? What I often do is think about the people around me. How many of them are allergic to gluten? It seems like quite a lot. I would guess about 10 percent of the people I know are allergic, so that is my answer.

This is the availability heuristic at work. Why should my tiny number of acquaintances have anything to do with the rest of the population in the world? But, this information is readily available to me and it is easier for me to just guess from this information than to think through the problem thoroughly.

Fundamental Attribution Error

Say you are walking down the street and stumble and fall. The common reaction is to make up an excuse for why we fell: a hole in the pavement, etc. It is not my fault, there was a hole in the pavement! Perhaps we even get angry; someone should really fix that!

If someone else stumbles and falls in the same spot, we readily label that person as being clumsy or careless.

We attribute our mistakes to external causes and others' mistakes to the person. We also give ourselves credit for the good things we do, but other people's good deeds we attribute to luck or coincidence. This is the fundamental attribution error.

Hindsight Bias

Hindsight bias is also known as the "I-knew-it-all-along" effect. It is the tendency to see past events as being predictable at the time those events happened. (This picture does not really convey this bias, as the outcome can probably be predicted beforehand :)

An example of this is the 9/11 attacks: once they had happened it was easy to find clues that had warned about a coming attack. Clues like this exist all the time for things that never happen, but we don't focus on those because they are not relevant.

Confirmation Bias

This is the mother of all biases! A bias that we, all of us, fall into every day. It is the tendency to search for or interpret information in a way that confirms our beliefs. Or, to notice events that confirm our beliefs while ignoring events that disconfirm them.

Do I put the seat down when I have been on the toilet? All the time, I say. Never, my wife says. How can this be? How can I and my wife come to completely different conclusions from the same data?

The reason is that I notice the times when I remember to put the seat down, since I have to think about it and therefore remember it. I don't remember the times when I don't do it, since I don't even notice them.

For my wife it is the absolute opposite, she only notices when I forget to do it and doesn't notice when I do.

When we read an article that we agree with, it is easy to think, "Yes, that is the way it is!" and move on. If we read an article that we don't agree with, we can go to great lengths to examine the "erroneous" arguments to disconfirm them.

Innumeracy

The human mind is really bad at working with large numbers and probability.

Gambler's Fallacy

The tendency to think that future probabilities are altered by past events, when in reality they are unchanged.

Flip a coin ten times in a row and it turns up tails every time. How likely is it that we will flip heads the next time? The answer is, of course, 50%. In this scenario most of us know this is correct, but in many other scenarios we tend to think that the other option is due and hence judge it as more likely to occur.
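
If you don't trust the math, a quick simulation makes the point: even after ten tails in a row, the next flip comes up heads about half the time. A rough sketch:

// gamblers-fallacy.js -- rough simulation sketch, purely for illustration
function flip() { return Math.random() < 0.5; } // true means heads

var trials = 20000000, streaks = 0, headsAfterStreak = 0;
for (var t = 0; t < trials; t++) {
  var allTails = true;
  for (var i = 0; i < 10; i++) {
    if (flip()) { allTails = false; break; } // a heads breaks the streak
  }
  if (allTails) {                   // ten tails in a row
    streaks++;
    if (flip()) headsAfterStreak++; // the eleventh flip
  }
}
console.log(headsAfterStreak / streaks); // ~0.5 -- the streak changes nothing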

Lottery Fallacy

What are the odds of one specific person winning the lottery? Not very high, maybe one in a billion, depending on which lottery it is. But this is often not the right question to ask. We should instead ask: what are the odds of anyone winning the lottery? It turns out that those odds are usually pretty good.

Imagine you dream that someone dies. When you wake up the next day it has really happened. What are the odds of this happening to you? It must be a miracle! No! The correct question is: what are the odds of this happening to anyone?

Base Rate Neglect

John is a man who wears Gothic inspired clothing, has long black hair, and listens to death metal. How likely is it that he is a Christian and how likely is it that he is a Satanist?

We have a tendency to answer that it is more likely that he is a Satanist. But this ignores the base rate: the fact that there are 2 billion Christians and only, maybe, 2 million Satanists. With that base rate in place, it is much more likely that John is a Christian who likes wearing Gothic clothing, has long black hair, and listens to death metal.
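
A rough back-of-the-envelope calculation shows why the base rate dominates. The match rates below are made-up numbers, purely for illustration:

// base-rate.js -- illustrative numbers, not real statistics
var christians = 2e9; // ~2 billion
var satanists = 2e6;  // ~2 million

// Suppose every Satanist fits John's description,
// but only one Christian in a hundred does.
var matchingSatanists = satanists * 1.0;    //  2,000,000
var matchingChristians = christians * 0.01; // 20,000,000

var pChristian = matchingChristians / (matchingChristians + matchingSatanists);
console.log(pChristian); // ~0.91 -- John is still most likely a Christian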

Clustering Illusion

This is the tendency to overestimate the importance of small runs, streaks, or clusters in large samples of random data.

The clustering illusion explains the "hot-hand" in basketball. The hot-hand is the belief that a player who has made a few baskets is more likely to make the next basket since he is on-a-roll.

Probability

Imagine a disease that 1% of the population has. Assume there is a test that is 99% certain to be correct: 1% false positives and 1% false negatives.

What is the probability that you have the disease if after taking the test it shows positive?

Our natural inclination is to answer, "Bloody sure!". But in reality the probability of us having the disease is only 50%. Google it if you don't believe it, or check the quick calculation below.
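
Here is the arithmetic written out, just Bayes' rule applied to the numbers above:

// disease-test.js -- Bayes' rule for the example above
var prevalence = 0.01;        // 1% of the population has the disease
var sensitivity = 0.99;       // 99% of the sick test positive (1% false negatives)
var falsePositiveRate = 0.01; // 1% of the healthy test positive

var truePositives = prevalence * sensitivity;              // 0.0099
var falsePositives = (1 - prevalence) * falsePositiveRate; // 0.0099

var pSickGivenPositive = truePositives / (truePositives + falsePositives);
console.log(pSickGivenPositive); // 0.5 -- a positive test is only a coin flip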

So What?

So, we believe things, but we don't know why. Our perception is severely influenced by what we already believe. Our memories are flawed. We see patterns and agents that don't exist. So what? This doesn't apply to us anyway, right?

It turns out that it does. Smarter people are better at rationalizing their beliefs than others. We still make the same mistakes, but we are better at coming up with credible explanations for why they are not rationalizations.

Skepticism

"I doubt it!" is not only a proper response to what other people say. It is also an appropriate response to our own thought and ideas.

Scientific skepticism holds that science is the best way to find out things about the world and ourselves. Scientific skeptics don't trust claims made by people who reject science or who don't think that science is the best way to learn about the world.

Scientific skeptics don't say that all extraordinary claims are false. A claim isn't false just because it hasn't been proven true.

It's possible pigs can fly, but until we see the evidence we shouldn't give up what science has taught us about pigs and flying.

Meta Cognition

Thinking about thinking! When you learn new facts, be aware of all the fallacies and biases mentioned in this article. This will help prevent you from making some mistakes.

Bias Blind Spot

The bias blind spot is the cognitive bias of failing to compensate for one's own cognitive biases. Even if we know everything I've written about here, we have a tendency to underestimate our potential for self-deception. To see ourselves as rational beings is the greatest self-deception of all.

Richard Feynman


The first principle is that you must not fool yourself -
and you are the easiest person to fool.  
-- Richard Feynman



Sunday, May 11, 2014

Ping-Pong Pairing Over Git

When practicing new programming techniques I am a fan of ping-pong pairing. Ping-pong pairing is a way of pairing with TDD that evenly distributes the time each programmer spends at the keyboard and the amount of test code versus production code each programmer writes.

The description from the C2 Wiki reads:

Pair Programming Ping Pong Pattern

* A writes a new test and sees that it fails.
* B implements the code needed to pass the test.
* B writes the next test and sees that it fails.
* A implements the code needed to pass the test.

And so on. Refactoring is done whenever the need arises by whoever is driving.

Two programmers sit in front of one computer, with one keyboard and one screen.

Ping-pong pairing is great in a learning context because it keeps both programmers in front of the keyboard and it encourages conversation.

But, there are a couple of problems. Keyboards and editors! What if one programmer uses Dvorak and the other Qwerty? Or, one programmer cannot even think of writing another line of code without Das Keyboard while the other prefers an ergonomic keyboard? Or, one programmer uses Vim on OSX and the other Notepad on Windows?

Ping-Pong Pairing Over Git

What if we alter the setup to give each user their own computer, keyboard, screen, rubber duck, or whatever tickles their fancy? It would seem that this isn't pairing any more! But, if we place the pair side-by-side and let them communicate over Git, we actually get a very nice flow. There is still only one person typing at a time, but each is typing on their own keyboard.

The pattern above is changed to:

Ping-Pong Pairing Over Git Pattern

* A writes a new test and sees that it fails.
* A commits the failing test and pushes the code to Git
* B pulls the code from Git.
* B implements the code needed to pass the test.
* B writes the next test and sees that it fails.
* B commits the failing test and pushes the code to Git
* A pulls the code from Git.
* A implements the code needed to pass the test.

And so on.

I've tried this pattern at a couple of code retreats and it is actually pretty smooth. To make it even smoother I implemented a simple command line utility, tapir, that allows for simple communication between the two computers and automates the pulling of the new code. It works like this: each programmer starts a listener on their machine that pulls the code when it receives a message.

# Start a listener on 'mytopic', run git pull when a message arrives
tapir --cmd listen --script 'git pull' mytopic

Write a simple script that combines git push with a call to tapir mytopic.

#!/bin/sh
# ./push script pushes code to git and pings the tapir-server
git push
tapir mytopic

Now, instead of calling git push, you call ./push and the code is automatically pulled on the other machine, eliminating one step from the loop.

Summary

Ping-pong pairing over Git is nice! If you are interested in trying it out I have a roman numerals kata with setup code for multiple languages, currently: Clojure, ClojureScript, Javascript, Lua, VimScript, Objective-C, PHP, Ruby and C.

The tapir command line utility is also pretty interesting, as it uses Server-Sent Events to communicate over a standard HTTP server, http://tapir-server.herokuapp.com/

Why is the utility called tapir? Because, pingpong and ping-pong were already taken and I like tapirs! :).

Tuesday, March 25, 2014

Running Scripts with npm

Most people are aware that it is possible to define scripts in package.json which can be run with npm start or npm test, but npm scripts can do a lot more than simply start servers and run tests.

Here is a typical package.json configuration.

// package.json
// Define start and test targets
{
  "name": "death-clock",
  "version": "1.0.0",
  "scripts": {
    "start": "node server.js",
    "test": "mocha --reporter spec test"
  },
  "devDependencies": {
    "mocha": "^1.17.1"
  }
}
// I am using comments in JSON files for clarity.
// Comments won't work in real JSON files.

start actually defaults to node server.js, so the above declaration is redundant. In order for the test command to work with mocha, I also need to include it in the devDependencies section. (It works in the dependencies section too, but since it is not needed in production it is better to declare it here.)

The reason the above test command, mocha --reporter spec test, works is that npm looks for binaries inside node_modules/.bin, and when mocha was installed its executable was linked into this directory.

The code that describes what will be installed into the bin directory is defined in mocha's package.json and it looks like this:

// Mocha package.json
{
  "name": "mocha",
  ...
  "bin": {
    "mocha": "./bin/mocha",
    "_mocha": "./bin/_mocha"
  },
  ...
}

As we can see in the above declaration, mocha has two binaries, mocha and _mocha.

Many packages have a bin section, declaring executables that can be called through npm, just like mocha does. To find out what binaries we have in our project we can run ls node_modules/.bin:

# Scripts available in one of my projects
$ ls node_modules/.bin
_mocha      browserify  envify      jshint
jsx         lessc       lesswatcher mocha
nodemon     uglifyjs    watchify

Invoking Commands

Both start and test are special values and can be invoked directly.

# Run script declared by "start"
$ npm start
$ npm run start

# Run script declared by "test"
$ npm test
$ npm run test

All other values have to be invoked with npm run. npm run is actually a shortcut for npm run-script.

{
  ...
  "scripts": {
    // watch-test starts a mocha watcher that listens for changes
    "watch-test": "mocha --watch --reporter spec test"
  },
}

The above code must be invoked with npm run watch-test, npm watch-test will fail.

Running Binaries Directly

All the above examples consist of running scripts that are declared in package.json, but this is not required. Any of the commands in node_modules/.bin can be invoked with npm run. This means that I can invoke mocha by running npm run mocha directly, without declaring it as a script.

Code Completion

With a lot of modules providing commands it can be difficult to remember what all of them are. Wouldn't it be nice if we could have some command completion to help us out? It turns out we can! npm follows the superb practice of providing its own command completion. By running the command npm completion we get a completion script that we can source to get completion for all the normal npm commands including completion for npm run. Awesome!

I usually put each completion script into its own file, which I source from .bashrc.

# npm_completion.sh
. <(npm completion)

# Some output from one of my projects
$ npm run <tab>
nodemon                  browserify               build
build-js                 build-less               start
jshint                   test                     deploy
less                     uglify-js                express
mocha                    watch                    watch-js
watch-less               watch-server

Pretty cool!

Combining Commands

The above features get us a long way, but sometimes we want to do more than one thing at a time. It turns out that npm supports this too. npm runs the scripts by passing the line to sh. This allows us to combine commands just as we can on the command line.

Piping

Let's say that I want to use browserify to pack my Javascript files into a bundle and then minify the bundle with uglifyjs. I can do this by piping (|) the output from browserify into uglifyjs. Simple as pie!

  //package.json
  // Reactify tells browserify to handle Facebook's extended React (JSX) syntax
  "scripts": {
    "build-js": "browserify -t reactify app/js/main.js | uglifyjs -mc > static/bundle.js"
  },
  // Add the needed dependencies
  "devDependencies": {
    "browserify": "^3.14.0",
    "reactify": "^0.5.1",
    "uglify-js": "^2.4.8"
  }

Anding

Another use case is to run a command only if the previous command succeeds. This is easily done with && (and). || (or) naturally also works.

  "scripts": {
    // Run build-less if build-js succeeds
    "build": "npm run build-js && npm run build-less",
    ...
    "build-js": "browserify -t reactify app/js/main.js | uglifyjs -mc > static/bundle.js",
    "build-less": "lessc app/less/main.less static/main.css"
  }

Here I run two scripts declared in my package.json in combination under the build command. Running scripts from other scripts is different from running binaries; they have to be prefixed with npm run.

Concurrent

Sometimes it is also nice to be able to run multiple commands concurrently. This is easily done by using & to run them as background jobs.

  "scripts": {
    // Run watch-js, watch-less and watch-server concurrently
    "watch": "npm run watch-js & npm run watch-less & npm run watch-server",
    "watch-js": "watchify app/js/main.js -t reactify -o static/bundle.js -dv",
    "watch-less": "nodemon --watch app/less/*.less --ext less --exec 'npm run build-less'",
    "watch-server": "nodemon --ignore app --ignore static server.js"
  },
  // Add required dependencies
  "devDependencies": {
    "watchify": "^0.6.2",
    "nodemon": "^1.0.15"
  }

The above scripts contain a few interesting things. First of all, watch uses & to run three watch jobs concurrently. When the command is killed, by pressing Ctrl-C, all the jobs are killed, since they are run with the same parent process.

watchify is a way to run browserify in watch mode. watch-server uses nodemon in the standard way and restarts the server whenever a relevant file has changed.

watch-less uses nodemon in a less well-known way. It runs a script when any of the Less files change and compiles them into CSS by running npm run build-less. Please note that the option --ext less is required for this to work. --exec is the option that allows nodemon to run external commands.

Complex Scripts

For more complex scripts I prefer to write them in Bash, but I usually include a declaration in package.json to run the command. Here, for example, is a small script that deploys the compiled assets to Heroku by adding them to a deploy branch and pushing that branch to Heroku.

#!/bin/bash

set -o errexit # Exit on error

git stash save -u 'Before deploy' # Stash all changes, including untracked files, before deploy
git checkout deploy
git merge master --no-edit # Merge in the master branch without prompting
npm run build # Generate the bundled Javascript and CSS
if git commit -am Deploy; then # Commit the changes, if any
  echo 'Changes Committed'
fi
git push heroku deploy:master # Deploy to Heroku
git checkout master # Checkout master again
git stash pop # And restore the changes

Add the script to package.json so that it can be run with npm run deploy.

  "scripts": {
    "deploy": "./bin/deploy.sh"
  },

Conclusion

npm is a lot more than a package manager for Node. By configuring it properly I can handle most of my scripting needs.

Configuring start and test also sets me up for integration with SaaS providers such as Heroku and TravisCI. Another good reason to do it.

Sunday, January 19, 2014

Clean Grunt

Grunt is the tool of choice for many client side web projects. But often the gruntfiles look like a mess. I believe the reason for this is that many people don't care about keeping them clean.

On top of that, the file is often generated by a tool, such as Yeoman, and not cleaned up afterwards. I happen to think that the gruntfile should be clean and here is how to do it.

Here is how the project structure looks in development mode. I keep all my client side code in an app directory and I use Bower to install external components into app/components


app
    components
        jquery
            jquery.js
        momentjs
            moment.js
    images
        bower-logo.png
        grunt-logo.svg
    index.html
    scripts
        main.js
        model.js
        view.js
    styles
        images.css
        main.css

I will use less, watch, concat, uglify, filerev and usemin to optimize it and turn it into this.

dist
    app
        images
            bower-logo.fd05710aa2cb9502dc90.png
            grunt-logo.16c32bb187681923d5a7.svg
        index.html
        scripts
            main.359737238b7dc0972e52.js
        styles
            main.6873d02f25d2385b9ec8.css

The above structure is good because it serves one CSS file and one Javascript file, and everything apart from index.html is named with an MD5 checksum, which allows me to cache everything infinitely!

Loading External Tasks

Loading tasks in Grunt is done with grunt.loadNpmTasks, but since all dependencies are already declared in package.json there is no need to name them again. Instead we use matchdep to load all Grunt dependencies automatically.

// Load all files starting with `grunt-`
require('matchdep').filterDev('grunt-*').forEach(grunt.loadNpmTasks);

The relevant section in package.json contains these dependencies. All Grunt plugins follow the grunt- naming convention.

 "devDependencies": {
    "bower": "~1.2.8",
    "grunt": "~0.4.2",
    "matchdep": "",
    "grunt-contrib-jshint": "",
    "grunt-contrib-less": "",
    "grunt-contrib-copy": "",
    "grunt-contrib-clean": "",
    "grunt-contrib-watch": "",
    "grunt-express-server": "",
    "grunt-contrib-cssmin": "",
    "grunt-usemin": "",
    "grunt-filerev": "",
    "grunt-contrib-concat": "",
    "grunt-contrib-uglify": ""
  }
}

JsHint

I think it is a good idea to run JsHint on all my files, including the Gruntfile, and here is how I configure it.

// JsHint configuration is read from packages.json
var pkg = grunt.file.readJSON('package.json');

grunt.initConfig({
    pkg: pkg,

    // JsHint
    jshint: {
        options: pkg.jshintConfig,
        all: [
            'Gruntfile.js',
            'app/scripts/**/*.js',
            'test/**/*.js'
        ]
    }
});

Newer versions of JsHint can pick up their configuration from package.json and I take advantage of this, so I don't have duplicate configuration in the .jshintrc file that is normally added to a generated project.

The relevant section in package.json is defined like this:

 "jshintConfig": {
    ...
    "maxparams": 4,
    "maxdepth": 2,
    "maxcomplexity": 6,
    ...
  }

I truncated the section for brevity but I kept my favorite configuration options that deal with complexity and forces me to keep my code simple.

Less

As I wrote in CSS Good Practices, I think using a CSS preprocessor is a really good idea and I use Less in this project. Since Less is a superset of CSS, all I have to do to start using it is change the extension from .css to .less and configure Grunt to convert the Less files into CSS. In development mode I like to have the CSS files in the same place I would have put them if I wasn't using Less. To avoid accidentally checking the generated files into source control I add the following line to .gitignore:

# .gitignore
app/styles/*.css

Here is the configuration for generating a CSS file. I add two targets, one for development and one for release which is compressed.

// Less
less: {
    dev: {
        src: 'app/styles/main.less',
        dest: 'app/styles/main.css'
    },
    release: {
        src: 'app/styles/main.less',
        dest: 'dist/app/styles/main.css',
        options: {
            compress: true
        }
    }
}

As you can see I only name one Less file. I think it is a good idea to include all other Less files via import statements.

// Less files are automatically included and don't generate new requests.
@import 'other-less-file.less';

Watch

In development mode I also like to have a file watcher that generates the CSS files automatically when I change a less file. Here is the configuration.

// Watch
watch: {
    // watch:less invokes less:dev when less files change
    less: {
        files: ['app/styles/*.less'],
        tasks: ['less:dev']
    }
}
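
It can also be convenient to tie the development pieces together in a small task, so that one command compiles the CSS and starts the watcher. This is just a sketch of my own; the task name is arbitrary.

// Invoked with grunt dev: compile the CSS once, then watch for changes
grunt.registerTask('dev', 'Compiles Less and watches for changes', [
    'less:dev',
    'watch'
]);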

Clean

It is also a good idea to be able to remove the generated files with one command; clean does that for me.

// Clean
clean: {
    // clean:release removes generated files
    release: [
        'dist',
        'app/styles/*.css'
    ]
}

Concat, Uglify and Usemin Prepare

To concatenate and minify the Javascript files, I use concat and uglify. But I don't want to repeat the list of files that is already in index.html; I want those files to be picked up automatically. To do this I use useminPrepare. It is one of two tasks included in grunt-usemin; the other is, unsurprisingly, called usemin and I will describe it later.

useminPrepare parses HTML files, looking for tags that follow a distinct pattern, <!-- build:js outputfile.js -->, and extracts the filenames from the script tags. These files are then injected into the concat and uglify tasks, so there is no need to provide a files configuration for those tasks.

<!-- app/index.html -->

<!-- build:js scripts/main.js -->
<script src="components/jquery/jquery.js" defer></script>
<script src="components/momentjs/moment.js" defer></script>
<script src="scripts/model.js" defer></script>
<script src="scripts/view.js" defer></script>
<script src="scripts/main.js" defer></script>
<!-- endbuild -->

// useminPrepare
useminPrepare: {
    html: 'app/index.html',
    options: {
        dest: 'dist/app'
    }
},

// Concat
concat: {
    options: {
        separator: ';'
    },
    // dist configuration is provided by useminPrepare
    dist: {}
},

// Uglify
uglify: {
    // dist configuration is provided by useminPrepare
    dist: {}
}

There are a few noteworthy things above. useminPrepare.options.dest works in conjunction with the value defined in the build:js comment in the HTML file. I always designate the root directory of the generated code in the Gruntfile and I keep the relative path to the file in the HTML file. I do this because the configuration is reused by the usemin task later, and configuring it this way in useminPrepare keeps it simpler.

Also note that concat and uglify need to have an empty dist property. Otherwise, useminPrepare cannot inject its configuration into them.

Running grunt useminPrepare shows the generated configuration.

concat:
{ options: { separator: ';' },
dist: {},
generated:
  { files:
    [ { dest: '.tmp/concat/scripts/main.js',
        src:
          [ 'app/components/momentjs/moment.js',
            'app/components/jquery/jquery.js',
            'app/scripts/model.js',
            'app/scripts/view.js',
            'app/scripts/main.js' ] } ] } }

uglify:
{ dist: {},
generated:
  { files:
    [ { dest: 'dist/app/scripts/main.js',
        src: [ '.tmp/concat/scripts/main.js' ] } ] } }

Alright, now that we have minified both the CSS and the Javascript, it is time to move the files that don't need minification: the images and the HTML files.

Copy

// Copy HTML and fonts
copy: {
    // copy:release copies all html and image files to dist
    // preserving the structure
    release: {
        files: [
            {
                expand: true,
                cwd: 'app',
                src: [
                    'images/*.{png,gif,jpg,svg}',
                    '*.html'
                ],
                dest: 'dist/app'
            }
        ]
    }
}

Here I use a different configuration for the files. The expand option is the important part. It tells Grunt to copy the files while preserving the directory structure.

OK, now all the files have been moved into their proper place and all that is left is to checksum them and rename all the references.

Filerev, checksumming

filerev is my task of choice for adding the checksum of a file to its name. I use MD5 to checksum all assets (images, Javascript and CSS) with this configuration.

// Filerev
filerev: {
    options: {
        encoding: 'utf8',
        algorithm: 'md5',
        length: 20
    },
    release: {
        // filerev:release hashes(md5) all assets (images, js and css )
        // in dist directory
        files: [{
            src: [
                'dist/app/images/*.{png,gif,jpg,svg}',
                'dist/app/scripts/*.js',
                'dist/app/styles/*.css',
            ]
        }]
    }
}

Usemin

The final task is to change all the references in the HTML and CSS files to use the checksummed filenames and to change the script tags to reference the minified file. usemin is the task for this job.

// Usemin
// Replaces all assets with their revved version in html and css files.
// options.assetsDirs contains the directories for finding the assets
// according to their relative paths
usemin: {
    html: ['dist/app/*.html'],
    css: ['dist/app/styles/*.css'],
    options: {
        assetsDirs: ['dist/app', 'dist/app/styles']
    }
}

The only difficult thing here is that usemin uses the paths of the files it parses when it searches for assets whose references should be replaced. This means that options.assetsDirs must designate the directories where the parsed files are located. In my case the CSS files are in dist/app/styles and the HTML files are in dist/app. Hoohaah! Only one more thing before we're done: calling all the tasks in the right order.

Release

I register the release task and tell it to invoke all the other tasks in the correct order.

// Invoked with grunt release, creates a release structure
grunt.registerTask('release', 'Creates a release in /dist', [
    'clean',
    'jshint',
    'less:release',
    'useminPrepare',
    'concat',
    'uglify',
    'copy',
    'filerev',
    'usemin'
]);

Example Code

This example comes from a workshop I give. If you are interested in one, send me a note. If you would like to give one yourself you are welcome to use my example code. I also give a Grunt presentation.

That's all folks!