Building a Scalable Web App with Node.js – Dos and Don’ts

If your project needs to be scalable, Node.js gives you a strong head start, but there are a few important issues you need to tackle for it to succeed. They are, in no particular order: load balancing, clustering, WebSockets and which wrappers to use.

We’ll be talking about writing efficient back-end solutions here, which requires smart decisions to avoid oversights and wasted budgets and time, especially given the current trends in Node.js development.

First, it is worth outlining why Node.js is such a good choice for this type of architecture, and why some would say it is a better one than Apache.


  • Node.js does not spin up a thread for each request, nor does it pool requests across a set of threads the way Apache does. It therefore carries less overhead per request and excels at responding quickly.
  • Node.js can delegate the execution of a request to a separate component and keep accepting new requests until the delegated component returns with the processed result. This is asynchronous code, made possible by the eventing model (see the sketch after this list). Apache executes requests serially within a pool and cannot reuse a thread while one of its modules is simply waiting for a task to complete; it will queue requests until a thread in the pool becomes available again.
  • Node talks JavaScript, so it is very fast at passing through and manipulating JSON retrieved from external sources such as web APIs or MongoDB, reducing the time needed per request. Apache modules such as PHP may need more time, because they have to marshal the JSON into their own data structures before they can work with it.
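To make the second point concrete, here is a minimal sketch of the event-driven model, with a local report.json file standing in for any slow I/O source: the server hands the slow read to the runtime and keeps serving other requests until the callback fires.

```js
// Minimal sketch of non-blocking request handling in Node.js.
// './report.json' is a placeholder for any slow I/O source.
const http = require('http');
const fs = require('fs');

const server = http.createServer((req, res) => {
  // Non-blocking read: Node registers a callback and returns to the event
  // loop immediately, so other requests are served while the read runs.
  fs.readFile('./report.json', 'utf8', (err, data) => {
    if (err) {
      res.writeHead(500);
      return res.end('failed to read report');
    }
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(data); // JSON passes straight through, no marshalling step
  });
});

server.listen(3000);
```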

Now that we know why Node.js is the preferred architecture, let’s tackle…

The Dos

Load balancing with Nginx is the popular choice for socket servers right now, and in our case it is the preferred web server: it handles a huge number of concurrent connections and scales easily on almost any type of hardware.
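A minimal sketch of the Node side of such a setup, with the port range and the ./app module assumed for illustration: you start several identical processes on neighbouring ports and point an Nginx upstream block at them.

```js
// Minimal sketch: run the same app on several ports so an Nginx upstream
// (e.g. server entries for 127.0.0.1:3001-3004) can spread connections
// across them. './app' is a placeholder for your request handler.
const http = require('http');
const app = require('./app'); // assumed to export a (req, res) handler

const port = parseInt(process.env.PORT, 10) || 3001;

http.createServer(app).listen(port, () => {
  console.log(`worker listening on ${port}`);
});
```

Each instance would then be started with a different PORT value, for example PORT=3001, PORT=3002 and so on.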

With multiple processes, you usually need some way for them to communicate, otherwise you end up with race conditions and inconsistent state. Some sort of shared memory store or queue is needed, something along the lines of Redis.
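A minimal sketch of that idea, assuming the ioredis npm package and a Redis server on localhost, with the counter key and channel name made up for illustration: each process updates a shared counter atomically and hears about updates over pub/sub instead of keeping the state in its own memory.

```js
// Minimal sketch of sharing state between processes through Redis
// (assumes the 'ioredis' package and a Redis server on localhost).
const Redis = require('ioredis');

const store = new Redis();      // connection used for commands
const subscriber = new Redis(); // pub/sub needs its own connection

async function recordLogin(userId) {
  // INCR is atomic on the Redis side, so concurrent processes cannot
  // race each other the way in-process counters would.
  const total = await store.incr(`logins:${userId}`);
  await store.publish('logins', JSON.stringify({ userId, total }));
}

subscriber.subscribe('logins');
subscriber.on('message', (channel, message) => {
  console.log(`update on ${channel}:`, message);
});
```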

WebSockets are recommended for most intents and purposes, and with Socket.io’s switch to Engine.io as its transport layer, performance is much better than in pre-1.0 Socket.io.
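A minimal sketch of a Socket.io (1.x or later) server on top of a plain HTTP server; the event names are placeholders.

```js
// Minimal sketch of a Socket.io server (1.x+, i.e. with Engine.io as the
// underlying transport). Event names are placeholders.
const http = require('http');
const socketio = require('socket.io');

const server = http.createServer();
const io = socketio(server);

io.on('connection', (socket) => {
  socket.emit('welcome', { id: socket.id });

  socket.on('chat:message', (msg) => {
    io.emit('chat:message', msg); // fan the message out to every client
  });

  socket.on('disconnect', () => {
    console.log(`${socket.id} disconnected`);
  });
});

server.listen(3000);
```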

Process management can be handled by PM2; it works really well in combination with sockets and with an in-memory store (Redis) to avoid race conditions and irregular state issues.
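A minimal sketch of a PM2 ecosystem file (an ordinary JavaScript module; the file name ecosystem.config.js and the entry point server.js are assumptions) that runs the app in cluster mode across all cores:

```js
// ecosystem.config.js - minimal sketch of a PM2 configuration.
// 'server.js' is a placeholder for your actual entry point.
module.exports = {
  apps: [
    {
      name: 'web',
      script: './server.js',
      instances: 'max',     // one process per CPU core
      exec_mode: 'cluster', // PM2 shares the listening port between them
      env: {
        NODE_ENV: 'production',
      },
    },
  ],
};
```

It can then be started with pm2 start ecosystem.config.js.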

Some of the Node-related things to keep in mind for successful scaling and throughput:

Utilizing multi-core CPUs – set up a cluster, use child processes or use a multi-process orchestrator like Phusion Passenger.
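A minimal sketch of the first option, using Node’s built-in cluster module to fork one worker per core (the port number is a placeholder):

```js
// Minimal sketch of the built-in cluster module: the master forks one
// worker per CPU core and all workers share the same listening port.
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  os.cpus().forEach(() => cluster.fork());

  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} died, starting a new one`);
    cluster.fork();
  });
} else {
  http
    .createServer((req, res) => {
      res.end(`handled by pid ${process.pid}\n`);
    })
    .listen(3000);
}
```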

Setting up worker roles connected by a message queue. This splits your servers into two parts: public-facing clerical servers that accept requests from users, and private worker servers that handle long-running tasks. Both are connected by a message queue. The clerical servers add messages (incoming long-running requests) to the queue; the worker roles listen for incoming messages, handle them, and may return the result to the message queue. If a request/response flow is needed, the clerical server can asynchronously wait for the response message to arrive in the queue. Examples of message queues are RabbitMQ and ZeroMQ.
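Here is a minimal sketch of that clerical/worker split over RabbitMQ, assuming the amqplib package and a broker on localhost; the queue name and payload shape are made up for illustration.

```js
// Minimal sketch of the clerical/worker split over RabbitMQ
// (assumes the 'amqplib' package and a broker on localhost).
const amqp = require('amqplib');

// Clerical side: push a long-running job onto the queue and return at once.
async function enqueueJob(payload) {
  const conn = await amqp.connect('amqp://localhost');
  const channel = await conn.createChannel();
  await channel.assertQueue('long-tasks', { durable: true });
  channel.sendToQueue('long-tasks', Buffer.from(JSON.stringify(payload)), {
    persistent: true,
  });
  await channel.close();
  await conn.close();
}

// Worker side: consume jobs one at a time and acknowledge them when done.
async function startWorker(handle) {
  const conn = await amqp.connect('amqp://localhost');
  const channel = await conn.createChannel();
  await channel.assertQueue('long-tasks', { durable: true });
  channel.prefetch(1);
  channel.consume('long-tasks', async (msg) => {
    await handle(JSON.parse(msg.content.toString()));
    channel.ack(msg);
  });
}
```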

Now for the Don’ts:

Just because Node.js is “asynchronous” doesn’t mean it’s a good idea to have it do lots of work while the caller is waiting. So don’t simply write data without queuing it first.
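A minimal sketch of that idea, where flushToDatabase is a hypothetical bulk-insert helper: incoming writes are buffered in memory and flushed in batches in the background, so the request handler can answer immediately.

```js
// Minimal sketch of queued writes. flushToDatabase() is a hypothetical
// bulk-insert helper; the flush interval is arbitrary.
const pending = [];

function queueWrite(record) {
  pending.push(record); // the caller returns immediately
}

setInterval(async () => {
  if (pending.length === 0) return;
  const batch = pending.splice(0, pending.length);
  try {
    await flushToDatabase(batch);
  } catch (err) {
    pending.unshift(...batch); // put the batch back and retry on the next tick
  }
}, 1000);
```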

Don’t forget to consider deletes. Identify delete cases early on and make sure your framework supports logical deletion out of the box. This can be as simple as an “IsDeleted” column in the data store / data object. Remember that normal deletions in a DB carry the same cost as inserts, so every piece of transactional data that you plan to delete will effectively be “written twice”.

Most DBs have a way around this, some form of “truncate table”, but it generally only works if you actually plan for it in advance.
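A minimal sketch of a logical delete, assuming a hypothetical db.query helper that runs a parameterised SQL statement and an is_deleted column as described above:

```js
// Minimal sketch of a logical (soft) delete. db.query is a hypothetical
// helper that runs a parameterised SQL statement against your store.
async function softDeleteOrder(db, orderId) {
  // Flip the flag instead of physically removing the row; a real
  // DELETE would cost another write, effectively "writing twice".
  await db.query('UPDATE orders SET is_deleted = TRUE WHERE id = $1', [
    orderId,
  ]);
}

// Reads then simply filter the flag out:
// SELECT * FROM orders WHERE is_deleted = FALSE
```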

Don’t underestimate the cost of high coupling. As you move from small to big, the part of the code that grows the most is the part that handles failure. Build sub-systems and vertical slices: to make a system “scalable” it needs to be built from small, composable pieces that can themselves be scaled.

Don’t think it’s free. Done correctly, a system built with scalability in mind will be less expensive to manage over the long term and easier to build out. It will need fewer people, be easier to QA and easier to debug. But it’s not free in the short term. There are known patterns for much of this, but there are lots of pieces and separate providers. You need experienced developers to make decisions about how to implement these best practices. And it’s going to delay your initial product launch.

We hope this gives you some useful points to consider when building your next web app’s back end in Node, without making the most common mistakes in the field.
