Over the years I have either participated in or led the architecture of large, scalable, web-based solutions. In many cases I also built them; in the last few years, I have directed the implementation and execution. There has always been a common thread to all of these build-outs:
“If you are talking about technology, you are missing the point of building a scalable system”
So, what should you be talking about? Three items:
- Real Estate – Room to rack the servers
- Power – Enough power and outlets to power the servers
- Temperature – The right operating conditions for the servers
Truly, these should be the focus of the conversations, and soon enough these conversations will matter only to cloud service providers. There is, however, an assumption in the statement above: the architecture accounts for lateral scalability. Indeed, this is a tall order. But it is not hard to attain.
The trick to building laterally scalable systems is to architect around the concept of independent components that can be clustered. In other words, each functional block of a system should be a component operating independently from every other functional block (or component). Given multiple instances of a component in a cluster, it should not matter which instance answers a request at any given time, and the instances should be load balanced. Furthermore, given multiple geographically distributed clusters, there should be load balancing and fault tolerance between the clusters as well. If a server in a cluster goes down, another takes over its requests; similarly, if an entire cluster disappears, a different cluster takes over.
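The idea above – any instance in the cluster can answer, and a down instance simply loses its turn – can be sketched as a round-robin dispatcher with failover. This is a minimal illustration, not the author's implementation; the instance names and the `send` callback are hypothetical.

```python
class ComponentCluster:
    """A cluster of interchangeable instances of one functional component.

    Any instance can answer any request; if one is down, the next takes over.
    """

    def __init__(self, instances):
        self.instances = list(instances)
        self._next = 0

    def pick(self):
        # Round-robin selection: it should not matter which instance answers.
        instance = self.instances[self._next % len(self.instances)]
        self._next += 1
        return instance

    def call(self, request, send):
        # Try instances in round-robin order until one succeeds (failover).
        for _ in range(len(self.instances)):
            instance = self.pick()
            try:
                return send(instance, request)
            except ConnectionError:
                continue  # this instance is down; let the next one take the request
        raise RuntimeError("all instances in the cluster are down")
```

The same pattern lifts one level up: treat whole clusters as the "instances" and you get the cluster-to-cluster failover described next.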
The implication is that components that need to cooperate can find each other regardless of their location. The components discover the clusters through DNS, so the name space needs to be tightly configured. The other part of being able to mix, match, and integrate geographically distributed functionality is dealing with network latencies. These are easily masked – the performance drop will be hardly noticeable – by always connecting to the nearest neighboring cluster that provides the needed functionality. When automating the switch-over algorithm, treat the local cluster as its own nearest neighbor; the entire switch-over strategy then becomes very dynamic yet easy to manage as new clusters are incorporated.
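The nearest-neighbor switch-over rule can be sketched in a few lines. This is an assumption-laden illustration: the `latency_ms` and `is_up` probes stand in for whatever health checks and round-trip measurements a real deployment would use.

```python
def nearest_cluster(local, clusters, latency_ms, is_up):
    """Pick the nearest reachable cluster providing the needed functionality.

    The local cluster counts as its own nearest neighbor (distance zero), so
    the same rule covers local dispatch and remote failover alike.
    """
    candidates = [c for c in clusters if is_up(c)]
    if not candidates:
        raise RuntimeError("no cluster is reachable")
    # The local cluster wins whenever it is healthy; otherwise the lowest
    # measured latency decides. Adding a new cluster is just a list append.
    return min(candidates, key=lambda c: 0 if c == local else latency_ms(c))
```

Because new clusters only need to appear in `clusters` (i.e., in DNS), the switch-over strategy stays dynamic as the system grows.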
More often than not, databases (DBs) are needed as part of a web application. I consider a DB just another component; thus, the DB architecture and implementation must support clustering and geographical distribution just like any other component in the system. I favor federated database systems, where only partial data sets are housed on each DB cluster, with the caveat that each set must be optimal. I will explain:
If you have multiple geographical instances running in multiple data centers, then you will route traffic based on network closeness to a data center. So, if the application runs in Los Angeles and Colorado data centers and I am in NYC, then Colorado is most likely closest to me from a network topology point of view, and that is where I should be routed. Thus, an optimal data set in Colorado would be one where most or all of the data for users from NYC is housed in Colorado. An optimal set does not imply that the data lives in only one place. Using this same example, if somebody in Los Angeles wants to access the same data as somebody in NYC, the Los Angeles user should not be routed to Colorado; instead, a copy of that data should be made on the DB instance in the Los Angeles data center.
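The routing-plus-copy behavior can be sketched as follows. This is a toy model of a federated store, not a real DB driver; the data-center names and the `locate` callback (which finds a record's home shard) are hypothetical.

```python
class FederatedDB:
    """Sketch of a federated database: each data-center shard holds the
    optimal (partial) data set for the users routed to it, and data is
    copied over on first access from another data center.
    """

    def __init__(self, centers):
        self.shards = {c: {} for c in centers}

    def write(self, center, key, value):
        self.shards[center][key] = value

    def read(self, center, key, locate):
        shard = self.shards[center]
        if key not in shard:
            # The data lives elsewhere: copy it into the local shard rather
            # than routing the user to the remote data center.
            home = locate(key)
            shard[key] = self.shards[home][key]
        return shard[key]
```

After the first cross-center read, both shards hold the record, so subsequent reads from either coast stay local.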
Another tool that I have found useful in building scalable web systems is AFS (the Andrew File System). If you are not familiar with AFS, a good way to understand it is as “NFS on caching steroids”. I have used AFS in many parts of an implementation, but above all, AFS as an infrastructure component has helped me synchronize builds across data centers – releasing to a volume and then “pushing” the volume, configurations included, to all data centers – and manage all the releases.
We all agree that separation of concerns is a good way to architect systems. By extension, it is a good way to architect components. The concerns in this case are the protocol, presentation, API, logic and data layers:
The data layer is the data bindings to the database. We already described the DB above as a separate component; this concern is the connection and abstraction to the DB.
The logic layer implements the business logic, and ultimately the functionality that users experience.
The API layer serves two purposes:
1 – It converts all HTTP requests into elements that the logic layer can easily interpret and, on the way back, converts what the logic layer returns into XML.
2 – It isolates the layers below by creating an API for the component they belong to. In this way, each component becomes part of a whole, and calling a specific action exposes certain functionality.
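Both purposes can be seen in one small sketch: decode an HTTP query string into a call the logic layer understands, dispatch it by action name, and serialize the result as XML. The `action` parameter and the `actions` registry are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

def api_layer(query_string, actions):
    """Minimal sketch of the API layer.

    `actions` maps an action name to a logic-layer function; the function's
    dict result is converted to XML on the way back.
    """
    # Convert the HTTP request into elements the logic layer can interpret.
    params = {k: v[0] for k, v in parse_qs(query_string).items()}
    action = params.pop("action")
    result = actions[action](**params)  # dispatch into the logic layer

    # Convert the logic layer's answer into XML for the presentation layer.
    root = ET.Element("response", action=action)
    for key, value in result.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

For example, with `actions = {"greet": lambda name: {"message": "Hello " + name}}`, the query string `action=greet&name=Ada` yields `<response action="greet"><message>Hello Ada</message></response>`.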
The presentation layer renders the information (not data) coming back from the API layer – already expressed in XML – based on given, specific XSL style sheets. Effectively, this layer is an XSLT that provides rendering flexibility by choosing a discrete XSL.
The protocol layer makes the entire system device agnostic. In other words, it should not matter what device the application is being accessed from; whether it is accessed from a web browser or a cell phone or some new device, the protocol layer brokers access and selects the rendering methodology based on the device.
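The protocol layer's rendering choice can be sketched as a simple mapping from device to style sheet. This is a deliberately crude illustration – it sniffs the User-Agent header – and the stylesheet file names and keyword checks are hypothetical.

```python
def select_stylesheet(user_agent):
    """Sketch of the protocol layer's choice: inspect the accessing device
    (here, crudely, via the User-Agent header) and select the XSL style
    sheet the presentation layer should apply.
    """
    ua = user_agent.lower()
    if "wap" in ua or "mobile" in ua:
        return "render_mobile.xsl"
    if "mozilla" in ua or "webkit" in ua:
        return "render_web.xsl"
    # Some new, unrecognized device: fall back gracefully rather than fail.
    return "render_default.xsl"
```

Because the choice is isolated in one place, supporting a new device class means adding a branch here and a style sheet, with no change to the layers below.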
What I exposed above is a way to architect systems that scale horizontally based on three principles:
- Real Estate
- Power
- Temperature
These principles reduce the problem of scalability to a simple budgeting exercise. And again, if you are talking about anything else, then you have missed the point of scalability.