Cloud computing has moved well past the buzzword stage; it has gradually become the most convenient place for companies to deploy their IT infrastructure and services, since it offers obvious benefits such as:
- No upfront investment / costs
- Predictable pricing models
- No infrastructure maintenance, from setting it up all the way to maintaining and eventually replacing it
- Simpler deployment process.
Because of these advantages, it is not only start-ups that deploy their systems in the cloud; even mature companies are considering this option and have started taking steps towards it.
Before delving into the details of how to "move" systems to the cloud, let's have a look at the different layers of abstraction of cloud services. As one can see in the diagram below, there are three main building blocks in the cloud stack.
- Infrastructure as a Service (IaaS) is the core component of any cloud platform, be it Amazon Web Services, Microsoft Azure or Google Cloud Platform. This is the lowest level of abstraction, which offers various types of computing nodes / VMs for a fixed runtime cost. This is the closest one can get to the bare metal of the cloud, since between the VMs and the hardware there is only a thin hypervisor / VM management system that handles drivers, networking and the hardware resources. Once these nodes are up and running, it is up to the company how it manages and uses them, be it for high-performance computing, data storage or web hosting. To the company, they are just "some machines running somewhere in a geographical region/datacentre".
- Platform as a Service (PaaS) is the next layer of abstraction, built on top of the cloud infrastructure. From a business's perspective, these are nothing more than services plus developer tools focused on delivering a specific computation task, on steroids. They can specialize in various data storage mechanisms, web application hosting, data aggregation and visualization (for delivering app diagnostics information), data processing mechanisms, event management mechanisms and so on. After you deploy your systems to these platforms, all the "chores" are abstracted away behind a set of user-friendly options: how and when the platform should scale up or out to accommodate changing requirements, how it should back up data, or what to integrate with, to name a few examples.
- Software as a Service (SaaS), as the top of the cloud stack, is the layer where businesses or end-users either build their own commercial multi-tenant software solutions or find applications suited for their specific needs, from photo storage (such as Microsoft OneDrive https://onedrive.com) all the way to intelligent business solutions (such as Microsoft Dynamics 365 https://www.microsoft.com/en-us/dynamics365/) or integrated communication platforms like Microsoft Teams https://products.office.com/en-us/microsoft-teams/.
After a brief look at all three layers, we can easily see that, from the infrastructure point of view, IaaS is the closest to self-hosted infrastructure, the only significant differences being who owns the hardware and how the costs are distributed. That said, since cloud providers were less mature and stable and had a very thin PaaS offering around 7-10 years ago, the most natural choice for software companies back then was to start by exploring the IaaS layer of the cloud.
Fast-forward a few years to the present, and we are talking about a much more mature and advanced cloud, which provides a rich set of features and tools that can be integrated into software systems. Cloud providers such as Microsoft Azure provide a "fat" PaaS offering with more competitive cost plans, which can be tailored and scaled for specific needs. This makes it very attractive for companies that want to streamline their software deployment/release cycles while optimizing their infrastructure costs.
Consequently, many companies have migrated their software services, or are considering doing so, from self-hosted or cloud-hosted infrastructure/IaaS to PaaS. In theory, this should not be a big problem, especially if the systems have a decoupled architecture and a modular structure. However, practice shows that many factors can easily be overlooked while planning such an endeavour. In the following few paragraphs, we'll look at these factors as well as how to deal with the problems which may arise while carrying out this effort.
First and foremost, before migrating your services to the cloud, you need a clear migration plan. But the plan itself is of no use until you answer the following questions:
- What kind of services do you own? Are they compute-intensive? Are they data-intensive?
- How many requests do they need to serve? What is their usual throughput vs performance?
- Based on the previous question, what kind of platforms do you need, how big should they be, and how many?
- Do you need replication for them?
- Do you need relational storage, or is non-relational storage enough?
- Do you have background worker processes?
- What is the current network topology of your services? Etc.
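One way to keep the answers to these questions in one place is a small inventory record per service. The sketch below is only an illustration: the field names, the example services and the numbers are all hypothetical, and you would adapt them to whatever your own questions turn out to be.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Workload(Enum):
    COMPUTE_INTENSIVE = "compute-intensive"
    DATA_INTENSIVE = "data-intensive"


class Storage(Enum):
    RELATIONAL = "relational"
    NON_RELATIONAL = "non-relational"
    NONE = "none"


@dataclass(frozen=True)
class ServiceProfile:
    """Answers to the planning questions for one service (illustrative fields)."""

    name: str
    workload: Workload
    peak_requests_per_second: int
    needs_replication: bool
    storage: Storage
    has_background_workers: bool
    depends_on: tuple[str, ...] = ()  # a rough stand-in for network topology


def unprofiled_dependencies(profiles: list[ServiceProfile]) -> set[str]:
    """Return dependencies that are referenced but have no profile yet."""
    known = {p.name for p in profiles}
    return {dep for p in profiles for dep in p.depends_on if dep not in known}


# Hypothetical services and numbers, for illustration only.
inventory = [
    ServiceProfile("orders-api", Workload.DATA_INTENSIVE, 200, True,
                   Storage.RELATIONAL, False, ("billing-worker",)),
    ServiceProfile("billing-worker", Workload.COMPUTE_INTENSIVE, 5, False,
                   Storage.NONE, True, ("ledger-db",)),
]

print(unprofiled_dependencies(inventory))  # {'ledger-db'}
```

A gap like the one printed here (a dependency nobody has described yet) is exactly the kind of thing that is cheaper to discover at this stage than in the middle of the migration.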
Once you have these answers, your plan can adhere to the following steps:
- Identify your components and establish which cloud platform features you can deploy each of them to. Once you've chosen your cloud provider, set some time aside to explore its offerings, do some POCs and make sure you find the right feature/platform for each of your services. Not doing this upfront might result in a lot of frustration and, eventually, work that you throw away later in the game.
- Deploy your infrastructure. Having an automated infrastructure deployment process is of great help, especially when you have multiple services using the same type of platform. Ensure you cover as many of the configuration aspects as possible in your infrastructure deployment templates; otherwise you'll need to redeploy your infrastructure every time you find you've missed something. The important aspects include: scalability parameters, alerts, integration configuration (port bindings & SSL Certificates), networking, domain name mappings and so on.
- If you don't already have solid logging and telemetry in your code, now's the time to add it! Once you have deployed your code to the platform, you won't know which machine it's running on, nor will you always have the luxury of debugging it. It's true that cloud providers offer tools such as remote debugging, diagnostics as a service or remote file management tools, but they're not always recommended, especially when we're talking about services running in production. In this case, if an incident occurs, it's highly likely that nobody will be prepared for it (that's why it's called an incident), so you will need all the telemetry related to it to study its root cause, mitigate it and come up with a contingency plan. So once again, measure and log as much as you can... but only relevant information!
- Prepare to "fork" your code. Now that you know the details of the platforms that you'll use and you're familiar with your architecture, it's highly likely that your code is not going to look the same. You might need to plan for some code or configuration changes. If you find that some code changes are required (one example that comes to mind is converting your Windows Services into cloud background workers), you'll need to get your hands dirty, but don't get overexcited. You should also keep the old code alongside the new component as a fall-back. It's also needed in the next step.
- Analyse and change your current deployment process. How many environments do you have? Where do you keep your configuration values? Change your deployment process so that, until your apps are fully tested on your new infrastructure, the old infrastructure stays live (and can be redeployed). So, no matter whether you're using VSO, Octopus, TeamCity or any other tool for your deployment process, make sure your old, stable infrastructure has your back in case of an unexpected event. In short, change your deployment process so that it deploys your bits to both your old and new environments.
- Run your test suites. Once you have your environments up and running, run your sets of tests. In an ideal organization, you would just need to run all your automated tests to validate your new deployments. But we all know that some scenarios are, simply put, very expensive to automate and maintain, so make sure you also run a suite of manual smoke tests. Take some time to put yourself in your customers' shoes. At this point, even if your tests fail, you should be able to determine from your telemetry what went wrong, fix the problems, redeploy and try again.
- Flip the switch for the production environment. By now you should be confident in the new deployments of your services. Plan this step so that it is just the flip of a switch: the old infrastructure is swapped with the new one with minimal downtime. This also allows you to quickly switch back to your former infrastructure if something unplanned occurs. For example, this step should be as simple as changing CNAME mappings in your domain registrar configuration. If you can't make it that simple, make sure that the customers from the geographical regions where you swap the services are affected as little as possible. For instance, if you want to flip the switch for the services affecting US customers, make sure you do this during the night, Pacific Time.
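The last two steps (test, then flip the switch with a way back) can be sketched as a small routine. This is a minimal sketch under stated assumptions, not a real deployment tool: the host names are made up, `smoke_checks` stands in for whatever automated checks you have, and `point_traffic_at` is a hypothetical callback that would wrap your own DNS or traffic-management tooling.

```python
from typing import Callable, Iterable

# Hypothetical names for the two environments.
OLD_TARGET = "old-infra.example.com"
NEW_TARGET = "new-paas.example.com"


def cut_over(
    smoke_checks: Iterable[Callable[[str], bool]],
    point_traffic_at: Callable[[str], None],
) -> str:
    """Switch traffic to the new environment, falling back to the old one.

    Returns the target that is live when the function finishes.
    """
    checks = list(smoke_checks)

    # 1. Validate the new environment before customer traffic reaches it.
    if not all(check(NEW_TARGET) for check in checks):
        return OLD_TARGET  # nothing was switched; the old infrastructure stays live

    # 2. Flip the switch (for example, by updating a CNAME mapping).
    point_traffic_at(NEW_TARGET)

    # 3. Run the same checks again after the switch; roll back on any failure.
    if not all(check(NEW_TARGET) for check in checks):
        point_traffic_at(OLD_TARGET)
        return OLD_TARGET

    return NEW_TARGET


# Usage with stand-ins: one check that always passes, and a list that
# records every switch instead of touching real DNS.
switches = []
live = cut_over([lambda target: True], switches.append)
print(live, switches)  # new-paas.example.com ['new-paas.example.com']
```

Passing the checks and the switch in as plain callables keeps the old infrastructure one call away at every point, which is the property the plan relies on.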
Now you have a plan! You should also be in a pretty good position to estimate your work and integrate it into your teams' backlog. However, make sure you have a look at the potential risks before that. Trust me, if something is bound to happen, it will happen, and it will also affect your plans! So, here's a list of things which could make you wonder how accurate your estimates are:
- Estimations can be biased. You've heard from a partner / customer / consultant that this effort is a piece of cake and that it should be done in no time, although they most likely don't know what's under your services' hoods. If you let yourself be influenced by them, also ask them to join you while you spike, plan and estimate this effort. No guessing before looking at all sides of the story!
- The cloud platform has its quirks. At some point, if your services need to be configured in an out-of-the-ordinary way, expect that the cloud platform's management tools may not support this configuration, or that the tools for automating it are buggy. So, expect to spend a bit of time creating support tickets and explaining the problems you are facing to the cloud platform's support engineers before your needs are met.
- External dependencies can become a substantial bottleneck. If your teams rely on other teams or organizations to do any of the work in this plan, due to limited permissions, limited access or simply limited knowledge, expect delays. Any such "integration" between teams will cause some delays, and if multiple teams request such services from the same team, which also has its own backlog besides providing support / services, then you have the perfect recipe for substantial delays and frustration. Instead, consider empowering the teams that will migrate the services so that they can perform this effort end-to-end.
- A completely new environment with a set of completely new constraints might behave unexpectedly. Prepare to understand the true "wrath" of the cloud. Especially when you have complex services, you'll stumble upon the limitations of the cloud. Then you'll need to understand them and identify alternative ways for your service to fulfil its requirements; thus, you might need to change code / configuration / integration points. So be prepared for it, as it could happen.
- Testing might take a long time and might reveal some defects which are not so trivial to fix. As above, be prepared for this. The problems that you find might be related to infrastructure, platform, configuration, and so on.
- You simply want to leave things in a better state. Since such a migration requires you and your teams to understand all your integration points, you'll have the big picture in your head, which makes it a very convenient moment to tackle problems / technical debt that haven't been addressed so far. I like to call them low-hanging fruit. I am talking about telemetry improvements, more secure communication channels like HTTPS if you only have HTTP enabled, configuration alignment, restructuring the deployment pipeline (for either the infrastructure or the binaries), and so on. Setting aside a bit of time to improve the state of your code is usually a good idea, whether we're talking about a simple feature implementation task or a service migration effort.
The list could go on, but I'll leave it open and ask you what else you can think of besides the items enumerated above (I'd be happy to hear your ideas).
So now that you've seen how your plan can be affected, I encourage you to challenge your initial estimates. Are they still accurate? Or would you rather re-estimate the effort? Whatever the answer, keep in mind that such an effort is not quite a piece of cake, since it requires knowledge of your entire ecosystem and its integration points.
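One common way to challenge an estimate is a three-point (PERT) estimate, which weighs an optimistic, a most likely and a pessimistic figure for each task. This is a technique brought in here purely for illustration, not something the plan above prescribes, and the tasks and day counts below are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    task: str
    optimistic: float   # days if nothing goes wrong
    most_likely: float  # days under normal conditions
    pessimistic: float  # days if the risks listed above materialise

    def expected(self) -> float:
        """PERT weighted mean: (O + 4M + P) / 6."""
        return (self.optimistic + 4 * self.most_likely + self.pessimistic) / 6


# Hypothetical tasks and numbers.
estimates = [
    Estimate("deploy infrastructure templates", 2, 4, 12),
    Estimate("convert Windows Services to background workers", 3, 6, 15),
]

total = sum(e.expected() for e in estimates)
print(f"{total:.1f} days")  # 12.0 days
```

With these made-up figures, the "most likely" values alone would add up to 10 days, while the weighted total is 12: the pessimistic cases pull the estimate up, which is the effect the risk list is meant to make visible.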
Now that you've been through this planning process, you are in a good state to start the actual work. I would add that if you've done your homework well enough and you have a robust plan, the most important part of the work is already done since you have a clear direction.
To sum it all up, migrating your services to PaaS is truly an investment in the future. Although it seems like an expensive effort at first, if you're prepared and know what to expect, you'll be in a very good position to succeed in a timely and elegant fashion, which will bring you benefits in the long run, such as much lower running costs and little to no maintenance effort. But keep in mind that migrating your services to PaaS in the cloud requires you to set some initial time aside to come up with a robust plan, while identifying and accounting for the potential risks that may arise.
Feature Image: http://s3.amazonaws.com/files.technologyreview.com/p/pub/legacy/fbserver040_x900.jpg