75 Days

By Robert Isaacs | April 22, 2026 | 5 min read

Our release was going to take 75 days to fully transfer to our users. The CTO refused to believe the number. I stood at the window thinking about how nice it must be to not know how doomed everything was.

I promised a follow-up to the Vero Beach story — a second time that scaling had gotten away from us at ConnectWise as we were transitioning from a small SaaS company to a larger one. We had grown from dozens of users to many hundreds of companies.

With numbers like these, and after the unfortunate incident with the city of Vero Beach (see the previous episode), we moved everything to a local datacenter.

"This oughta last us a while."

After all, we had a gigabit of internet bandwidth. We had 5 virtual machines.

A Three-Month Release Cycle

It wasn't just infrastructure bottlenecks causing us pain. Around this time, we started to struggle with the difficulties of a larger development team — quality issues, slow testing cycles, not enough automation — and it took a good three months to release the next update. All the while, the company was growing exponentially.

Sure, it had been three or four months since our last release. Sure, that one had taken a couple of days to roll out. But it hadn't been that long, right?

Right?

So the release was finally ready to roll out, and we hit the button.

The entire product went offline. Everything was out.

Our network engineer — a grizzled veteran by this time — scaled back our bandwidth to bring the service back online.

When the dust settled, I did the math.

75 Days

Our release was going to take 75 days to fully transfer to our users.

75 days.

The number was so absurd, our CTO seemed to just refuse to believe it. I remember looking out the window at everyone working and thinking about how nice it must be to not know how doomed everything was. I was melodramatic that way.
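For a rough sense of what a number like that implies, here is a back-of-envelope sketch. It assumes the full release payload was about 1,000 GB (the 400 GB plus 600+ GB that the S3 logs showed later in this story) and uses decimal units. The inputs I actually used at the time are not recorded here, so treat the function names and figures as illustrative only.

```python
# Back-of-envelope check of a 75-day transfer estimate.
# Assumption: ~1,000 GB total payload, taken from the S3 log totals in this post.
# Assumption: decimal units (1 GB = 1e9 bytes, 1 Mbit = 1e6 bits).
SECONDS_PER_DAY = 86_400


def implied_rate_mbps(total_gb: float, days: float) -> float:
    """Average throughput (Mbit/s) needed to move total_gb in the given days."""
    total_bits = total_gb * 1e9 * 8
    return total_bits / (days * SECONDS_PER_DAY) / 1e6


def transfer_days(total_gb: float, rate_mbps: float) -> float:
    """Days needed to move total_gb at a sustained rate_mbps."""
    total_bits = total_gb * 1e9 * 8
    return total_bits / (rate_mbps * 1e6) / SECONDS_PER_DAY


# Effective rate implied by 1,000 GB over 75 days.
print(f"{implied_rate_mbps(1000, 75):.2f} Mbit/s")  # 1.23 Mbit/s

# Same payload over a fully available gigabit link, in hours.
print(f"{transfer_days(1000, 1000) * 24:.1f} hours")  # 2.2 hours
```

Under those assumptions, a 75-day rollout works out to an effective rate of a little over 1 Mbit/s, a tiny fraction of a gigabit line, which is about what you would expect once the bandwidth had been scaled back to keep the product online.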

I had become obsessed with cloud computing around this time. Google had released App Engine, and Amazon had just improbably released some computing and storage services called AWS. Could S3 help us here?

Hour Three

On hour three of our 75-day deployment, I whipped together a small proof of concept. I needed to know: if we changed over, would our clients crash permanently — and would we be well and truly stuck?

I set up a server, configured it, set up the experiment, and crossed my fingers.

It looked like it might work.

I had some impromptu meetings to discuss a cutover. It wasn't as if we had much of a choice. I moved every update over to S3 and flipped the switch.

We logged into one of our customers' systems to see if it would make any difference. The download estimate sat there, immobile, showing a months-long time remaining. I watched as we made the cutover and saw the download completely freeze.

I held my breath.

It reset back to the start.

Oh no.

And then, within seconds, it downloaded the entire update.

I looked at the logs for S3. In the remainder of the first hour, we had transferred 400 gigabytes of updates. In the next hour, we transferred the remaining 600+ gigabytes.

In under two hours, we had moved what would otherwise have taken 70+ days, and averted a collapse of our hosted business model.

That was the moment I knew the cloud was our future.