How to "Production"? (aka let's talk about DevOps)

I started a reply to another post, which grew into something else that seemed (almost) stand-alone. What do you think? Also, I might be procrastinating by writing this. At least I feel productive! :wink:

Most of the conversations about Redwood in a production context seem to focus on two topics:

  1. hosting provider
  2. CI (aka tests)

While it’s true these two things make up the whole of the CI/CD acronym, in my previous experience (aka startup lives), both my servers and tests were but a small slice of the answer to, “How do you do Production Redwood?” For me, the thing we are really talking about here is CI/CD plus DevOps. Although I’ve (ineffectively) attempted to stoke DevOps discussion before, I don’t recall anyone asking about DevOps specifically. So let’s talk about it!

I’ll go first.

What are you optimizing for?

For me, this is the question. My answer is always about people (which can often be unintentionally superseded by technology).

  • I want learning to increase over time
  • I want trust to increase over time
  • I want these for both my team and my end-users

I’ve focused on driving these outcomes by shipping more code to production more frequently, implementing robust systems to catch errors and trigger alerts, assessing and responding (note: this includes documentation and communication), and then iterating accordingly.

The nuance here is that my suggested priorities (below) are intentionally inclined toward encouraging mistakes.

Priorities for Production’ing

Here’s my order of DevOps priorities for previous applications I’ve built and maintained:

1. redundant production infrastructure

Errors are bound to happen and take down a service. So create an infrastructure where it doesn’t matter because you have service redundancy + restarts.

Servers themselves go down (as can an entire hosting region), so make it so you only have to flip a switch to fail over to your backup in another region.
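To make the "flip a switch" idea concrete, here is a minimal sketch of the decision behind a failover: given the latest health status per region, pick the preferred healthy one. The region names, the `priority` field, and the `pickActiveRegion` helper are all hypothetical. In practice a load balancer or DNS provider would likely make this call for you.

```typescript
// Hypothetical shape: one entry per region, updated by whatever health probe you run.
interface RegionStatus {
  name: string;
  priority: number; // lower number = preferred region
  healthy: boolean; // result of the most recent health probe
}

// Return the preferred healthy region, or undefined if every region is down.
function pickActiveRegion(regions: RegionStatus[]): RegionStatus | undefined {
  return [...regions]
    .filter((region) => region.healthy)
    .sort((a, b) => a.priority - b.priority)[0];
}

// Example: the primary is down, so traffic should move to the backup.
const active = pickActiveRegion([
  { name: "us-east", priority: 1, healthy: false },
  { name: "eu-west", priority: 2, healthy: true },
]);
console.log(active?.name); // "eu-west"
```

The point is only that the switch should be a data change (which region is active), not a rebuild or a manual re-provisioning.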

Tools I’ve Used:

  • cloud66
  • kubernetes + docker
  • EC2
  • AWS Load Balancers
  • Cloudflare

2. Optimize to catch errors in production and diagnose

Assuming your services can fail without taking down everything, the next step is to be excellent at catching errors when they happen. I’ve found that, as devs, we spend the majority of our time attempting to mitigate potential issues at the front end of the process. I learned that by setting up systems to catch errors in production, safely, I could easily throttle back to being GoodEnough with CI. This took a weight off shipping and freed up time as well.

Don’t forget about server errors related to capacity! Monitor those, too.

Tools I’ve Used:

  • papertrail + logspout
  • pingdom (note: also used for end-user performance monitoring)
  • fullstory
  • segment + a myriad of analytics
  • AWS logs → Cloud66 monitoring → alerts
  • slack

Goal → catch errors on both server and client and then trigger an alert based on type of error
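As a rough illustration of "trigger an alert based on type of error", here is a small sketch of a routing table that maps an error kind to a severity and a channel. Everything in it is an assumption for illustration: the error kinds, the channel names, and the `routeError` helper are hypothetical, and the actual delivery (Slack, email, pager) is left out.

```typescript
type Source = "server" | "client";
type Severity = "page" | "notify" | "log";

// Hypothetical shape of an error captured from logs or a client-side handler.
interface CapturedError {
  source: Source;
  kind: string; // e.g. "capacity", "unhandled_exception"
  message: string;
}

interface Alert {
  severity: Severity;
  channel: string;
  text: string;
}

// Hypothetical routing table: which kinds of errors wake someone up.
const routes = new Map<string, { severity: Severity; channel: string }>([
  ["capacity", { severity: "page", channel: "#ops-urgent" }],
  ["unhandled_exception", { severity: "notify", channel: "#errors" }],
]);

// Anything not listed above is recorded but does not interrupt anyone.
const fallbackRoute: { severity: Severity; channel: string } = {
  severity: "log",
  channel: "#errors-low",
};

function routeError(err: CapturedError): Alert {
  const route = routes.get(err.kind) ?? fallbackRoute;
  return {
    severity: route.severity,
    channel: route.channel,
    text: `[${err.source}] ${err.kind}: ${err.message}`,
  };
}

const alert = routeError({ source: "server", kind: "capacity", message: "disk at 95%" });
console.log(alert.text); // "[server] capacity: disk at 95%"
```

Keeping the table as data makes it cheap to tune which errors page and which ones only get logged as you learn what actually matters.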

I tried APM like Datadog and New Relic but found them to be overkill and expensive for my needs.

3. How quickly can you go from “new code” to “deploy”? How frequently do you do so?

This is the “CD” part of the equation. More important than being able to upgrade or patch is the capability to deploy the patch. It’s easy to take this for granted with modern hosting providers that follow Jamstack-style deployments.

Note: for me, having a Staging deployment that was 1-1 with my Production deployment was a critical piece of this.

Tools I’ve Used:

  • Git branch deployment strategy
  • Hosting provider + triggers based on git push to branch
  • Zero-downtime deployment (for me this was Kubernetes strategy; lifecycle managed by Cloud66)
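A minimal sketch of the "git push to branch triggers a deploy" idea: map the pushed ref to an environment and ignore everything else. The branch names (`main`, `staging`) and the `deployTargetFor` helper are assumptions for illustration. Most hosting providers let you configure this mapping directly rather than writing it yourself.

```typescript
type Environment = "production" | "staging";

// Hypothetical mapping from branch name to the environment it deploys to.
const branchTargets = new Map<string, Environment>([
  ["main", "production"],
  ["staging", "staging"],
]);

// Accepts either a bare branch name or a full ref like "refs/heads/main".
// Returns undefined for branches that should not trigger a deploy.
function deployTargetFor(ref: string): Environment | undefined {
  const branch = ref.replace(/^refs\/heads\//, "");
  return branchTargets.get(branch);
}

console.log(deployTargetFor("refs/heads/main")); // "production"
console.log(deployTargetFor("refs/heads/feature/login")); // undefined
```

With Staging kept 1-1 with Production, promoting a change is then just a merge from one mapped branch to the other.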

4. What’s the “quality” of the code you’re committing?

This is the “CI” part of the equation. I don’t want to understate the value of good CI, which increases exponentially as complexity grows.

But I do have two opinions:

  1. CI tools have come a long way; you can get lots of bang for little buck these days — do it!
  2. The threshold for “good enough” is not 100% coverage. Chasing that number is a distraction: it’s easy to measure and feels critical, but the last few percent rarely pay for themselves. Cut back to 80% and use the time you save to ship more features!
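If your tests run on Jest, one way to encode the 80% target is the `coverageThreshold` option, which fails the test run when coverage drops below the configured numbers. This is only a sketch: the file name is an assumption, and in a real project you would merge these keys into whatever Jest config or preset you already have.

```typescript
// jest.config.ts (sketch; file name and location are assumptions)
const config = {
  // Coverage has to be collected for the threshold to be checked.
  collectCoverage: true,
  coverageThreshold: {
    // "Good enough" bar from the post: 80% rather than 100%.
    global: {
      branches: 80,
      functions: 80,
      lines: 80,
      statements: 80,
    },
  },
};

export default config;
```

CI then fails only when coverage slips below the bar you agreed on, instead of nagging about every uncovered line.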

Tools I’ve Used:

  • CircleCI and GH Actions
  • GH built-in code quality tools and code monitoring
  • <insert test library here both unit and functional>

Curious to read your reactions, suggestions, pushback, and improvements. In the near future, I envision us being able to have a solid RedwoodWay answer to the question:

"How do you Production with Redwood?"

Thanks for the write-up. I feel like Redwood could offer a magical command for request/response logging, tracing, and error reporting, i.e. the key parts of #2. I’ve had to write a lot of custom code for those.
