My "Artisanal" Ingress

2022-04-17

Summary

I built a replacement for the NGINX Ingress Controller and cert-manager in my Kubernetes cluster. It uses NATS and CockroachDB, and is written in .NET Core C#.

It’s simple, easy to set up, and easy to understand. It features a web interface for management and configuration. It’s also horizontally scalable out of the box and aims to follow best practices for high availability and observability.

Finally, it’s pre-alpha at the moment and I’m not ready to open-source the project. However, I’m looking for like-minded people who might be interested in turning this into something more broadly useful.

Background

For the past year or so, I’ve been working on replacing cert-manager in my Kubernetes cluster. It started with a cert-manager outage caused by a DNS bug, which pushed me to learn how to manage certificates manually. I then automated all of that with KCert.

When I finished KCert, I realized there was another part of my setup that could be improved: the NGINX Ingress Controller. So I decided to continue the effort and replace that as well.

And that is how I created My “Artisanal” Ingress. I call it artisanal because I built it to my own personal taste. So far I’m pleased with the result. I’m using it in my personal Kubernetes cluster, which is serving the page you are reading right now.

Design Decisions

When I decided to build a replacement for NGINX Ingress Controller, I came up with several goals:

Simple Setup

I’m not a fan of Helm or CRDs (more accurately: I love the idea of CRDs, but I think they’re often used unnecessarily). They are frequently used to create overly complex systems that are extremely difficult to debug when things go wrong. Sometimes your only option is to delete the whole cluster and start again.

For example, take a look at the cert-manager and NGINX Ingress Controller installation:

The installation of cert-manager requires 7000+ lines of YAML
cert-manager runs three pods in the cluster (Is cert management really that complex?)
NGINX Ingress Controller can be installed via Helm or almost 700 lines of YAML

Is all this necessary? What if something goes wrong? Will you know how to debug and fix any of this?

No State in Kubernetes

New versions of Kubernetes are released quite frequently. The easiest way for me to stay up to date is to create a brand-new cluster, move all my services there, and destroy the old one.

This requires copying everything over from the old cluster to the new one. While my deployment and service definitions are checked into a git repository, secrets and certificates are not. Those objects need to be manually copied over.

For this reason I decided to eliminate the need to copy certificates and ingress configurations. I decided to store the state of my ingress controller in a central CockroachDB store. This really could have been anything: S3, Azure Key Vault, etc. But the main idea is to store all of this information outside of the Kubernetes cluster.

This approach has two advantages:

I don’t have to copy Kubernetes certificates from one cluster to another
I can deploy multiple Kubernetes clusters that rely on the same source of truth
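
To make the idea concrete, here is a minimal sketch of two clusters reading from the same external store. It is not the project’s actual code: sqlite3 stands in for CockroachDB so the example runs without any setup, and the table layout, the IngressInstance class, and the host and backend names are all assumptions made up for illustration.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    # Stand-in for the shared external database. The post uses CockroachDB;
    # an in-memory sqlite3 database is used here only to keep this runnable.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE routes (host TEXT PRIMARY KEY, backend TEXT NOT NULL)")
    db.execute("CREATE TABLE certs (host TEXT PRIMARY KEY, pem TEXT NOT NULL)")
    return db


class IngressInstance:
    """Hypothetical ingress instance that keeps no state inside its cluster."""

    def __init__(self, cluster: str, store: sqlite3.Connection):
        self.cluster = cluster
        self.store = store
        self.routes: dict[str, str] = {}
        self.certs: dict[str, str] = {}

    def load(self) -> None:
        # Everything the instance needs comes from the external store.
        self.routes = dict(self.store.execute("SELECT host, backend FROM routes"))
        self.certs = dict(self.store.execute("SELECT host, pem FROM certs"))


store = open_store()
store.execute("INSERT INTO routes VALUES (?, ?)", ("blog.example.com", "blog-service:80"))
store.execute("INSERT INTO certs VALUES (?, ?)", ("blog.example.com", "PEM-PLACEHOLDER"))

# A brand-new cluster starts with nothing and ends up with the same view
# as the old one, without any certificates or ingress objects being copied.
old_cluster = IngressInstance("old", store)
new_cluster = IngressInstance("new", store)
old_cluster.load()
new_cluster.load()
```

In this sketch, moving to a new cluster only means pointing the new instance at the same store. The old cluster can keep serving traffic until the switch is complete.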

Must be Easy to Scale Horizontally

My first ingress controller in Kubernetes was Traefik, and I was happy with it for a long time. Then I discovered that it doesn’t support running multiple instances (what they call high availability). Certificates were stored on the local disk and couldn’t be shared across multiple instances of the service. The paid version of Traefik did not have this limitation, and that did not sit well with me. I even tried to fix that myself, but eventually gave up and moved to NGINX.

For this reason, I set out from the start to design my ingress controller to scale seamlessly. Using CockroachDB as my data store is the first part of solving this, but there is also the problem of keeping all nodes synchronized when things change. I decided to leverage NATS for this purpose. Using NATS made it easy for all instances of the service to stay synchronized and exchange messages.
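
The synchronization pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the real implementation: the Broker class is a tiny in-process stand-in for NATS, a plain dict stands in for the database, and the subject name "routes.changed" and the Instance class are hypothetical.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Tiny in-process publish/subscribe bus standing in for NATS."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, subject: str, handler: Callable[[str], None]) -> None:
        self._subscribers[subject].append(handler)

    def publish(self, subject: str, payload: str) -> None:
        # Deliver the message to every subscriber of the subject.
        for handler in list(self._subscribers[subject]):
            handler(payload)


class Instance:
    """Hypothetical ingress instance with a local cache of the shared routes."""

    def __init__(self, name: str, store: dict[str, str], broker: Broker) -> None:
        self.name = name
        self.store = store        # shared source of truth (the database)
        self.broker = broker
        self.cache = dict(store)  # local copy used to route requests
        broker.subscribe("routes.changed", self._on_change)

    def _on_change(self, host: str) -> None:
        # Refresh only the entry named in the message.
        if host in self.store:
            self.cache[host] = self.store[host]
        else:
            self.cache.pop(host, None)

    def set_route(self, host: str, backend: str) -> None:
        # Write to the store first, then tell every instance what changed.
        self.store[host] = backend
        self.broker.publish("routes.changed", host)


store = {"blog.example.com": "blog:80"}
broker = Broker()
a = Instance("a", store, broker)
b = Instance("b", store, broker)

# A change made through instance "a" reaches instance "b" via the broker.
a.set_route("api.example.com", "api:8080")
```

The message carries only the name of the host that changed. Each instance rereads that entry from the store, so the database remains the single source of truth and the message bus only signals that something is stale.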

Built-in Certificate Management

I thought about using KCert together with this new system, but decided against it. It feels like they should be two separate systems, but it is difficult to separate them cleanly, especially with ACME HTTP challenges, where the ingress itself has to answer the validation requests.

The other issue is that KCert is specifically geared towards working with Kubernetes ingresses and certificates. However, I had decided that I wanted to store that information outside of Kubernetes for this project. I therefore couldn’t use KCert without decoupling it completely from Kubernetes and making it much more complex.

I therefore decided to build certificate management directly into my new controller. This wasn’t too much work, since I could reuse the code I wrote in KCert.
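
A small sketch shows why ACME HTTP challenges tie the two concerns together. In an HTTP-01 challenge, the certificate authority requests /.well-known/acme-challenge/<token> on port 80 and expects the key authorization in the response, so the component that receives all incoming traffic must know about pending challenges. The path prefix comes from the ACME specification. The Ingress class, its method names, and the example values are assumptions for illustration only.

```python
ACME_PREFIX = "/.well-known/acme-challenge/"


class Ingress:
    """Hypothetical ingress that routes traffic and answers ACME challenges."""

    def __init__(self, routes: dict[str, str]) -> None:
        self.routes = routes               # host -> backend
        self.pending: dict[str, str] = {}  # challenge token -> key authorization

    def add_challenge(self, token: str, key_authorization: str) -> None:
        # Called by the certificate logic when it starts a new order.
        self.pending[token] = key_authorization

    def handle(self, host: str, path: str) -> tuple[int, str]:
        # Challenge requests are answered directly and never reach a backend.
        if path.startswith(ACME_PREFIX):
            token = path[len(ACME_PREFIX):]
            key_authorization = self.pending.get(token)
            if key_authorization is None:
                return 404, "unknown challenge"
            return 200, key_authorization

        backend = self.routes.get(host)
        if backend is None:
            return 404, "no route"
        return 200, f"proxied to {backend}"


ingress = Ingress({"blog.example.com": "blog:80"})
ingress.add_challenge("token123", "token123.thumbprint")

challenge_response = ingress.handle("blog.example.com", ACME_PREFIX + "token123")
normal_response = ingress.handle("blog.example.com", "/")
```

With certificate management inside the controller, the challenge table and the routing table live in the same process, and no extra ingress rule is needed to send the validation request to a separate service.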

Good Observability Practices

I wanted to make sure that the system is easy to debug, monitor, and maintain. For the monitoring piece, I tried Azure’s Application Insights, Datadog, and Honeycomb. All of these options are great, but:

I’m sure there are other great options out there.
Pulling in all those client libraries doesn’t feel right.

I’m therefore leaning towards a more generic approach: I will use OpenTelemetry, which is the standard the industry is converging on. Most monitoring systems support OpenTelemetry, either natively or through sidecar shims.

Other Considerations

If I were optimizing for broad community adoption, I would have written this in Go or Rust. However, I really enjoy writing in C# and I can practice Go and Rust at work. For this reason I decided to go with .NET Core C# and used YARP.

Additionally, I’ve been looking for an excuse to learn and use both CockroachDB and NATS. I use CockroachDB’s cloud service as my data store and NATS to keep my load balancer instances synchronized.

Where’s the Code?

The project is currently private and I’m probably not going to open-source it any time soon. I expect that I will open-source it at some point, but for now I want to have the freedom to make drastic design changes. Open-sourcing it would leave me worried about affecting anyone using the code.

I would however love to collaborate on this idea if there is interest. If you are interested in seeing the code and helping turn it into a usable open-source project, please reach out! The easiest ways to contact me are LinkedIn and Twitter.