KubeCon '24 (Paige's Picks)
A Note
Unfortunately I had a change of plans and am missing out on this year's KubeCon. I am so very grateful to my team for taking over my O11y Day lightning talk and shifts at the booth ❤️
If there's 1 thing I plan for with conferences it's selecting which talks to attend. For many people the "hallway track" aka the people you meet and conversations you have in the hallways of a conference is the main draw and most enjoyable part. Like I know folks that say they're lucky to make it to 1-2 talks at a conference!! I exist on the other end of the spectrum and am all in on the talks! Nothing gets me more excited than an intriguing talk description with a catchy title. Often I'm torn and wish I could clone myself to be able to attend 2+ talks simultaneously 😆. I just love learning from others especially when it comes to the still rapidly evolving world of cloud native environments. Conf talks are one of my favorite vehicles for learning and sharing knowledge as you'll see from this extensive list.
That being said, I had already spent a lot (I mean A LOT) of time combing over the schedule and bookmarking talks that sounded interesting to me. Today I've been poring over fellow Chronauts' session notes and eagerly awaiting the recordings to drop on the CNCF YouTube channel but figured it'd be such a shame to keep this list all to myself...
SO below you will find talks that caught my eye, grouped by theme and then broken out by day of the week, each with a snippet from the description and a comment from me about why it made the list. Each talk is linked directly to the session in Sched, the KubeCon event app, so you can easily add it to your own schedule in 2 clicks (yw). If you'd prefer reviewing the full list while in Sched here is a link for that:
Even if you're not attending KubeCon IRL, I think it's worth bookmarking any talks of interest and waiting for the recordings! Here are some quick links to the various topics to save ya some scrolling:
- K8s
- Observability
- Multi-Cluster / Multi-Cloud
- Open Source Goodness
- Platforms
- Shipping Code
- Infra
- The Network
K8s
Wednesday
Behind Schedule: Pod Resource Configuration from Beginning to... Huh?
I was immediately sold after reading the snippet below....I like my surprises to be in party form not in production 😅
Joe will walk you through some of the surprising behaviors you may encounter with the seemingly basic rules that Kubernetes follows when scheduling and running pods – and how those rules themselves may not be what you think!
Thursday
Engineering a Kubernetes Operator: Lessons Learned from Versions 1 to 5
This is a talk that has been brewing for 7 years! I loooove a talk that covers a longer time horizon like this. If you're interested in writing or maintaining an Operator, this talk may give you some helpful background info and as they describe "hard-learned lessons".
Join me to uncover insights and hard-learned lessons from our journey through the first five versions of a Kubernetes Operator for Postgres. I will trace the development lifecycle from version 1 started in 2017 to version 5 now.
How to Move from Ingress to Gateway API with Minimal Hassle
It's time to start moving or planning your move off of the Ingress API and onto Gateway. Not sure where to start? Feeling a bit stressed about the whole endeavor? This talk is for you!
As of October 2023, the Ingress API has been superseded by the Gateway API, a new set of Kubernetes resources with over 20 implementations that enforces security best practices by design. However, migrating networking APIs is an intimidating task, and doing so safely is every company’s primary concern.
Observability (and monitoring!)
Wednesday
From Observability to Performance
OK, as much as I love nerding out on telemetry and observability stacks...I really dig talks that explicitly connect the dots between the data at your disposal and the user experience.
No matter how fast the Services on your Kubernetes cluster are, users would love them to be faster. But how do you get from a huge pile of metrics across a distributed system to real user experience improvements? There is a way, and with the right tools and the right approach, you can better understand and evaluate Service performance.
Unifying Observability: Correlating Metrics, Traces, and Logs with Exemplars and OpenTelemetry
I mean the quote below is e-v-e-r-y-t-h-i-n-g. I honestly don't care how many o11y tools a company has (within reason) but very much care about how connected the streams of telemetry are. Exemplars don't get a ton of love and I can't wait to hear how Apple is leveraging them!
While metrics, traces, and logs each provide valuable insights, their true power is realized when they are correlated.
Thursday
Lessons Learned Adopting OpenTelemetry At Scale
👏👏👏 This is a can't miss talk for me - I appreciate folks who are willing to share it all when it comes to major migrations, especially what they'd do differently with the benefit of hindsight. Whether or not your org is similar to Heroku, and even if it hasn't started an OTel migration (yet), there are sure to be takeaways for you.
This Heroku case study dives into our OpenTelemetry journey where you'll discover strategies on adoption, how to deal with internal resistance, and technical guidance on rolling out the change. Learn from our missteps and what we wished we had done differently. You’ll even see how a bit of luck can help drive adoption over the finish line.
Now You See Me: Tame MTTR with Real-Time Anomaly Detection
🍎 oh Apple! Always doing cool cutting edge stuff wrt monitoring and o11y!
We'll show you how to reduce MTTR and mean time to analyse by proactively identifying abnormal application behavior using statistical & machine learning algorithms on time series data from Prometheus. Learn to pinpoint issues, identify missing instrumentation, and visualize anomalies using Grafana. This session equips you to achieve faster issue resolution and maintain optimal application health.
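For a rough feel of what "statistical" anomaly detection on time series data can mean, here is a minimal Python sketch using a rolling z-score. It is an illustration only and not the speakers' method: the function name `find_anomalies`, the window size, the threshold, and the sample latency values are all made up for this example, and a real setup would pull samples from Prometheus rather than a hard-coded list.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indexes of samples that sit more than `threshold` standard
    deviations away from the mean of the preceding `window` samples.

    Hypothetical helper for illustration; not part of any library.
    """
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up request latencies in milliseconds with one obvious spike.
latency_ms = [100, 102, 98, 101, 99, 100, 103, 400, 101, 99]
print(find_anomalies(latency_ms))  # index 7 is the 400 ms spike
```

Note that the spike also inflates the baseline for the samples right after it, which is one reason production systems tend to reach for more robust statistics or the machine learning approaches the abstract mentions.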
Fluent Bit: Better Pipelines for Observability
It wasn't until earlier this year, when the company I work for, Chronosphere, acquired Calyptia, that Fluent Bit really came onto my radar. I'm sure FB was humming along in the K8s clusters I used to operate, shepherding logs to and fro, but as I have been learning, Fluent Bit can do so much more!
Creating better data pipelines is constantly challenging when "better" is defined by performance, low resource usage, and total ecosystem integration. Build scalable data pipelines to manage all your needs for the collection and processing of telemetry data by integrating multiple data sources and formats and reliably sending it to your desired endpoints or vendors for analysis.
Multi-Madness
Multi-Clusters
Wednesday
Cash App's Journey Into a Multi-Cluster Ecosystem
More clusters can mean more problems but it sounds like the Cash App Compute team has tackled many of them! Dreaming of seamlessly transitioning services in and out of clusters with zero-downtime...this talk is for you!
This talk intends to walk you through our experience introducing new Kubernetes clusters for our services at Cash App, migrating and splitting service traffic across clusters with zero downtime, and thinking through tooling adoption / creation to simplify cluster maintenance as our overhead scales.
Does My K8s Application Need CPR? Performance Evaluation of a Multi-Cluster Workload Management App
I have 1 house, the IBM apps described in this talk have MANY. If you're running an application across multiple clusters and wondered if you could do a performance analysis and what tools are available, add this one to your schedule!
Our insights will demonstrate the utility of benchmarking the performance of a multi-cluster Kubernetes workload management application. Additionally, in this talk, we will demonstrate the usefulness of using several opensource tools such as clusterloader2, kube-burner & kwok to evaluate the performance of multi-cluster Kubernetes management applications.
Thursday
One Inventory to Rule Them All: Standardizing Multicluster Management
Sounds like the ClusterProfile API is a huge gift to everyone managing massive amounts of clusters! If this is news to you, it might be good to stop by this talk 😄
There’s a remarkable lack of standard tools and patterns for multi-cluster. Over time users have found ways to stitch clusters together but the community has been asking for standardization. To share multi-cluster tools, Kubernetes sig-multicluster has introduced the “ClusterProfile” API, a critical building block for multi-cluster capabilities. This API provides a canonical way for multicluster controllers and users to iterate over clusters, and to install or manage multi-cluster features.
Friday
Zero Downtime Upgrades at Scale: How Okta Manages Hundreds of Clusters Daily
idk about you but I am very happy to hear that Okta has well-oiled machinery to keep their K8s clusters up to date! Jérémy and Kahou will pull the curtain back on the internal tooling they built to pull this off. Fingers crossed it's got a fun name 🤞
At Okta, we maintain hundreds of clusters, each hosting >130 services, with node counts ranging from 20-400 and we are updating them daily. How do we do it? Without an out-of-the-box solutions we had to build our own and we want to share what we learned with all of you!
Multi-Cloud
Wednesday
Automated Multi-Cloud Large Scale K8s Cluster Lifecycle Management
Databricks has the cloud trifecta with clusters running in AWS, Azure and GCP! I am very curious to hear how they manage Kubernetes upgrades not only across tons of clusters but across all those dang clouds!
Blue-green cluster rotations, or cluster swaps (upgrading by creating a new k8s cluster with a new version/configuration & shifting workloads from the old cluster), allow us to implement major infrastructure changes and upgrade k8s versions with low risk through staged rollouts, seamless rollbacks, zero downtime, and minimal operator intervention.
Thursday
Yahoo’s Kubernetes Journey from on-Prem to Multi-Cloud at Scale
Another great "how we did it" migration story this time at Yahoo scale! If you have a cloud adoption initiative in your future, Nandhakumar and Payal are definitely the folks to learn from.
Our team faced numerous challenges during the cloud adoption process, including networking, security, cluster autoscaling, and cost. In this talk, we will share managing K8S in a multi-cloud and discuss the challenges faced and solutions found. Key topics include Shared VPC, IP Space for K8s, securely accessing private clusters, multi-tenant workload identity, and maintaining a user interface to K8S.
Open Source Goodness
Wednesday
Beyond 'Can You Mentor Me?' - Crafting the Contribution Ladder
I've mostly had experience with mentorship in the formal 1:1 capacity but am quite intrigued by the practices mentioned like Role Based Shadowing, and interested to see how I can bring these to my work and introduce them to projects I contribute to
Take inspiration from how the Kubernetes project maintainers make use of various mentorship techniques such as Role Based Shadowing, Peer-to-Peer Learning, and Mentorship Cohorts that can help any project especially CNCF incubating projects stick new contributors to the project
Thursday
The Maintainer Monologues
A fun and informative session on the realities of being a project maintainer. If upping your commitment and contributions to a project is on your 2025 goals list, attend this session to hear about the highs and the lows
Are maintainers born? Or made? Made. They’re definitely made. Oftentimes it’s a combination of trial and error, luck, and lots of hard work. With a mixed group of first time and experienced maintainers, join us for a panel covering the origin stories and learnings of CNCF sandbox/incubating/graduated project maintainers.
Friday
Gamifying Cloud Native: How to Design and Build an Educational Game for Your Project
eeee! I am super into this idea of making it FUN and approachable to introduce someone to a cloud native project! Definitely inspired by this one!
One of the challenges many cloud native projects face is that the abstractions they provide are not intuitive for new users. Since cloud technologies are often built on top of each other and use domain specific language, this problem compounds. Luckily, educational games can be made to help communicate these abstract concepts in a fun and engaging format!
Can You Put a Price Tag on Open Source?
Short answer: yes. But for those who are looking to open source a tool or project at work, or to get organizational buy-in to spend work hours contributing back to open source, this is your session!
Earlier this year, the Harvard Business School released the paper titled “The Value of Open Source Software,” estimating the worldwide value of OSS at $8.8 trillion, and on average, it would cost companies at least 3.5x more to develop similar projects internally. Yet, many organizations and engineers struggle to understand or realize this kind of value from contributing to these projects.
Accessibility at KubeCon: Deaf Voices in Cloud Native
Ever wondered how you can make the CNCF more inclusive? There are lots of options, one of which is attending this panel and hearing deaf community members share their experiences and the impact KubeCon's recent accessibility efforts have had! Then take your learnings and advocate for those same changes at other conferences you attend or your own organization's events.
Never met a deaf person at a conference? That is not surprising. While there are lots of deaf engineers, until recently, most conferences — and virtually any other community activity — haven't been accessible to deaf community members. But for KubeCon, that all changed exactly a year ago! During this discussion, deaf panelists from various countries will shed light on their unique experiences being deaf in tech and the impact that making KubeCon accessible has had on their lives and hopes for the future.
Tutorial: Simplify and Optimize Your YAML with YAMLScript
not going to lie I definitely LOL'd at the first sentence and then ROFL'd when I saw that the speaker, Ingy was literally one of the original inventors of YAML itself 🤣🤣🤣 this session hopefully will be recorded because I've got to try out this YAMLScript thing!!!
Nobody likes YAML (or anything for that matter) when its a giant and repetitive mess. Of course, there are already existing technologies like Helm and Kustomize that help provide make YAML nicer for Kubernetes. The new kid on the block is YAMLScript. Being a complete programming language (built over a vast and mature ecosystem) its capabilities are effectively limitless.
Platforms
Thursday
Evolving Reddit’s Infrastructure via Principled Platform Abstractions
As someone who's spent their entire career at startups I really feel the "adapted to solve tactical, near-term problems" intro line. This talk tells the important "before" story when it comes to a platform - why/when is it time to invest in one along with learnings and stories from the past 3 years of their ongoing journey.
Reddit’s approach to infrastructure management has grown organically over time, adapted to solve tactical, near term problems. We have now reached a point where the only way to scale infrastructure capabilities to a growing engineering organization is through platform abstractions offering self-service management of standardized infrastructure patterns.
Friday
Platform Engineering in Financial Institutions: The Practitioner Panel
I don't work in FinTech and probably never ever ever will. But I am grateful to all you who do! If I did and was interested in spinning up platform engineering I would totally attend this session 🏦
This panel brings together seasoned practitioners from leading banks and financial institutions to share their firsthand platform experiences, successes, and challenges. Join us as we discuss the journey of adopting and deploying CNCF technologies at scale within the highly regulated financial sector. We’ll explore practical examples of both successes and incidents where things have gone wrong, providing the audience with valuable takeaways.
Platform Engineering for Software Developers and Architects
Here we are 2 years after "From Kubernetes to PaaS to...err, what's next" and I am a sucker for a follow up talk! Platform Engineering has become the new hotness (despite not actually being a new concept), but I still feel like it's mostly Operations-minded folks talking about Platforms, and SRE/Ops/Infra/DevOps type folks would do well to hear things from the software dev/architect perspective!
I'll introduce the topic of platform engineering through the lens of a software developer and architect. My primary goal is for developers to understand "what good looks like" with a successful platform build and help them understand how a platform can influence the SDLC (for better or worse!)
Shipping Code
Wednesday
Perform Laser Focused Deployments by Deciding in Advance the Blast Radius
I used to daydream about progressive delivery. Well not really. Mostly it was (and is) visions of alpaca dancing in my head BUT I have always been impressed with sophisticated deployment strategies! If you've already got Argo Rollouts set up this talk can help you take things to the next level
We see a lot of teams that choose an arbitrary number of clients that access the new version of a canary. Yes, it is very easy to send only 10% of the traffic to the new version of a Kubernetes deployment. But sometimes you want to choose WHICH 10% sees the new traffic. In this talk we will see several approaches on pinning down specific clients to the old or new version and advanced scenarios for sending canary traffic only to a specific subset of users such as internal employees or customers who have expressed their interest on seeing brand new releases as soon as possible.
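To make "choosing WHICH 10%" a little more concrete, here is a minimal Python sketch of the general idea: pin explicit cohorts (internal employees, opted-in customers) to the canary first, then fill the rest with a stable hash so the same user always lands on the same side. Everything here is hypothetical and for illustration only: the function names, the `salt` value, and the flags are invented, and this is not how Argo Rollouts or the speakers implement it. In practice this decision usually lives in the traffic layer (for example, routing on a request header) rather than in application code.

```python
import hashlib


def bucket(user_id: str, salt: str = "canary-demo") -> int:
    """Map a user to a stable bucket in the range 0..99.

    Hashing (instead of random sampling) means a given user gets the
    same answer on every request. The salt is an assumed, made-up value.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def sees_canary(user_id: str, is_employee: bool = False,
                opted_in: bool = False, percent: int = 10) -> bool:
    """Decide whether a request should be sent to the canary version."""
    # Explicit cohorts are pinned to the new version regardless of percentage.
    if is_employee or opted_in:
        return True
    # Everyone else is split deterministically by hash bucket.
    return bucket(user_id) < percent


# An internal employee is always routed to the canary.
print(sees_canary("alice", is_employee=True))
# A regular user gets the same answer every time they are checked.
print(sees_canary("bob") == sees_canary("bob"))
```

The design choice worth noticing is the ordering: cohort membership wins over the percentage, so raising or lowering `percent` never kicks your early adopters off the new version.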
Secure by Design CI/CD: Practical Insights from Adobe and Autodesk
Last year at Open Source Summit I kept hearing about SLSA (pronounced "salsa") and SBOMs, and then a talk at Rejekts featured this terrifying quote from Dan Lorenc: "Every time you pip install, go get, or mvn fetch something, you’re doing the equivalent of plugging a thumb drive you found on the sidewalk into your production server." So even though I'm not a security whiz these topics certainly have my attention! Security, like reliability, is everyone's job, so do your part and join this session 🔒
Worried that your CI/CD pipelines and developer workflows are insecure? Lost in security buzzwords like SBOMs, provenance, attestation, SLSA, OpenSSF, and more? Seeking a clear, actionable reference architecture to secure your pipeline? Whether you are just getting started on your Software Supply Chain Security journey, or are ready to take it to the next level navigating this diverse ecosystem is challenging.
Thursday
Bring the Joy Back to Deployments!
A demo speedrun! Be still my heart! Gotta love a whirlwind tour of popular OSS deployment projects and I have no doubt this will be a fun and engaging session
Our journey will begin with manual hello world deployments and from there we will explore some of the most common modern tools for CI/CD, including a demo speedrun! Major destinations on this tour will include helm, kustomize, skaffold, ArgoCD, Tekton, Jenkins and JenkinsX. We will walk through the fundamentals of CI/CD, explore tradeoffs and discuss the process for implementing these tools in your software development lifecycle.
How We Made OpenTelemetry Be Our Fitness Tracker for Your CI/CD Pipelines!
This talk combines some of my faves - OpenTelemetry, gaining observability for CI/CD pipelines AND a well thought out metaphor. 🏃‍♂️🏃🏃‍♀️
Healthy pipelines ensure rapid and continuous deployments every time code gets committed to the Git repositories! Every new repository and commit puts more load on the CI/CD tool making it more challenging to keep this crucial heartbeat healthy! In this session, engineers from Clario will demonstrate how they leverage OpenTelemetry to observe, validate, report and optimize their CI/CD pipelines, keeping their deployments healthy despite increased scale and unlocking the full potential of modern software delivery on Kubernetes with GitLab.
Friday
Achieving and Maintaining a Healthy CI with Zero Test Flakes
I can only imagine the amount of tests and checks that run in CI for Kubernetes and its projects and how toilsome flaky tests can be at that scale! If you attend this talk you don't have to imagine any of this: you'll hear directly from Google engineers about the tactics, strategies and tooling they leverage.
In the fast-paced world of software development, a reliable and efficient Continuous Integration pipeline is essential. However, flaky tests can cause delays, frustration, and decreased confidence in the codebase. This session will go deep into the strategies, best practices, and tools that the Kubernetes projects use to eliminate flaky tests and achieve a robust CI pipeline that delivers high-quality software consistently.
Infra
Wednesday
Better Pod Availability: A Survey of the Many Ways to Manage Workload Disruptions
Did the first sentence of the description strike fear into your heart? Perhaps puzzlement? Better add this session to your list and get an overview of all your options for configuring pod availability
Kubernetes Pods are ephemeral, but some are more ephemeral than others. Kubernetes provides a dizzying array of options to manage and handle Pod disruption. From PodDisruptionBudgets, to "safe-to-evict" annotations, GracefulTermination timeouts and more, it can be incredibly hard to determine the optimal solution for handling Pod disruption and how to manage gracefully terminating your application.
Friday
The Node Tetris Rabbit Hole: Why Your Binpacking Might Be Underperforming
In this era of "do way more with less" and all eyes on budget line items, having some tried and true tips for identifying and understanding cluster under-utilization from someone that's been there and done that is a gift 🎁
Have you ever looked at your Kubernetes cluster and thought “I have a perfectly good autoscaler! Why are all my nodes at less than 50% capacity?” When a team moves to the scale of hundreds of clusters with thousands of nodes, efficient binpacking changes from a side task to a financial necessity. From inefficient client apps to long-buried cluster configs, follow the Adobe Ethos team as they track down leads on what’s causing cluster underutilization and how to fix it. You will also learn some tips for designing your clusters to avoid these issues in the first place.
Love thy (Noisy) Neighbor: Strategies for Mitigating Performance Interference in Cloud-Native Systems
A couple months back I enjoyed reading about Netflix's noisy neighbor detection with eBPF and if you did too, this talk seems like an excellent followup covering the next step after detection - mitigation from the megacorps.
In cloud-native environments, application performance often degrades due to contention over shared resources such as CPU caches and memory bandwidth. Current container technologies lack mechanisms to isolate these resources, which compels operators to maintain low utilization by scaling out their deployments. This session explores strategies used by hyperscalers like Google, Microsoft, Facebook, and Alibaba to mitigate such performance interference
Improving Service Availability: Scaling Ahead with Machine Learning for HPA Optimization
Machine Learning is the side of AIOps that I can actually get into! In this case ML is used to proactively scale pods via HPA with generalizable takeaways for you to bring back to your org. Love it!
In this talk, we will explore employing machine learning (ML) algorithms to enhance the Kubernetes autoscaling capabilities beyond the traditional, reactive horizontal pod autoscaler (HPA). Attendees will be introduced to how to leverage recommendation algorithms to predict future load and usage patterns, allowing for smarter, proactive scaling decisions. This approach not only ensures high availability and responsiveness of applications but also offers a pathway to substantial cost optimizations by preventing over-provisioning and minimizing resource wastage.
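The reactive vs. proactive distinction is easier to see with numbers. The standard HPA calculation documented by Kubernetes is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The Python sketch below applies that formula twice: once to the latest observed metric (reactive) and once to a predicted value (proactive). The `forecast_next` function is a deliberately naive linear extrapolation standing in for whatever ML model the speakers use, which the abstract does not specify; the names and sample values are assumptions for illustration.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Core HPA scaling formula (ignores tolerance and min/max bounds)."""
    return math.ceil(current_replicas * (current_metric / target_metric))


def forecast_next(history: list) -> float:
    """Naive stand-in for an ML forecast: continue the most recent trend.

    Hypothetical helper; a real predictor would be far more sophisticated.
    """
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])


# Made-up average CPU utilization (%) over the last three intervals.
cpu_history = [40, 50, 60]
target_cpu = 50
replicas = 4

# Reactive: scale on what we see right now (60% against a 50% target).
reactive = desired_replicas(replicas, cpu_history[-1], target_cpu)

# Proactive: scale on where the trend says we are headed (70%).
proactive = desired_replicas(replicas, forecast_next(cpu_history), target_cpu)

print(reactive, proactive)  # 5 6
```

The proactive path asks for the extra replica one interval earlier, which is the whole pitch: capacity is there before the load arrives instead of after.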
The Network
Wednesday
Understanding How OpenTelemetry Network Uses eBPF for Network Observability
This talk seems like a great overview of both OTel and eBPF grounded in the practical application of gathering network data. If you've been interested and/or confused by either technology this talk just might lead to your aha! moment.
We'll explore the architecture of the OTel Network, focusing on its key components: the kernel collector, kubernetes collector, cloud collector, and reducer which together enable collecting, ingesting, aggregating, enriching, and exporting telemetry data collected from various sources. We'll show an end-to-end setup to demonstrate the use of these agents and reducer component to send data to the OTel collector.
Life of a Packet: Ambient Edition
My all time fave approach to educational talks is "life of a _____" so obviously this session had to make the list. And side note: Ambient Edition is NOT the title of a new lofi cloud native developer music mix but instead an interesting new configuration for Istio which you can learn all about by attending this session.
Istio's new "ambient mode" promises to (and delivers!) dramatically simplify and reduce the cost of running a service mesh. This doesn't come easily, however; Istio employs some advanced and innovative techniques to deliver on this promise. In this talk, Keith and John - two leads on the ambient project - will give an in-depth look under the hood to show how ambient mode operates, walking through how a packet gets from point A to point B securely and efficiently.
Thursday
Understanding Kubernetes Networking in 30 Minutes
This is a tall order but I have faith Ricardo and James can make good on this talk title!
You are learning Kubernetes and started to face concepts like Pod CIDRs, Services, CNI, kube-proxy? Welcome! you have reached the amazing area of Kubernetes networking! We all have already been there and know how complex it may seem on the beginning, but in this talk, Ricardo and James will demystify the Kubernetes network concepts and model on a fun way, exploring how it is designed, why the is a "pause" container on every Pods, how the communication between Pods work, what are kube-proxy and CNI and their importance.
How the Tables Have Turned: Kubernetes Says Goodbye to Iptables
In a recent episode of Off-Call - "It's The Network", my guest Leon Adato mentioned that under the hood somewhere in Kubernetes there is iptables. And that was true...until v1.31! Hello nftables (no relation to NFTs I'm assuming 😆) Get the full scoop at this session
For decades, iptables has been the preferred packet filtering system in the Linux kernel. Used extensively across the Kubernetes networking ecosystem, iptables is now on the way out and is expected to be removed from the next generation of Linux distributions. With iptables past its prime, where does that leave Kubernetes? The successor to iptables – nftables – is ready to carry the torch instead, with a newly released beta kube-proxy implementation in v1.31 and network policy using Calico’s nftables backend.
Friday
Thousands of Gamers, One Kubernetes Network
I mean based on the title alone how could you NOT be interested in this talk??
Uninterrupted gameplay with minimal network latency, jitter, and maximum throughput is crucial for a great gamer experience. But how do we maintain consistent network quality in cloud gaming production environments at NVIDIA when 2K+ players (pods) share the same physical network for game storage and streaming? When a new player joins and a pod starts downloading large contextual game data, it is vital to shield other players on the same node from this 'noisy neighbor'. Kubernetes provides limited pod-level traffic shaping but we needed more than that. In this talk we will show how we achieved true Quality of Service and wire-speed networking on Kubernetes clusters using Differentiated Services Code Point (RFC7657) markings on pod traffic.
Shopify’s Open Source Approach to Network Monitoring with eBPF, Vector and ClickHouse
Very cool to hear that Shopify's got an OSS stack running for network monitoring and I am keen to hear how it came to be! Whatever they're doing is working though because the only time over many years and MANY MANY purchases I've had an issue checking out was with OG Slimes and that's more due to the overwhelming demand vs supply during the Friday drops.
At Shopify, we’ve successfully implemented a scalable, open-source network monitoring solution for the cloud. In this talk, we will demonstrate how we built a network monitoring solution leveraging eBPF, Vector, ClickHouse, and Grafana. This solution enables us to monitor over 30 million network flow, DNS and other networking-related events per second at the container level for thousands of services across hundreds of Kubernetes clusters in the Shopify Cloud. We will also share the lessons we learned regarding these technologies and provide insights on how you can implement your own purely open-source monitoring solution capable of handling millions of events per second.
Welp that's the list! KubeCon speakers work really hard to put together awesome and valuable talks and while you totally should check out the "Hallway Track" my hope is that you found a handful of talks to attend in person or saved to watch when the recordings drop.
CAT TAX