LWKD: Week Ending October 22, 2023
Last Week in Kubernetes Development
Developer News
SIG-Docs called for Issue Wrangler nominations. Please reach out to one of the leads in the #sig-docs Slack channel if you'd like to volunteer or have any questions about the role.
You have until November 2 to register for the Kubernetes Contributor Summit in Chicago. If you need an exception to attend, you should ask even sooner.
Mike Danese is stepping down from SIG-Auth leadership and has nominated Mo Khan to replace him.
Release Schedule
Next Deadline: Feature Blog freeze, October 25th
Monday was the deadline for Exception Requests; hope you didn’t miss it. You also need your blurbs for the Feature Blog prepared this week, and next week begins Code Freeze.
Patch releases 1.28.3, 1.27.7, 1.26.10 and 1.25.15 came out last week. They include an opt-in mitigation for the HTTP/2 DoS bug as well as Go updates.
Featured PRs
#119026: Introducing Sleep Action for PreStop Hook
Networks are, sadly, not instantaneous. And even if they were, light-speed CPUs are also unfortunately unavailable. This has led to a very common case in Kubernetes where it takes some time for the termination of a Pod to be reflected in places like Service Endpoints, or the proxies using them for Services or Ingresses. In a healthy cluster this delay is short, usually only a few tens of milliseconds, but if the web server software in the Pod stops accepting new connections immediately on receiving SIGTERM, this leaves a gap where user connections can be sent to a now-unresponsive socket. The usual workaround for this is to add a preStop hook which runs a short sleep, as Endpoints are updated before the preStop hook runs but the SIGTERM isn't delivered until after it completes. Adding a 1-2 second sleep ensures the network components have time to process the change before the socket closes up shop. Up until now this has meant using one of the two modes that container lifecycle hooks offer, either an HTTP GET to an endpoint that doesn't respond for a few seconds or exec'ing a sleep binary (or similar shell command) that already exists inside the container. This PR adds a much easier option, a built-in Sleep action that doesn't require coordinating support inside the container. This in turn makes it much easier to roll out this mitigation across all Pods in your clusters.
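To make the shape of the new hook concrete, here is a minimal sketch that renders the lifecycle stanza a container would carry. The Go types below are local stand-ins written for this example, not the real k8s.io/api types, and the JSON field names (`preStop`, `sleep`, `seconds`) are assumed to match what the PR adds; check the API reference for your cluster version before relying on them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local stand-ins that mirror the assumed shape of the Pod lifecycle fields.
// They are NOT the real k8s.io/api/core/v1 types.
type SleepAction struct {
	Seconds int64 `json:"seconds"`
}

type LifecycleHandler struct {
	Sleep *SleepAction `json:"sleep,omitempty"`
}

type Lifecycle struct {
	PreStop *LifecycleHandler `json:"preStop,omitempty"`
}

// preStopSleep builds a lifecycle stanza whose preStop hook pauses for the
// given number of seconds before SIGTERM is delivered to the container.
func preStopSleep(seconds int64) Lifecycle {
	return Lifecycle{
		PreStop: &LifecycleHandler{Sleep: &SleepAction{Seconds: seconds}},
	}
}

// renderLifecycle serializes the stanza as it would appear under a
// container's "lifecycle" key in a JSON manifest.
func renderLifecycle(l Lifecycle) string {
	b, err := json.Marshal(l)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// prints {"preStop":{"sleep":{"seconds":2}}}
	fmt.Println(renderLifecycle(preStopSleep(2)))
}
```

The point of the sketch is that the hook needs nothing from the image: no sleep binary and no slow HTTP endpoint, only a duration in the Pod spec.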
#121016: KEP-4008: CRDValidationRatcheting: Ratchet errors from CEL expressions if old DeepEqual
While Kubernetes supports strong versioning for API changes, we've always tried to minimize that by using non-disruptive schema change techniques as much as possible. In many controllers this has meant that when we add new validation rules, we only apply them to existing objects if a relevant field is changed. Or in simpler terms, an already-applied object should continue to kubectl apply even with new validation rules. This is commonly called "ratcheting" as new objects and changes to existing objects will need to adhere to the new rules (tightening the ratchet) without disrupting all existing objects simultaneously. This PR adds that capability to CEL-based custom type validations. More generally, any existing object fields that aren't changed by a request will not get run through CEL validations. This should also help reduce CPU usage by kube-apiserver for running CEL evaluations. There is future work under the heading of "Advanced Ratcheting", allowing yet more control for cases where new validations should apply even to existing objects, though as a workaround for now you can use validation expressions with the oldSelf variable to implement your own logic to enable this.
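The ratcheting idea can be sketched in a few lines: skip a field's validation rule when the stored value and the incoming value are deeply equal. This is an illustration only, not the apiserver's implementation. The `rule` type, `validateRatcheted`, and `maxInt` are hypothetical names, and plain Go functions stand in for CEL expressions.

```go
package main

import (
	"fmt"
	"reflect"
)

// rule is a hypothetical stand-in for a CEL validation rule on one field.
type rule func(value any) error

// maxInt returns a rule that rejects integers above limit.
func maxInt(limit int) rule {
	return func(value any) error {
		n, ok := value.(int)
		if !ok {
			return fmt.Errorf("expected an integer, got %T", value)
		}
		if n > limit {
			return fmt.Errorf("%d exceeds the maximum of %d", n, limit)
		}
		return nil
	}
}

// validateRatcheted runs each rule only when the field is new or its value
// differs from the stored object. Unchanged fields are not re-validated.
// Pass a nil oldObj for a create request.
func validateRatcheted(oldObj, newObj map[string]any, rules map[string]rule) []error {
	var errs []error
	for field, check := range rules {
		newVal, present := newObj[field]
		if !present {
			continue
		}
		if oldVal, existed := oldObj[field]; existed && reflect.DeepEqual(oldVal, newVal) {
			continue // unchanged since it was stored: the ratchet lets it through
		}
		if err := check(newVal); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", field, err))
		}
	}
	return errs
}

func main() {
	// A newly added rule: replicas must not exceed 5.
	rules := map[string]rule{"replicas": maxInt(5)}
	stored := map[string]any{"replicas": 10, "image": "app:v1"}

	// Only the image changes, so the stored replicas value is tolerated.
	fmt.Println(len(validateRatcheted(stored, map[string]any{"replicas": 10, "image": "app:v2"}, rules))) // 0
	// replicas itself changes, so the new rule applies.
	fmt.Println(len(validateRatcheted(stored, map[string]any{"replicas": 12, "image": "app:v1"}, rules))) // 1
	// A brand-new object has no stored value to ratchet against.
	fmt.Println(len(validateRatcheted(nil, map[string]any{"replicas": 10}, rules))) // 1
}
```

Skipping unchanged fields is also where the CPU saving comes from: a rule that never runs costs nothing to evaluate.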
KEP of the Week
KEP 3673 - Kubelet limit of parallel image pulls
This KEP proposes adding a node-level limit to the kubelet for the number of parallel image pulls. Currently the kubelet limits image pulls with QPS and burst. This is not ideal since it only limits the number of requests sent to the container runtime and not the actual number of parallel image pulls going on. Even if a small QPS is set, the number of parallel image pulls in progress could be high. This KEP proposes adding a maxParallelImagePulls configuration to the kubelet to limit the maximum number of images being pulled in parallel. Once the limit has been hit, any further image pull request would be blocked until an existing one finishes.
This KEP is authored by Ruiwen Zhao and Paco Xu and is targeting beta stage in the upcoming v1.29 release.
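The difference between a rate limit and a concurrency limit is easy to see in code. Below is a minimal sketch of the blocking behavior described above, using a buffered channel as a counting semaphore. It is not the kubelet's implementation: `pullLimiter` and `simulatePulls` are hypothetical names, and a short sleep stands in for a real image pull.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pullLimiter caps how many pulls may be in progress at once.
// maxParallel must be at least 1, or every pull would block forever.
type pullLimiter struct {
	slots chan struct{}
}

func newPullLimiter(maxParallel int) *pullLimiter {
	return &pullLimiter{slots: make(chan struct{}, maxParallel)}
}

// pull blocks until a slot is free, runs fn, then releases the slot.
func (l *pullLimiter) pull(fn func()) {
	l.slots <- struct{}{}        // acquire: blocks once maxParallel pulls are running
	defer func() { <-l.slots }() // release when the pull finishes
	fn()
}

// simulatePulls starts n fake pulls at once and returns the highest number
// that were observed in progress at the same time.
func simulatePulls(maxParallel, n int) int {
	limiter := newPullLimiter(maxParallel)
	var (
		mu       sync.Mutex
		inFlight int
		peak     int
		wg       sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			limiter.pull(func() {
				mu.Lock()
				inFlight++
				if inFlight > peak {
					peak = inFlight
				}
				mu.Unlock()

				time.Sleep(10 * time.Millisecond) // stand-in for the actual pull

				mu.Lock()
				inFlight--
				mu.Unlock()
			})
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	// Six pulls requested at once, but never more than two in progress.
	fmt.Println("peak parallel pulls:", simulatePulls(2, 6))
}
```

However quickly requests arrive, the number of pulls in progress never exceeds the configured maximum, which a QPS and burst setting alone cannot guarantee.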
Other Merges
- KEP 2681, adding status.HostIPs, is moving back to Alpha status after failing e2e testing
- The APIserver supports only JSON, YAML, and Protobuf
- The kube-apiserver will now expose four new metrics to inform about errors on the clusterIP and nodePort allocation logic
- QueueingHint function now has new statuses that allow simplified logic in the Scheduler, and NodeAffinity generates queuing hints
- Implement MatchLabelKeys in PodAffinity
- Other new metrics: job_finished_indexes_total
- The list of metric labels can be configured by supplying a manifest using the --allow-metric-labels-manifest flag
- Add --authorization-config flag to APIserver for better control of when to use Structured Authorization
- HPA should calculate the cost of sidecars
- WatchList data consistency checks run only during testing, not in production
- Add CAP_NET_RAW access to netadmin debug profile
- Delete a CRD's APIServer path when the CRD goes away
- TCPv4 sysctls controlling keepalives and FIN timeouts are now available to control on a per-pod basis
- Use Patch to update pod disruption conditions, eliminating a "cannot delete pod" bug; backported
- ValidatingAdmissionPolicySpec variables can be omitempty
- Fix bug in EventPLEG
- Don't default fields to {} if it breaks them
- Prevent accidental StatefulSet pod deletion during rolling update
- If PodSchedulingContext updates conflict, use Server-Side Apply
- Clean up DRA prepare/drop resources workflow, including making sure that plugins register themselves
- Remember not to replace undefined resources with empty
- Calculate image counts better for ImageLocality
Testing updates: kubeadm bootstrapping, sig-apps tests, userns, eviction manager
Promotions
- BackOffLimitPerIndex to Beta
- JobReadyPods to Beta
- Plugin resolution to Beta
- Interactive delete to Beta
Deprecated
- Remove feature gates for GA features: SeccompDefault, TopologyManager
- Stop using the CRON_TZ or TZ value for CronJobs; use the spec.TimeZone field instead
Version Updates
- Kubernetes is now built with Go 1.21.2!