Weekly GitHub Report for Kubernetes: April 14, 2025 - April 21, 2025 (12:00:58)
Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.
I. News
1.1 Recent Version Releases:
The current version of this repository is v1.32.3
1.2 Version Information:
The version release on March 11, 2025, introduces key updates to Kubernetes, as detailed in the changelog, with additional binary downloads available. Notable highlights or trends can be found in the Kubernetes announcement forum and the linked changelog.
II. Issues
2.1 Top 5 Active Issues:
We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.
- kubelet tries to remove pod multiple times(reopen): This issue involves the kubelet attempting to remove a pod multiple times during the deletion process, resulting in an error status being displayed temporarily. The problem seems to be related to a previous unresolved issue, and it involves the container runtime returning a "not found" error when a second delete request is made after the first one succeeds.
  - The comments discuss the inability to reproduce the issue locally, potential causes related to containerd versions, and the handling of SIGTERM signals by containers. It is suggested that the issue might be due to a race condition in the pod lifecycle, and a PR is planned to address this. The discussion also clarifies that the error status is not directly caused by the `RemoveContainer()` function but rather by the container runtime's status reporting.
  - Number of comments this week: 11
- Conformance Endpoints Promoted Without Tests and Untested Conformance Endpoints for 1.33.0: This issue highlights that several endpoints in the Kubernetes 1.33.0 release were promoted without being tested, as identified by APISnoop, a tool for tracking conformance progress. The endpoints in question include various networking-related functions such as creating and deleting IP addresses and ServiceCIDRs, which have not undergone testing despite their promotion.
  - The comments discuss the lack of tests for certain networking endpoints and reference previous efforts to address similar issues. Contributors mention existing tests that cover some of the endpoints and suggest adding more tests to ensure coverage. There is also a discussion about the process for promoting endpoints to General Availability (GA) and the need for conformance tests before merging, with contributors expressing a desire for a more proactive approach to prevent such oversights in the future.
  - Number of comments this week: 10
- Garbage collector deletes dependents of StatefulSet, when user repeats the recreation of StatefulSet quickly: This issue describes a problem where the garbage collector in Kubernetes occasionally deletes the dependents of a StatefulSet when the StatefulSet is rapidly deleted and recreated using the `orphan` propagation policy. The expected behavior is that the dependents should not be deleted, and the garbage collector should respect the `orphan` policy, but a race condition seems to cause unintended deletions.
  - The comments discuss a potential race condition in the garbage collection process, where the garbage collector may delete Pods due to missing owner references during rapid StatefulSet recreation. Contributors suggest examining related issues and potential fixes, with some pointing to specific code sections and others questioning if this is a duplicate of a known issue.
  - Number of comments this week: 9
- PVC expansion using envtest.Environment fails: This issue involves a problem encountered while attempting to expand a PersistentVolumeClaim (PVC) from 10Gi to 20Gi using the envtest.Environment in a Kubernetes operator test setup. The error arises because the PVC's spec is immutable after creation, except for certain fields, and the PVC must be in a 'Bound' state to allow updates, which is not the case in the current setup.
  - The comments discuss whether the issue is related to the controller-runtime or Kubernetes core, with suggestions to file a bug in the appropriate repository. It is clarified that the PVC must be in a 'Bound' state for updates, and a working code example is provided to simulate the PVC expansion. The discussion concludes with an agreement that the error message could be clearer about the requirement for the PVC to be bound.
  - Number of comments this week: 9
- Apply best practices to staging repos: This issue involves applying best practices to the staging repositories within the Kubernetes project, specifically by updating README files, disabling issues on certain GitHub repositories, and setting up blocks on pull requests to ensure contributions are directed to the correct source. The goal is to streamline the contribution process and maintain consistency across the staging repositories listed in the issue.
  - The comments discuss assigning the issue to a contributor, who is advised that assignment is not necessary to work on it. The contributor is guided to consolidate changes into single pull requests for efficiency, especially due to an ongoing code freeze. The contributor confirms the creation of consolidated pull requests for README updates, disabling GitHub issues, and setting up blockades, with plans to proceed post-freeze.
  - Number of comments this week: 9
2.2 Top 5 Stale Issues:
We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.
- Run TestUpdateNominatedNodeName integration test with SchedulerPopFromBackoffQ feature enabled: This issue involves running the TestUpdateNominatedNodeName integration test with the SchedulerPopFromBackoffQ feature enabled, which is currently disabled in the test despite being enabled by default in Kubernetes. The challenge lies in the fact that enabling this feature makes it difficult to keep a pod in the backoff queue, as it would be quickly removed, complicating the verification process of the test.
- Reduce the risk of waiting on the scheduling queue Pop(), despite having pods in backoffQ: This issue addresses a potential inefficiency in the Kubernetes scheduler where, despite having pods in the backoff queue (backoffQ), the system might unnecessarily wait due to a timing discrepancy between checking the queue's emptiness and waiting on a condition. The problem arises because the backoffQ is managed with a different lock than the active queue (activeQ), which can lead to a situation where a pod is added to the backoffQ after the emptiness check but before the condition wait, potentially causing delays in scheduling.
Since there were fewer than 5 open issues, all of the open issues have been listed above.
2.3 Open Issues
This section lists, groups, and then summarizes issues that were created within the last week in the repository.
Issues Opened This Week: 26
Summarized Issues:
- Kubernetes Conformance Testing: Several endpoints in the Kubernetes 1.33.0 release were promoted to conformance without adequate testing, as identified by APISnoop. A total of 17 endpoints, such as `createNetworkingV1IPAddress` and `deleteNetworkingV1ServiceCIDR`, lack tests, alongside three endpoints from the 1.32.0 release related to `CoreV1NamespacedPodResize` that also remain untested, prompting discussions on improving the process to ensure sufficient conformance tests are in place before promotion to General Availability (GA).
- Kubernetes Garbage Collector Issues: A bug in the Kubernetes garbage collector can mistakenly delete StatefulSet's dependents due to a race condition when using the `orphan` propagation policy. Additionally, unexpected GET requests are made for Pods and Deployments after deletion, raising concerns about resource management.
- Kubernetes Test Failures: The Kubernetes project is experiencing several test failures, including `TestStreamTranslator_ThrottleReadChannels` due to missing "X-Stream-Protocol-Version" and a test for eviction timing out due to a deficiency in the fake.ClientSet. These issues highlight the need for improved test reliability and error handling.
- PersistentVolumeClaim Expansion Issues: Expanding a PersistentVolumeClaim (PVC) from 10Gi to 20Gi fails when the PVC is in a `Pending` state rather than `Bound`, as required by Kubernetes validation rules. This issue highlights the need for proper state management during PVC updates.
- Kubelet Process Restart Issue: Restarting the kubelet process temporarily resets the pod status to 0/1, causing the pod to be unable to receive traffic until the ready probe restores the status to 1/1. This behavior is not expected and can disrupt service availability.
- Pod Termination and Resource Management Issues: Pods in a Kubernetes cluster running on a custom Linux environment with K3s get stuck in a terminating state due to a "device or resource busy" error during volume unmounting. Additionally, the introduction of the sidecar feature leads to resource allocation failures.
- Performance and Scheduling Concerns: Performance degradation occurs in latency-sensitive services due to the loss of CPU affinity when guaranteed QoS Pods scale down. Moreover, modifying the Kubernetes scheduler to return "UnschedulableAndUnresolvable" could improve scheduling performance in large clusters.
- NamespaceName JSON Tags: Adding JSON tags to the `types.NamespaceName` in the Kubernetes project is necessary to facilitate its use in custom resource definitions. The absence of these tags necessitates redundant type definitions across various projects.
- Pod Eviction and Zonal Disruption: During a Full Zonal Disruption, the system sets the eviction rate limiter to `HealthyQPSFunc`, leading to the eviction of all workloads in the affected zone. Pausing all pod evictions and setting the rate limiter to `ReducedQPSFunc` is necessary to prevent this.
- Staging Repositories Best Practices: Applying best practices to the staging repositories within the Kubernetes project involves updating README files, disabling issues in the staging mirrors, and setting up blockades to redirect pull requests back to the source repository.
- Memory Eviction Thresholds: Adjusting the default memory eviction-hard threshold in Kubernetes to be greater than the `vm.min_free_kbytes` value is proposed. This dynamic calculation ensures Kubernetes handles memory eviction before the Out-Of-Memory (OOM) killer intervenes.
- CVE-2025-22871 Vulnerability Discussion: The Kubernetes project is discussing whether it is impacted by the CVE-2025-22871 vulnerability. Determining if an update to the Go version is necessary based on the potential impact is crucial for maintaining security.
- CustomResourceDefinitions Conflict: A test failure occurs due to a conflict during the creation of CustomResourceDefinitions (CRDs) when attempting to create multiple CRDs of the same group and version but different kinds. This results in a "CRD already exists" error.
- Security Concerns with Privilege Escalation: Setting `allowPrivilegeEscalation: false` in a container's security context does not prevent all forms of privilege escalation. This creates a false sense of security and prompts a proposal to extend validation to block all dangerous capabilities.
- Permission Issues on Windows ServerCore/NanoServer: The `ContainerUser` is unable to create files at the root of a CSI-mounted volume due to access denial on Windows ServerCore/NanoServer. This issue does not occur with users having higher permissions or in Linux containers.
- TopologySpreadConstraints Misconfiguration: The `topologySpreadConstraints` with `whenUnsatisfiable: DoNotSchedule` is not distributing pods across nodes as expected. This results in all pods being scheduled on a single node, indicating a potential misconfiguration or limitation.
- StartupProbe Warning Events: The startupProbe in Kubernetes is incorrectly reporting initial failed check attempts as "Warning" events. It should only report a warning if the probe exceeds its maximum lifespan without any successful attempts.
- kubectl Command Security Concern: Executing `kubectl exec -it resource asd asd asd -- bash` does not raise an error and behaves like `kubectl exec -it resource -- bash`. This improper handling of the `--` argument can lead to unexpected command behavior.
- In-Place Memory Resizing for QoS Pods: Supporting in-place memory resizing for guaranteed QoS pods with a static memory policy is proposed. This functionality should be enabled independently of static CPU policy support, ensuring the QOS class remains unchanged.
- Flaking Test in sig-network Group: The "Networking Granular Checks: Services should update endpoints: http" test is intermittently failing in the "master-blocking" job. Failures occur due to unexpected responses when dialing endpoints, linked to a previous similar issue.
- API Streaming Test Failure: A failing end-to-end test related to the API Streaming (WatchList) feature occurs due to incorrect assumptions about feature gate settings. Proper tagging and possibly new decorators are required to resolve the problem.
- NodeResizeError with CephFS CSI Driver: A persistent NodeResizeError condition occurs when using the CephFS CSI driver, which lacks NodeExpandVolume support. This results in a cosmetic error that remains even after a successful PVC resize.
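To illustrate the NamespaceName JSON Tags item above, here is a minimal Go sketch of how `encoding/json` treats a struct with and without field tags. The types `untaggedName` and `taggedName` are hypothetical stand-ins written for this example and are not the Kubernetes types; the sketch only shows why projects that embed an untagged struct in their API types tend to redefine it with tags.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// untaggedName is a hypothetical stand-in for a namespace/name pair
// declared without JSON tags.
type untaggedName struct {
	Namespace string
	Name      string
}

// taggedName is a hypothetical variant that declares explicit JSON tags,
// producing the lower-camel-case keys that Kubernetes-style APIs use.
type taggedName struct {
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
}

// marshal returns the JSON encoding of v as a string.
func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Without tags, the Go field names become the JSON keys.
	fmt.Println(marshal(untaggedName{Namespace: "default", Name: "web"}))
	// prints {"Namespace":"default","Name":"web"}

	// With tags, the keys follow the declared names.
	fmt.Println(marshal(taggedName{Namespace: "default", Name: "web"}))
	// prints {"namespace":"default","name":"web"}
}
```

The capitalized keys in the first output are what make an untagged struct awkward to reuse in a custom resource schema.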
2.4 Closed Issues
This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.
Issues Closed This Week: 10
Summarized Issues:
- Pod Resource Resize Verification: This issue pertains to a failure in verifying the resize state of pod resources in a Kubernetes environment. It highlights the challenge of accurately tracking pod restart counts during the resize operation, identified as a flake and failing test, with suggestions to reduce flakiness by checking the restart count increment rather than the absolute count.
- File Copy Performance Degradation: This issue describes a problem where file copy operations in a Kubernetes pod using a Local PersistentVolume (PV) become significantly slower after a node reboot. The performance only returns to normal upon deleting and recreating the pod.
- Flaking Test in e2e Suite: This issue involves a flaking test in the Kubernetes project where the e2e suite's instrumentation metrics test fails to gather metrics from the kubelet's /metrics/resource endpoint. The failure is due to an "Invalid Kubelet port 0" error, affecting multiple CI jobs and prompting discussions on handling such errors without failing the entire test suite.
- Node Taint Overwriting: This issue describes a problem where a user-added taint on a Kubernetes node is lost after Kubernetes adds its own "not ready" taints. This is potentially due to a caching issue that results in the patch action overwriting the taints list without considering the latest state, raising questions about whether this behavior is a bug.
- Cluster Upgrade to Dual Stack: This issue addresses a problem in the Kubernetes project where upgrading a cluster from a single to a dual stack configuration using the `--service-cluster-ip-range` option results in a breaking change. The current ServiceCIDR implementation requires manual intervention for migration, and the goal is to allow this transition without manual steps to ensure seamless upgrades.
- Networking Endpoints Promotion Without Tests: This issue highlights the promotion of several networking endpoints related to ServiceCIDR and IPAddress in Kubernetes version 1.33.0 without accompanying tests. Concerns are raised about the lack of testing for these endpoints as they advance to conformance status.
- Network Connectivity Loss After Upgrade: This issue describes a problem where, after upgrading the control plane to version 1.29 and Calico to v3.28.2, certain pods on a few worker nodes lost network connectivity. The issue was due to a conflict between iptables-legacy and iptables-nft, which was resolved by reinstalling and flushing iptables-legacy.
- Cluster Initialization Failure on Ubuntu 22.04: This issue describes a problem where the user is unable to successfully initialize a Kubernetes cluster on Ubuntu 22.04 using the `kubeadm init` command. Errors such as "failed to get node info" and "CrashLoopBackOff" for the `kube-apiserver` and `etcd` containers suggest issues with node registration and container restarts.
- Failing Test Due to Missing Image: This issue pertains to a failing test in the Kubernetes project, specifically within the "ci-kubernetes-e2e-kubeadm-kinder-latest-on-1-33.Pod" job, which is part of the master-informing suite. The failure was due to a missing image in the test configuration, but it was resolved after a related pull request was merged.
- Kubelet ReadOnlyPort Configuration Issue: This issue addresses the inability to set the Kubelet's ReadOnlyPort to 0 using the golang KubeletConfiguration struct due to the `omitempty` JSON field tag. This causes the field to be omitted when marshaled to YAML or JSON, resulting in the port defaulting to 10255 instead of being disabled as intended.
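The `omitempty` behavior behind the Kubelet ReadOnlyPort item above can be reproduced with the standard library alone. The sketch below uses two hypothetical structs, `valueConfig` and `pointerConfig`, that are not the real KubeletConfiguration type. It shows that a zero-valued integer field tagged `omitempty` disappears from the output, and that a pointer field is one common way to distinguish "unset" from "explicitly 0". This is an illustration of the mechanism, not a description of how the project resolved the issue.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// valueConfig is a hypothetical config struct with a plain integer port.
// With omitempty, the zero value is indistinguishable from "not set".
type valueConfig struct {
	ReadOnlyPort int32 `json:"readOnlyPort,omitempty"`
}

// pointerConfig is a hypothetical variant using a pointer, so that an
// explicit 0 survives marshaling while nil is still omitted.
type pointerConfig struct {
	ReadOnlyPort *int32 `json:"readOnlyPort,omitempty"`
}

// encode returns the JSON encoding of v as a string.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	zero := int32(0)

	// The explicit 0 is dropped, so a consumer would apply its default.
	fmt.Println(encode(valueConfig{ReadOnlyPort: 0})) // prints {}

	// A non-nil pointer to 0 is kept in the output.
	fmt.Println(encode(pointerConfig{ReadOnlyPort: &zero})) // prints {"readOnlyPort":0}

	// A nil pointer is still omitted.
	fmt.Println(encode(pointerConfig{})) // prints {}
}
```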
2.5 Issue Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.
III. Pull Requests
3.1 Open Pull Requests
This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Opened This Week: 61
Key Open Pull Requests
1. [WIP][FG:InPlacePodVerticalScaling] Remove CPUs based on the mustKeepCpus which get from container by function RunInContainer: This pull request addresses a bug and API change in the Kubernetes project by proposing a solution to issue #131309, which involves using the `RunInContainer` function to obtain the `mustKeepCpus` from a container, ensuring that these CPUs include the promised CPUs as implemented in a previous pull request, while also making several code improvements and fixes related to static CPU management and InPlacePodVerticalScaling.
- URL: pull/131331
- Merged: No
2. Run all permutations of input events in ResourceSlice tracker tests: This pull request enhances the ResourceSlice tracker tests by implementing all possible permutations of input events, increasing the number of test cases from 16 to 185, to ensure consistent output across different event sequences, inspired by the device taint eviction controller tests.
- URL: pull/131279
- Merged: No
3. DRA: work around fake.ClientSet informer deficiency in unit test: This pull request addresses a race condition issue in the `fake.ClientSet` related to informers in Kubernetes unit tests by implementing a workaround that ensures all watches are in place before proceeding, thereby fixing a bug and reducing test flakiness.
- URL: pull/131344
- Merged: No
Other Open Pull Requests
- ResourceSlice Mixins Implementation: This topic involves the introduction of a new feature and API change to the Kubernetes project by implementing ResourceSlice mixins. The pull request includes several commits that add a feature gate, define types, run updates, and implement validation for these mixins.
- MultiCIDRServiceAllocator and DisableAllocatorDualWrite: This topic covers the locking of the MultiCIDRServiceAllocator to its default setting and the promotion of the DisableAllocatorDualWrite feature to General Availability (GA). The pull request disables dual writes, serves the v1 version from storage by default, and tests documented admission policies for serviceCIDRs.
- Etcd Logging Improvements: This topic focuses on improving the logging mechanism of etcd by intercepting, parsing, and reformatting its output to enhance readability and integration with test outputs. The pull request also addresses issues such as port conflicts by using Unix Domain sockets and reduces shutdown delays by fixing the 5-second wait during the termination process.
- Staging Repositories Documentation Update: This topic involves updating the `README.md` files across various Kubernetes staging repositories to clarify their purpose. The pull request emphasizes that all contributions, including issues and pull requests, should be directed to the main Kubernetes repository, as part of an initiative to enforce best documentation practices.
- Code Refactoring and Cleanup: This topic includes several pull requests aimed at refactoring and cleaning up the codebase. These changes enhance readability, maintainability, and consistency without altering functionality, such as refactoring the `oomScoreAdjust` formula and unifying string concatenation patterns.
- Bug Fixes: This topic covers multiple pull requests addressing various bugs in the Kubernetes project. These fixes include parsing log verbosity flags, preventing multiple `RemoveContainer` calls, and ensuring correct file precedence in `kubectl` configuration modifications.
- Test Stabilization and Flakiness Reduction: This topic involves efforts to stabilize tests and reduce flakiness in the Kubernetes project. Pull requests include adding logic to wait for memory-pressure taint clearance and addressing flaky behavior in integration tests.
- Feature Promotion and API Updates: This topic includes the promotion of features and updates to APIs within Kubernetes. Pull requests cover the promotion of the ExternalJWTSigner feature to beta and the introduction of the v1 version of the externaljwt API proto.
- Build Process and Version Management: This topic focuses on streamlining the build process and managing versioning in Kubernetes. The pull request reduces redundancy by reading the Go version from a single file instead of duplicating it across multiple locations.
- Miscellaneous Bug Fixes and Improvements: This topic includes various bug fixes and improvements across the Kubernetes project. Pull requests address issues such as race conditions, typo corrections, and resource claim errors.
3.2 Closed Pull Requests
This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.
Pull Requests Closed This Week: 25
Key Closed Pull Requests
1. Fix spelling and consistency issues in README.md: This pull request addresses spelling and consistency issues in the README.md file of the Kubernetes project, involving multiple updates and merges to ensure clarity and uniformity in the documentation.
- URL: pull/131302
- Merged: No
- Associated Commits: 734d1, fecf8, 9cdb7, 07c99, b84c1, b67a5, 9f985, 6e3bd, 5952b, a8a18, d90a2, 18454, e3681, d4496, 407ba, d087d, 2d946, d5981, 66e0b
2. fix flaky garbage collector tests: This pull request addresses flaky garbage collector tests in the Kubernetes project by ensuring end-to-end tests do not assume that 100% of allocatable pods are consumable, marking these tests as Serial to prevent concurrent execution of multiple tests that heavily utilize pods, and fixing the related issue #124369.
- URL: pull/131211
- Merged: 2025-04-14T15:19:06Z
3. Documnetation: This pull request involves updates to the documentation files, specifically `CONTRIBUTING.md` and `SUPPORT.md`, in the Kubernetes project, as indicated by the commit messages and the nature of the changes.
- URL: pull/131305
- Merged: No
Other Closed Pull Requests
- Namespace Default Variable Update: Several pull requests aimed to replace the hardcoded 'default' namespace string with a variable 'NamespaceDefault' in the Kubernetes codebase. These changes were intended to enhance code consistency and maintainability, although some were not merged.
- Race Condition Fix in Kube-apiserver: Multiple pull requests addressed a race condition in the Kubernetes kube-apiserver that could cause it to emit further watch events even if decryption failed for an earlier event. These fixes were applied to different release branches to ensure proper error handling.
- ServiceCIDR and Dual-Stack Configuration: A pull request modified the validation logic for ServiceCIDR updates to allow conversion from single-stack to dual-stack configurations. This change ensures existing Service IP allocations remain unaffected while disallowing riskier operations.
- Security Vulnerability Fix: A pull request updated the `golang.org/x/net` package to version `v0.38.0` to address security vulnerabilities CVE-2025-22870 and CVE-2025-22872. This update was part of a cleanup effort in the Kubernetes project.
- Bug Fixes in Kubernetes Components: Several pull requests addressed various bugs in Kubernetes components, including issues with CSI volume unmounting, pod-level resource validation, and the garbage collector's race condition. These fixes aimed to improve stability and prevent unexpected behavior.
- Test and Debugging Enhancements: Some pull requests were created for testing and debugging purposes, such as evaluating the Horizontal Pod Autoscaler behavior and debugging the 'pull-kubernetes-e2e-gce' test. These were not intended to be merged.
- Cleanup and Code Consistency: Various pull requests focused on cleanup tasks and enhancing code consistency, such as fixing typos, updating comments, and ensuring uniform application of changes across the codebase. These efforts contribute to better code maintainability.
- Feature Enhancements and Updates: A pull request introduced a new `--subresources` flag to the `kubectl set resources` command, enabling resource updates for running pods using the `InPlacePodVerticalScaling` feature gate. This enhancement aligns with Kubernetes Enhancement Proposals.
- WatchList Feature Adjustment: A pull request disabled the beta WatchList feature by default in Kubernetes version 1.33, favoring other features like StreamingCollectionEncodingToJSON. This change prevents reliance on a potentially unpromoted feature.
- Bug Fix in `kubectl exec` Command: A pull request addressed a bug in the `kubectl exec` command by implementing a check for extra arguments between the resource name and the dash separator. This fix prevents unexpected behavior and raises an error if such arguments are detected.
- Control Plane Certificate Renewal: A pull request stopped recommending users to restart the control plane upon certificate renewal, as all components already reload certificates as needed. This change prevents unnecessary and potentially disruptive actions.
- Flaky Test Issue Resolution: A pull request addressed a flaky test issue by ensuring consistent metrics collection from the kubelet /metrics/resource endpoint. This fix resolves issue #131229 in the Kubernetes project.
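The check described in the `kubectl exec` item above (rejecting extra arguments between the resource name and the dash separator) can be sketched as a small standalone function. `checkExecArgs` is a hypothetical helper written for this report and is not the code from the pull request. It assumes the input is the list of positional arguments with flags such as `-it` already stripped, and it only inspects the arguments that come before `--`.

```go
package main

import (
	"errors"
	"fmt"
)

// checkExecArgs is a hypothetical validator. positional holds the
// positional arguments of an exec-style command (flags already removed),
// including the "--" separator if the user supplied one.
func checkExecArgs(positional []string) error {
	// Locate the first "--" separator, if any.
	dash := -1
	for i, arg := range positional {
		if arg == "--" {
			dash = i
			break
		}
	}

	switch {
	case dash == -1:
		// No separator given; this sketch does not validate that form.
		return nil
	case dash == 0:
		return errors.New(`a resource name is required before "--"`)
	case dash > 1:
		// Anything between the resource name and "--" is unexpected.
		return fmt.Errorf(`unexpected arguments %q between resource and "--"`, positional[1:dash])
	}
	return nil
}

func main() {
	ok := []string{"resource", "--", "bash"}
	bad := []string{"resource", "asd", "asd", "asd", "--", "bash"}

	fmt.Println(checkExecArgs(ok) == nil)  // prints true
	fmt.Println(checkExecArgs(bad) != nil) // prints true
}
```

Returning an error instead of silently dropping the stray arguments makes a mistyped command fail loudly, which matches the intent described in the item.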
3.3 Pull Request Discussion Insights
This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.
Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.
IV. Contributors
4.1 Contributors
Active Contributors:
We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.
If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.
Contributor | Commits | Pull Requests | Issues | Comments |
---|---|---|---|---|
BenTheElder | 13 | 6 | 4 | 105 |
pohly | 17 | 7 | 5 | 51 |
liggitt | 17 | 6 | 0 | 50 |
aojea | 9 | 6 | 8 | 49 |
dims | 5 | 0 | 7 | 30 |
danwinship | 12 | 1 | 0 | 21 |
tabbysable | 0 | 0 | 5 | 29 |
carlory | 9 | 8 | 0 | 16 |
HirazawaUi | 12 | 4 | 1 | 15 |
sanposhiho | 1 | 1 | 1 | 29 |