Weekly Project News

Weekly GitHub Report for Kubernetes: September 29, 2025 - October 06, 2025

Weekly GitHub Report for Kubernetes

Thank you for subscribing to our weekly newsletter! Each week, we deliver a comprehensive summary of your GitHub project's latest activity right to your inbox, including an overview of your project's issues, pull requests, contributors, and commit activity.


Table of Contents

  • I. News
    • 1.1. Recent Version Releases
    • 1.2. Version Information
  • II. Issues
    • 2.1. Top 5 Active Issues
    • 2.2. Top 5 Stale Issues
    • 2.3. Open Issues
    • 2.4. Closed Issues
    • 2.5. Issue Discussion Insights
  • III. Pull Requests
    • 3.1. Open Pull Requests
    • 3.2. Closed Pull Requests
    • 3.3. Pull Request Discussion Insights
  • IV. Contributors
    • 4.1. Contributors

I. News

1.1 Recent Version Releases:

The current version of this repository is v1.32.3.

1.2 Version Information:

The Kubernetes 1.32 release, announced on March 11, 2025, introduces several key updates and improvements detailed in the official CHANGELOG, with additional binary downloads available. This version continues to enhance the functionality and stability of the platform.

II. Issues

2.1 Top 5 Active Issues:

We consider active issues to be issues that have been commented on most frequently within the last week. Bot comments are omitted.

  1. [Flaking Test] [sig-scalability] Collecting pod startup latency [00] - PodStartupLatency: This issue reports flakiness in the PodStartupLatency test within the master-informing job, specifically highlighting that pod startup latency at the 99th percentile is exceeding the expected threshold. The discussion centers around identifying causes for this increased latency, with attention to recent changes in the kOps project, including updates to the AWS VPC CNI and an etcd version bump, which may have contributed to the observed performance degradation.

    • The comments analyze the flakiness by comparing latency patterns across different environments, noting that the issue appears specific to EC2. Contributors link the latency increase to recent kOps changes, particularly the etcd upgrade, and discuss performance metrics and potential impacts, while also proposing further testing with a newer etcd version to mitigate the problem.
    • Number of comments this week: 10
  2. API Server returns panic message to caller when Creating CronJob with Some Invalid Schedule Format: This issue reports that the Kubernetes API server panics with a runtime error when a CronJob is created using an invalid schedule format, specifically due to a slice bounds out of range error in the robfig/cron parser. The expected behavior is for the API server to return a proper validation error instead of panicking, preventing the panic message from being returned to the client.

    • The comments include assignment and triage actions, clarification that the API server recovers from the panic without exiting, discussion about relying on an upstream fix in the robfig/cron library versus handling the error internally in Kubernetes, and coordination among contributors regarding issue ownership. An illustrative sketch of such an in-tree guard appears after this list.
    • Number of comments this week: 9
  3. apiserver: stats.go Error getting keys. Too large resource version: This issue describes frequent "Error getting keys" timeout errors related to "Too large resource version" observed in a clean Kubernetes 1.34.1 cluster installation, indicating that the apiserver's watch cache is lagging behind the current resource version. The user reports that disabling the SizeBasedListCostEstimate feature gate suppresses these errors, and extensive troubleshooting shows consistent read timeouts and cache consistency check errors despite healthy etcd metrics and no obvious etcd issues.

    • The comments discuss that the error stems from watch cache timeouts on consistent reads rather than a direct bug in the feature gate, with requests for etcd version and metrics to diagnose latency and read consistency. The user confirms etcd 3.5.6 is used and that the issue persists even with minimal cluster components. Metrics comparisons between Kubernetes versions 1.33 and 1.34 reveal increased watch cache read wait times in 1.34. Attempts to gather relevant metrics show some are missing, and the maintainers conclude the cluster has broken consistent reads and watch cache issues but lack sufficient debug data to identify a bug, thus reclassifying the issue as a support case rather than a bug.
    • Number of comments this week: 9
  4. ci-kubernetes-e2e-gci-gce-alpha-enabled-default cluster failing to come up: This issue describes a failure in the Kubernetes continuous integration job where the default cluster setup with alpha features enabled is unable to come up successfully. The root cause appears to be related to containerd failing to create containers due to a missing AppArmor parser executable in the system PATH, leading to elevated failure rates since late September.

    • The comments discuss potential solutions such as disabling AppArmor or ensuring the parser utility is installed, note that the utility should be present on the Container-Optimized OS (COS), and consider that intermittent failures might be due to PATH or installation issues rather than the OS itself; suggestions include bumping containerd versions and further investigation with COS maintainers.
    • Number of comments this week: 6
  5. "Services should implement service.kubernetes.io/headless" is broken: This issue addresses a broken end-to-end test for the headless service optimization in Kubernetes, where the test incorrectly assumes that the EndpointSlice controller copies the service.kubernetes.io/headless label from the Service to the EndpointSlice. The problem arises because the EndpointSlice controller enforces the correct label value rather than copying it, causing the test to pass only due to kube-proxy’s overly broad filtering of objects with the headless label, which does not reflect the intended behavior of service proxies that should only ignore EndpointSlices with that label.

    • The discussion clarifies that the test’s failure stems from a misunderstanding of how the EndpointSlice controller manages the headless label compared to the Endpoints controller, with suggestions to update the test to manually create EndpointSlices with the correct label and to revise documentation to better reflect current behavior; participants also note a kube-proxy quirk that masks the test failure and agree that the controller behavior is correct but the test and docs need fixing.
    • Number of comments this week: 5
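
As context for the CronJob panic discussed in item 2, here is a minimal, purely illustrative Go sketch of the kind of in-tree guard the commenters weighed against an upstream fix. The helper name and placement are hypothetical and do not reflect the actual kube-apiserver code; only the robfig/cron/v3 ParseStandard call is the real library API.

```go
package main

import (
	"fmt"

	cron "github.com/robfig/cron/v3"
)

// validateCronSchedule is a hypothetical guard: it parses a CronJob schedule
// and converts both parse errors and parser panics (such as the slice-bounds
// panic reported in the issue) into ordinary errors for the caller.
func validateCronSchedule(schedule string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("invalid schedule %q: %v", schedule, r)
		}
	}()
	if _, parseErr := cron.ParseStandard(schedule); parseErr != nil {
		return fmt.Errorf("invalid schedule %q: %v", schedule, parseErr)
	}
	return nil
}

func main() {
	fmt.Println(validateCronSchedule("*/5 * * * *")) // <nil>
	fmt.Println(validateCronSchedule("not a schedule"))
}
```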

2.2 Top 5 Stale Issues:

We consider stale issues to be issues that have had no activity within the last 30 days. The team should work together to get these issues resolved and closed as soon as possible.

  1. Zone-aware down scaling behavior: This issue describes a problem with the horizontal pod autoscaler's (HPA) scale-in behavior in a Kubernetes deployment that uses topology spread constraints to evenly distribute pods across multiple zones. Specifically, during scale-in events, the pods become unevenly distributed with one zone having significantly fewer pods than expected, causing high CPU usage on the remaining pod in that zone and violating the intended maxSkew: 1 constraint for pod spreading.
  2. apimachinery's unstructured converter panics if the destination struct contains private fields: This issue describes a panic occurring in the apimachinery's DefaultUnstructuredConverter when it attempts to convert an unstructured object into a destination struct that contains private (non-exported) fields. The reporter expects the converter to safely ignore these private fields instead of panicking, as the current behavior causes failures especially with protobuf-generated gRPC structs that include private fields for internal state.
  3. Integration tests for kubelet image credential provider: This issue discusses the potential addition of integration tests specifically for the kubelet image credential provider, similar to the existing tests for client-go credential plugins. It suggests that since there are already integration tests for pod certificates, implementing similar tests for the kubelet credential plugins would be a logical and beneficial next step.
  4. conversion-gen generates code that leads to panics when fields are accessed after conversion: This issue describes a bug in the conversion-gen tool where it generates incorrect conversion code for structs that have changed field types between API versions, specifically causing unsafe pointer conversions instead of proper recursive conversion calls. As a result, accessing certain fields like ExclusiveMaximum after conversion leads to runtime panics, highlighting the need for conversion-gen to produce safe and correct conversion functions.
  5. Failure cluster [ff7a6495...] TestProgressNotify fails when etcd in k/k upgraded to 3.6.2: This issue describes a failure in the TestProgressNotify test that occurs when the etcd component in the Kubernetes project is upgraded to version 3.6.2. The test times out after 30 seconds waiting on a result channel, with multiple errors indicating that the embedded etcd server fails to set up serving due to closed network connections and server shutdowns.

2.3 Open Issues

This section lists, groups, and then summarizes issues that were created within the last week in the repository.

Issues Opened This Week: 18

Summarized Issues:

  • Goroutine Leaks: Multiple issues describe goroutine leaks in Kubernetes components where goroutines block indefinitely on channel operations or select statements waiting for events that never arrive. These leaks cause resource leakage and hanging processes, impacting system stability and performance; a minimal illustration of the pattern follows this list.
  • issues/134322, issues/134323
  • Log Rotation Failures: An issue highlights that the new timestamp-based container log rotation mechanism in Kubernetes v1.27+ fails to rotate logs for high-throughput containers, causing log files to grow indefinitely and rotation tasks to hang despite configured size limits. This results in potential disk space exhaustion and operational issues.
  • issues/134324
  • Leader Election Testing Improvements: A proposal suggests modifying the LeaderElectionConfig struct to include a clock parameter, enabling deterministic testing of leader election by injecting a custom clock implementation. This change aims to improve test reliability and predictability.
  • issues/134331
  • Pod Startup and Load Balancing Issues: Issues report flakiness in pod startup latency tests and request support for rules that define when controller replicas become available to serve traffic. These address load imbalance caused by sequential pod startup and aim to improve readinessGate mechanisms and scalability test reliability.
  • issues/134332, issues/134336
  • Debugging Enhancements: A request is made to add signal handling (e.g., SIGUSR1) in Kubernetes applications to dump call stacks for debugging when controllers appear stuck without logs or pprof enabled. This would facilitate troubleshooting in production environments; a small illustrative sketch appears after this list.
  • issues/134337
  • Job Termination Delays: An issue describes that Kubernetes Jobs with a preStop sleep lifecycle hook cause Pods to remain in a "Terminating" state and Jobs to stay "Running" until the full sleep duration completes, delaying Job completion contrary to expected behavior with exec sleep commands.
  • issues/134338
  • Watch Cache and API Server Errors: Frequent timeout errors occur in the Kubernetes apiserver watch cache due to resource versions lagging behind current versions, causing issues with consistent reads and watch cache performance despite normal etcd operation.
  • issues/134343
  • Container Runtime Failures: A cluster startup failure is caused by containerd being unable to create containers because the apparmor_parser executable is missing from the PATH, blocking core pod creation due to AppArmor profile loading errors.
  • issues/134344
  • Automated Fuzz Testing Requests: There is a request to add automated fuzz testing for conversion roundtripping of dynamically generated API code in dynamic-resource-allocation to verify correctness and prevent regressions.
  • issues/134356
  • Metrics Timestamp Accuracy: The collectNodeSwapMetrics function incorrectly uses memory statistics timestamps instead of swap statistics timestamps, leading to inaccurate node swap usage metrics that should be corrected to align with pod and container swap metrics.
  • issues/134359
  • Scalability Testing Enhancements: A proposal suggests introducing Resource Size as a formal scalability dimension to address outdated assumptions of small pod sizes, aiming to improve cluster stability and user experience by incorporating realistic pod sizes and setting official scalability goals.
  • issues/134375
  • Autoscaling Test Failures: A failing test named TestMultipleHPAs in the autoscaling component receives unexpected Horizontal Pod Autoscaler names, causing test failures since September 2025 due to name mismatches.
  • issues/134386
  • Pod Affinity Scheduling Issues: The PodAffinity QHint mechanism fails to consider existing pods' anti-affinity rules affecting pending pods, causing the inter-pod affinity plugin to miss pod or delete events that could make pending pods schedulable.
  • issues/134393
  • Huge Pages Scheduling Problems: The static memory manager does not verify available huge pages during pod admission, leading to pod scheduling on NUMA nodes with insufficient huge page memory and causing JVM startup failures due to memory allocation errors.
  • issues/134395
  • Test Metadata and Failure Reporting: The capz-windows-master job does not report the tested Kubernetes version, complicating identification of regression causes, and is failing an end-to-end test related to the API machinery SIG with no triage or identified reason.
  • issues/134405, issues/134416
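
The goroutine-leak summaries above describe goroutines blocked forever on channel sends or selects. The following self-contained Go sketch (not code from the affected components) illustrates the general pattern and the usual fix of honoring context cancellation:

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"time"
)

// leaky starts a goroutine that blocks forever if nobody reads from ch,
// so the goroutine can never exit.
func leaky(ch chan int) {
	go func() {
		ch <- 42 // no receiver and no cancellation path: goroutine leaks
	}()
}

// fixed also selects on ctx.Done(), so cancelling the context lets the
// goroutine return instead of blocking indefinitely.
func fixed(ctx context.Context, ch chan int) {
	go func() {
		select {
		case ch <- 42:
		case <-ctx.Done():
		}
	}()
}

func main() {
	leaky(make(chan int)) // never received from, so this goroutine is stuck

	ctx, cancel := context.WithCancel(context.Background())
	fixed(ctx, make(chan int))
	cancel() // the second goroutine observes ctx.Done() and exits

	time.Sleep(100 * time.Millisecond)
	fmt.Println("goroutines still running:", runtime.NumGoroutine())
}
```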
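
The SIGUSR1 debugging request can likewise be illustrated with a small, hedged sketch; the function name is hypothetical and this is not the proposed Kubernetes implementation, just a standard-library pattern for dumping goroutine stacks on demand:

```go
package main

import (
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"
)

// dumpStacksOnSignal writes every goroutine's stack to stderr whenever the
// process receives SIGUSR1, which helps diagnose a controller that appears
// stuck without pprof or verbose logging enabled.
func dumpStacksOnSignal() {
	c := make(chan os.Signal, 1)
	signal.Notify(c, syscall.SIGUSR1)
	go func() {
		for range c {
			pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
		}
	}()
}

func main() {
	dumpStacksOnSignal()
	select {} // simulate a long-running controller process
}
```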

2.4 Closed Issues

This section lists, groups, and then summarizes issues that were closed within the last week in the repository. This section also links the associated pull requests if applicable.

Issues Closed This Week: 6

Summarized Issues:

  • Boolean option support in kubectl aliases: This issue highlights that boolean options like --current work correctly in default kubectl commands but cause errors when used in aliases configured with appendArgs. The lack of support for boolean options in aliases limits user flexibility and causes unexpected failures.
  • issues/134351
  • CI job failures due to infrastructure timeouts: Multiple Kubernetes CI jobs, including the Windows CAPZ job, are failing due to timeouts during VM provisioning on Azure, which causes node bootstrapping to fail or time out. These infrastructure-related issues disrupt testing and delay development progress.
  • issues/134354, issues/134370
  • PersistentVolumeClaim validation changes in StatefulSets: Kubernetes 1.34 enforces controller-managed creation of PVCs in StatefulSets, preventing users from pre-creating or modifying PVCs, which breaks previous workflows relying on invalid templates. This stricter validation requires either denying pre-created PVCs outright or introducing more flexible template specifications, possibly controlled by a feature flag.
  • issues/134357
  • Data race conditions in Kubernetes components: Data races have been detected in both the garbage collector controller and the API server metrics initialization, where concurrent access to internal data structures occurs without proper synchronization. These race conditions can lead to instability, incorrect behavior, and test failures in Kubernetes.
  • issues/134371, issues/134372

2.5 Issue Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed issues that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed issues from the past week.


III. Pull Requests

3.1 Open Pull Requests

This section provides a summary of pull requests that were opened in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Opened This Week: 47

Key Open Pull Requests

1. add +k8s:immutable tag to ResourceClaim.spec and associated tests: This pull request migrates the immutable validation logic on ResourceClaim.spec from a hand-written approach to a declarative one using the +k8s:immutable tag, adds the implementation and associated tests for this tag, introduces a short-circuiting cohort to improve validation flow, and updates strategy.go to remove redundant validation calls during updates. A rough illustration of how such declarative tags are attached to API fields appears after this list of key pull requests.

  • URL: pull/134367
  • Merged: No
  • Associated Commits: 8b794, bf67e, eb8b3, e48a0, b839a

2. feat(validation-gen): add path normalization options & migration k8s:maxItem on ResourceClaimSpec fields: This pull request adds declarative validation by introducing +k8s:maxItems tags to limit the size of various list fields in ResourceClaimSpec, enhances the validation test framework to support normalization rules, and includes normalization rules specifically for ResourceClaim validation to ensure accurate enforcement of these new constraints.

  • URL: pull/134408
  • Merged: No
  • Associated Commits: 73660, 7bbc7, ae8ea, 355fb, 94668

3. feat: Add k8s-extended-resource-name format and validator for DeviceClass: This pull request introduces a new format called k8s-extended-resource-name along with a corresponding format validator for the DeviceClass.ExtendedResourceName field, aiming to enhance declarative validation in Kubernetes by replacing manual validation with a more robust, maintainable, and consistent approach as part of the implementation of KEP-5073.

  • URL: pull/134358
  • Merged: No
  • Associated Commits: 72208, d84de, de04d, 10012
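
All three key pull requests above revolve around declarative validation tags consumed by validation-gen. The following rough illustration shows where such comment tags sit on API fields; the type and field names are simplified stand-ins rather than the real resource.k8s.io definitions, and the exact tag value syntax may differ from what the pull requests implement.

```go
// Illustrative only: simplified types showing where validation-gen style
// comment tags are placed. These are not the actual Kubernetes API types.
package api

type ResourceClaim struct {
	// The spec may not be changed after creation.
	// +k8s:immutable
	Spec ResourceClaimSpec `json:"spec"`
}

type ResourceClaimSpec struct {
	// A bounded list; the generator emits a max-items check from the tag.
	// +k8s:maxItems=32
	Requests []string `json:"requests,omitempty"`
}
```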

Other Open Pull Requests

  • Persistent Volume Node Affinity Mutability: This pull request introduces the feature that allows the node affinity of Persistent Volumes (PVs) to be mutable, enabling online migration of PVs to different topologies. It also ensures the kubelet rejects pods with mismatched PV node affinity to prevent scheduling race conditions.
    pull/134339
  • Feature Gate Enhancements: Multiple pull requests improve feature gate management by promoting EnvFiles to Beta with restricted syntax, introducing a new function to set multiple feature gates simultaneously during tests to handle dependencies, and fixing feature gate dependencies in integration tests. These changes enhance feature gate usability and reliability in testing and runtime environments.
    pull/134414, pull/134366, pull/134365
  • Pod and Container Lifecycle Improvements: Several pull requests focus on pod and container lifecycle management, including a draft implementation of the RestartPod feature that restarts containers by removing the sandbox, and marking MirrorPod update tests as NodeConformance to validate static pod to mirror pod update paths. These efforts improve pod restart capabilities and test coverage for pod updates.
    pull/134345, pull/134369
  • Kubectl User Impersonation Enhancements: This pull request adds a persistent flag --as-user-extra in kubectl to enable passing extra user arguments during impersonation, addressing missing API and kubeconfig support. This enhancement improves user impersonation capabilities in kubectl commands.
    pull/134378
  • Resource Quota Warnings: A pull request introduces warnings in the resource quota feature to notify users when their resource requests exceed defined limits, effectively taking over an abandoned fix and addressing a specific issue. This improves user awareness and resource management in Kubernetes.
    pull/134389
  • Documentation and Changelog Updates: Multiple pull requests update documentation and changelogs, including adding a new initial table of contents with a Security section and reorganizing introductory sentences in README.md, updating the CRI API changelog for version 1.34, and improving documentation by explaining edge cases in PodTopologySpread filtering tests. These changes enhance clarity and user experience.
    pull/134410, pull/134411, pull/134373
  • Node Resources and Device Management: This pull request adds a global cache for event handler functions of the noderesources plugin to efficiently map device classes to DRA extended resources, addressing a tracked issue. This optimization improves resource handling in node resource management.
    pull/134326
  • EvictionRequest API Introduction: A pull request introduces the EvictionRequest API types for coordination.k8s.io/v1alpha2, adding foundational resources and coordination features behind an alpha feature gate without implementing controller or admission logic. This lays groundwork for advanced eviction coordination.
    pull/134328
  • Windows Volume Operations Migration: This pull request proposes migrating Windows-related volume operations from PowerShell to native WMI to improve performance, aligning with a feature delivered in csi-proxy v1.3.0 and enabling more CSI drivers to benefit. This change enhances Windows volume operation efficiency.
    pull/134329
  • ResourceClaimStatus Uniqueness and Validation: This pull request adds listType=map and listMapKey=uid to the ResourceClaimStatus.ReservedFor field to ensure uniqueness based on uid, updates validation logic to declarative validation, and includes new tests. These changes improve data integrity and validation for resource claims.
    pull/134333
  • Code Cleanup and Typo Fixes: Several pull requests fix typos in variable names and documentation, including correcting 'watchActcion' in unit tests, a misspelling in csi_fsgroup_policy.go documentation, and a typo in an error message related to a feature gate name. These fixes improve code quality and clarity.
    pull/134334, pull/134361, pull/134380
  • Test Image and Pod Resize Updates: Pull requests update base images for test images to the latest supported versions and update pod resize tests to accommodate the new cpu.weight conversion, ensuring compatibility and security without introducing dependency updates.
    pull/134341, pull/134342
  • Kubelet Log Rotation Bug Fix: This pull request addresses a critical bug in Kubernetes v1.27+ by improving the kubelet's container log rotation mechanism to detect and prioritize high-throughput containers, optimizing rate limiting and retry logic to prevent unbounded log file growth. This ensures timely log rotation without configuration changes.
    pull/134346
  • Scheduler Performance Metrics: A pull request adds a scheduling duration collector to the scheduler_perf component to enable analysis of the entire scheduling phase duration, providing a potentially more useful metric than scheduling throughput. This enhances scheduler performance monitoring.
    pull/134350
  • Utility Function Deprecation and Refactoring: This pull request deprecates obsolete slice utility functions and updates their usage to rely on standard library functions, including a new helper for slices.DeleteFunc, reducing project-specific code maintenance; a brief standard-library example appears after this list.
    pull/134353
  • Client-Side Patch Segmentation Fault Fix: This pull request fixes a segmentation fault caused by handling nested maps in client-side patches by ensuring PatchMetaFromOpenAPI returns an empty struct instead of nil for unknown nested fields, preventing runtime panics when updating deeply nested JSON objects.
    pull/134381
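
For the slice-utility deprecation noted above, the Go standard library's slices.DeleteFunc (available since Go 1.21) covers the common filter-in-place case; a minimal, self-contained example with made-up data:

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

func main() {
	nodes := []string{"node-a", "node-b", "node-c-unhealthy"}

	// DeleteFunc removes, in place, every element for which the predicate
	// returns true and returns the shortened slice.
	nodes = slices.DeleteFunc(nodes, func(n string) bool {
		return strings.HasSuffix(n, "-unhealthy")
	})

	fmt.Println(nodes) // [node-a node-b]
}
```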

3.2 Closed Pull Requests

This section provides a summary of pull requests that were closed in the repository over the past week. The top three pull requests with the highest number of commits are highlighted as 'key' pull requests. Other pull requests are grouped based on similar characteristics for easier analysis. Up to 25 pull requests are displayed in this section, while any remaining pull requests beyond this limit are omitted for brevity.

Pull Requests Closed This Week: 26

Key Closed Pull Requests

1. feat(validation-gen): Add "cohorts" & Tighten and simplify test framework: This pull request refactors and tightens the validation-gen test framework by introducing the concept of "cohorts" for grouped validations, making the ErrorMatcher stricter to simplify ratcheting tests, removing unused functions like ExpectInvalid() and ExpectRegexpsByPath(), and fixing field path resolution for embedded fields in root types.

  • URL: pull/134347
  • Merged: Yes
  • Associated Commits: 054ab, a51fb, b922f, 8b08c, 199c9, 89b97, 04d63, 7b938, 5d067, 8105d, efe4d, 975df, b00e0, 7cf99, 229c6, 2d48d, 0a26f, 6b11e, 51f02, 2fd76, 46c15

2. Add resource version comparison function in client-go along with conformance: This pull request introduces a helper function in client-go to compare resourceVersion strings between objects of the same resource and includes extensive conformance tests across numerous Kubernetes resource types to ensure consistent and correct resource version comparability in a conformant cluster.

  • URL: pull/134330
  • Merged: Yes
  • Associated Commits: 2cef5, 37fcf, 7c24e, 84f85, 9757d

3. DRA: ResourceSlice tracker cleanup: This pull request cleans up the ResourceSlice tracker in preparation for a device taint update in Kubernetes version 1.35 by improving test clarity, enhancing log output for better debugging, and reorganizing documentation to facilitate easier understanding of test cases.

  • URL: pull/134340
  • Merged: Yes
  • Associated Commits: 02a51, c36c9

Other Closed Pull Requests

  • Code clarity and documentation fixes: Multiple pull requests improve code clarity by fixing incorrect documentation comments and simplifying method signatures. These changes include correcting comments for ComputeZoneState and removing unused parameters to enhance maintainability.
    [pull/134314, pull/134318]
  • Kubeadm node flags rework and bug fixes: Several pull requests rework the node flags in kubeadm's FetchInitConfigurationFromCluster function to fix bugs related to node registration and node name handling during upgrades. These changes replace problematic flags with more precise ones and ensure accurate node information is fetched, including automated cherry picks for release branches.
    [pull/134319, pull/134362, pull/134363, pull/134364]
  • Codebase cleanup and deprecated feature removal: Some pull requests focus on cleaning up the codebase by removing deprecated packages and obsolete functions. This includes removing grpc.WithBlock() and the unused cpuSharesToCPUWeight function to streamline the code.
    [pull/134325, pull/134409]
  • Test improvements and flake fixes: Pull requests address test reliability by modifying tests to avoid platform-specific issues and fixing flakes. Changes include selecting only well-known secrets in tests and using dedicated channels to prevent test flakiness.
    [pull/134327, pull/134387]
  • API specification and validation updates: A pull request updates the Kubernetes API by marking the ResourceClaim.Effect field as required in the Open API specification, correcting its validation status.
    [pull/134335]
  • Scheduler workload and feature gate adjustments: Automated cherry picks disable overly short scheduler_perf workloads to prevent integration test timeouts, and the SchedulerAsyncAPICalls feature gate is disabled to mitigate a regression affecting scheduler performance under load.
    [pull/134348, pull/134349, pull/134400]
  • Metrics and performance improvements: Multiple pull requests move metrics calculations and statistics update calls into the getList function to reduce code duplication and enable future performance improvements. Additional fixes include correcting comments and fixing data races in metrics-related code.
    [pull/134352, pull/134379, pull/134390, pull/134398]
  • Bug fixes in metrics and runtime handling: Pull requests fix bugs such as incorrect timestamps in node swap metrics and correct the way runtime.Objects are passed in tests to prevent failures.
    [pull/134355, pull/134360]
  • Error matcher enhancement: A pull request adds support for path normalization in the error matcher, allowing more flexible and accurate matching of error fields by normalizing complex path expressions.
    [pull/134368]
  • COS version upgrade for CI stability: One pull request upgrades the Container-Optimized OS version from COS 109 to COS 121 to fix broken continuous integration caused by the removal of older images.
    [pull/134377]

3.3 Pull Request Discussion Insights

This section will analyze the tone and sentiment of discussions within this project's open and closed pull requests that occurred within the past week. It aims to identify potentially heated exchanges and to maintain a constructive project environment.

Based on our analysis, there are no instances of toxic discussions in the project's open or closed pull requests from the past week.


IV. Contributors

4.1 Contributors

Active Contributors:

We consider an active contributor in this project to be any contributor who has made at least 1 commit, opened at least 1 issue, created at least 1 pull request, or made more than 2 comments in the last month.

If there are more than 10 active contributors, the list is truncated to the top 10 based on contribution metrics for better clarity.

Contributor     Commits  Pull Requests  Issues  Comments
pohly           31       8              9       78
liggitt         14       5              1       46
BenTheElder     4        0              5       48
macsko          18       9              2       26
aaron-prindle   17       9              1       26
yongruilin      34       4              2       12
huww98          23       6              0       16
lalitc375       17       8              0       19
aojea           3        3              1       34
p0lyn0mial      33       4              0       0
