Linux ioc_timer_fn: the iocost timer and hweight updates

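ioc_timer_fn is the periodic timer handler of the iocost controller. It runs at a nominal interval (64 ms by default) and performs several core maintenance tasks for iocost: updating each iocg's hweight (hierarchical weight), adjusting I/O bandwidth quotas, handling expired wait queues, and triggering recalibration of the cost model. The timer is the metronome that drives the iocost state machine.

Timer initialization and scheduling

The iocost timer is registered with timer_setup when the ioc structure is initialized. Each time it expires, the handler chooses the next expiry adaptively, based on system load. A simplified view of the first half of the handler:

```c
static void ioc_timer_fn(struct timer_list *timer)
{
	struct ioc *ioc = from_timer(ioc, timer, timer);
	struct ioc_gq *iocg;
	unsigned long expires;
	u64 now, next, vtime;
	int nr_shortages = 0, nr_lagging = 0;
	LIST_HEAD(hlist);

	/* Paired with the spin_unlock_irq() at the end of the handler. */
	spin_lock_irq(&ioc->lock);

	now = ktime_get_ns();
	vtime = now - ioc->period;

	/*
	 * Check whether the current period has ended. If so, advance the
	 * period start and refresh the global vtime base.
	 */
	if (vtime > ioc->period_us * NSEC_PER_USEC) {
		ioc->period += ioc->period_us * NSEC_PER_USEC;
		vtime = now - ioc->period;
		ioc->vtime_base += ioc->period_us * ioc->vtime_rate / 100;
	}

	/*
	 * Walk all active iocgs: update each hweight and check whether
	 * the iocg is short of quota or lagging.
	 */
	list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
		u64 hweight, usage;
		bool shortage = false;

		/*
		 * Hierarchical hweight update: rescale this iocg's weight
		 * by its position in the cgroup tree, i.e. by the parent's
		 * hweight and the total weight of the parent's children.
		 */
		hweight = iocg->weight;
		if (iocg->level > 0) {
			struct ioc_gq *parent = iocg->parent;

			if (parent && parent->child_weights)
				hweight = parent->hweight * iocg->weight
					/ parent->child_weights;
		}
		iocg->hweight = hweight;

		/*
		 * Compute this iocg's actual usage rate, which feeds the
		 * quota adjustment decisions that follow.
		 */
		usage = iocg->usage_delta * USEC_PER_SEC
			/ (now - iocg->usage_timestamp);
		iocg->usage_timestamp = now;
		iocg->usage_delta = 0;

		/* Usage above the configured limit counts as a shortage. */
		if (usage > iocg->max_usage) {
			shortage = true;
			nr_shortages++;
		}
	}

	/* ... continued in the rescheduling section below ... */
```

hweight: hierarchical weight calculation

hweight (hierarchical weight) is the core concept behind iocost's hierarchical I/O bandwidth distribution. Each iocg's hweight is determined by its parent's hweight and by its own share of the total weight among its siblings, so that bandwidth at every level of the cgroup tree is divided in the intended proportions.

```c
static void ioc_refresh_hweights(struct ioc *ioc)
{
	struct ioc_gq *iocg;

	/*
	 * First pass: sum the weights of each iocg's children. If no
	 * child has a weight set, fall back to the iocg's own weight.
	 */
	list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
		struct ioc_gq *child;
		u64 child_weight = 0;

		list_for_each_entry(child, &iocg->children, sibling_list) {
			child_weight += child->weight;
		}
		iocg->child_weights = child_weight ?: iocg->weight;
	}

	/*
	 * Second pass: compute hweight top-down. This relies on the list
	 * visiting every parent before its children. The root iocg's
	 * hweight is 1.0 in fixed-point representation.
	 */
	list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
		if (iocg->parent) {
			u64 sibling_total = iocg->parent->child_weights;

			if (sibling_total)
				iocg->hweight = iocg->parent->hweight * iocg->weight
					/ sibling_total;
			else
				iocg->hweight = iocg->parent->hweight;
		} else {
			iocg->hweight = HWEIGHT_ONE; /* root node */
		}
	}
}
```

Quota distribution and vtime budgets

The timer handler is also responsible for assigning each iocg an I/O time budget (vtime) based on its hweight. vtime is the virtual time unit that iocost uses to measure I/O resource consumption.

```c
static void ioc_distribute_vtime(struct ioc *ioc, struct list_head *hlist)
{
	struct ioc_gq *iocg, *tiocg;
	u64 vtime, total_hweight = 0;
	u64 vtime_per_cycle;

	vtime_per_cycle = ioc->period_us * ioc->vtime_rate / 100;

	/* Sum the hweights of all active iocgs. */
	list_for_each_entry(iocg, hlist, hweight_list) {
		total_hweight += iocg->hweight;
	}
	if (!total_hweight)
		return;

	/*
	 * Give each iocg a vtime budget proportional to its hweight:
	 * vtime_budget = vtime_per_cycle * iocg->hweight / total_hweight
	 */
	list_for_each_entry(iocg, hlist, hweight_list) {
		u64 budget = vtime_per_cycle * iocg->hweight / total_hweight;

		iocg->vtime_budget = budget;

		/*
		 * If the iocg has already consumed more vtime than its
		 * budget, record the overage as debt and zero the budget
		 * so that its further I/O is throttled.
		 */
		if (iocg->vtime_used > iocg->vtime_budget) {
			iocg->debt = iocg->vtime_used - iocg->vtime_budget;
			iocg->vtime_budget = 0;
		}
	}
}
```

Rescheduling the timer

Once all maintenance work is done, ioc_timer_fn sets the interval until the next expiry according to the current system load and the shortage state of the iocgs.

```c
	/* Pick the next interval from the shortage and lagging counts. */
	if (nr_shortages) {
		/* Shortages exist: shorten the interval to react faster. */
		expires = msecs_to_jiffies(IOC_TIMER_INTERVAL_MS / 2);
	} else if (nr_lagging) {
		expires = msecs_to_jiffies(IOC_TIMER_INTERVAL_MS * 3 / 4);
	} else {
		expires = msecs_to_jiffies(IOC_TIMER_INTERVAL_MS);
	}
	mod_timer(&ioc->timer, jiffies + expires);
out_unlock:
	spin_unlock_irq(&ioc->lock);
}
```

Through the periodic execution of ioc_timer_fn, iocost schedules block I/O resources dynamically: the hweight mechanism keeps bandwidth proportional across cgroups, vtime budgeting prevents any single cgroup from overusing I/O, and the adaptive timer interval gives faster response under heavy load.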
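The hweight and vtime budget arithmetic described above can be tried out in user space. The following is a minimal sketch, not kernel code: the `struct node` layout, the array-with-parent-index tree representation, the value `1 << 16` for `HWEIGHT_ONE`, and the helper names `refresh_hweights` and `vtime_budget` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-point 1.0. The concrete value is an assumption of this sketch. */
#define HWEIGHT_ONE (1u << 16)

/* Hypothetical stand-in for an iocg: one node of a cgroup-like tree. */
struct node {
	int parent;             /* index of the parent node, -1 for the root */
	uint32_t weight;        /* configured weight */
	uint64_t child_weights; /* sum of the weights of this node's children */
	uint64_t hweight;       /* fixed-point share of the whole device */
};

/*
 * Two-pass hweight refresh. The array must list every parent before its
 * children so that the top-down pass sees the parent's hweight first.
 */
static void refresh_hweights(struct node *n, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++)
		n[i].child_weights = 0;

	/* Pass 1: accumulate each parent's total child weight. */
	for (i = 0; i < count; i++) {
		if (n[i].parent >= 0) {
			assert((size_t)n[i].parent < i);
			n[n[i].parent].child_weights += n[i].weight;
		}
	}

	/* Pass 2: hweight = parent hweight * own weight / sibling total. */
	for (i = 0; i < count; i++) {
		if (n[i].parent < 0) {
			n[i].hweight = HWEIGHT_ONE;
			continue;
		}
		const struct node *p = &n[n[i].parent];

		n[i].hweight = p->child_weights
			? p->hweight * n[i].weight / p->child_weights
			: p->hweight;
	}
}

/* Budget for one node: the per-cycle vtime scaled by its hweight share. */
static uint64_t vtime_budget(uint64_t vtime_per_cycle, uint64_t hweight,
			     uint64_t total_hweight)
{
	return total_hweight ? vtime_per_cycle * hweight / total_hweight : 0;
}
```

For a root with children A (weight 100) and B (weight 300), where A itself has two children of weight 50 each, this gives A one quarter and B three quarters of the root's hweight, and each of A's children one eighth. The leaf hweights then sum to `HWEIGHT_ONE`, so `vtime_budget` splits a cycle's vtime among the leaves in a 6:1:1 ratio.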