[Repost] VM kernel parameters: cache reclaim with drop_caches

Note: this analysis is based on kernel version 3.10.0-693.el7, i.e. CentOS 7.4.

1. About drop_caches

When memory runs low, a common habit is to clear the system caches by hand with echo 3 > /proc/sys/vm/drop_caches:

[root@localhost  ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7822        3436        2068          40        2317        3997
Swap:             0           0           0
[root@localhost  ~]# echo 3 > /proc/sys/vm/drop_caches 
[root@localhost ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7822        3433        4036          40         352        4037
Swap:             0           0           0

The kernel documentation explains what the value 3 means:

    To free pagecache: 
    	echo 1 > /proc/sys/vm/drop_caches 
    To free reclaimable slab objects (includes dentries and inodes): 
    	echo 2 > /proc/sys/vm/drop_caches 
    To free slab objects and pagecache: 
    	echo 3 > /proc/sys/vm/drop_caches
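
The same write can be issued from a program. The sketch below is a user-space illustration, not kernel code. write_drop_caches is a hypothetical helper name, it accepts only the documented values 1 to 3, and it takes the target path as a parameter so it can be tried against a scratch file before being pointed at /proc/sys/vm/drop_caches (which requires root). It calls sync() first because the kernel documentation describes drop_caches as a non-destructive operation: dirty objects are not freed until they have been written back.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library): write a drop_caches
 * value to `path`. Returns 0 on success, -1 on any failure.
 */
int write_drop_caches(const char *path, int value)
{
    char buf[8];
    int len, fd;
    ssize_t written;

    /* Assumption of this sketch: only the documented values 1..3. */
    if (value < 1 || value > 3)
        return -1;

    /* Flush dirty data first so that more pages are clean and droppable. */
    sync();

    /* Same flags a shell uses for the `>` redirection. */
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    len = snprintf(buf, sizeof(buf), "%d\n", value);
    written = write(fd, buf, (size_t)len);
    close(fd);

    return written == (ssize_t)len ? 0 : -1;
}
```

Calling write_drop_caches("/proc/sys/vm/drop_caches", 3) as root would correspond to the echo 3 shown above, preceded by a sync.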
    

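2. Freeing the pagecache

As covered previously, when free memory falls below a threshold, dirty-page writeback is triggered: writeback work is submitted to the corresponding BDI device, and the BDI writeback thread writes the dirty pages back so that their memory can be freed. echo 1 on drop_caches ends up on a similar path, with one caveat: drop_caches itself only drops clean pagecache. The kernel documentation describes it as a non-destructive operation that does not free dirty objects, so dirty pages must first be written back (for example by running sync) before they can be dropped.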

    int drop_caches_sysctl_handler(ctl_table *table, int write,
    	void __user *buffer, size_t *length, loff_t *ppos)
    {
    	int ret;

    	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
    	if (ret)
    		return ret;
    	if (write) {
    		static int stfu;
    		// echo 1 > drop_caches
    		if (sysctl_drop_caches & 1) {
    			iterate_supers(drop_pagecache_sb, NULL);
    			count_vm_event(DROP_PAGECACHE);
    		}
    		// echo 2 > drop_caches
    		if (sysctl_drop_caches & 2) {
    			drop_slab();
    			count_vm_event(DROP_SLAB);
    		}
    		if (!stfu) {
    			pr_info("%s (%d): drop_caches: %d\n",
    				current->comm, task_pid_nr(current),
    				sysctl_drop_caches);
    		}
    		// Once bit 4 has been written, stfu stays set and the
    		// log message above is suppressed on later writes.
    		stfu |= sysctl_drop_caches & 4;
    	}
    	return 0;
    }
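
The handler treats the written value as a bitmask rather than as three separate modes. A minimal user-space sketch of that decoding follows; the enum constants, struct toy_drop_request and toy_decode_drop_caches are names invented for this illustration and do not exist in the kernel.

```c
#include <stdbool.h>

/* Bit values as tested by the handler; the names are hypothetical. */
enum {
    TOY_DROP_PAGECACHE = 1, /* handler calls iterate_supers(drop_pagecache_sb, NULL) */
    TOY_DROP_SLAB      = 2, /* handler calls drop_slab() */
    TOY_DROP_QUIET     = 4  /* handler sets stfu, silencing later log lines */
};

struct toy_drop_request {
    bool pagecache;        /* bit 1 set */
    bool slab;             /* bit 2 set */
    bool quiet_afterwards; /* bit 4 set */
};

/* Decode a drop_caches value the same way the handler tests its bits. */
static struct toy_drop_request toy_decode_drop_caches(int value)
{
    struct toy_drop_request r;

    r.pagecache = (value & TOY_DROP_PAGECACHE) != 0;
    r.slab = (value & TOY_DROP_SLAB) != 0;
    r.quiet_afterwards = (value & TOY_DROP_QUIET) != 0;
    return r;
}
```

Seen this way, writing 3 is simply bits 1 and 2 together, which is why it frees both the pagecache and the slab objects.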

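The handler shows that echo 1 calls drop_pagecache_sb to free the pagecache. Tracing further down from there:

    drop_pagecache_sb ->
    	iput ->
    		iput_final ->
    			write_inode_now ->  # submits a writeback_control and writes back immediately
    				writeback_single_inode ->
    					__writeback_single_inode ->
    						do_writepages  # calls the filesystem's writepage routine to write back to disk

In BDI writeback, what gets submitted initially is a wb_writeback_work; by the time the writeback is actually performed it has been converted into a writeback_control, which then drives the writeback.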

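So echo 1 walks every superblock and calls drop_pagecache_sb, which in turn walks all inodes of that superblock and releases the pagecache attached to each one. Clean pages are dropped directly through invalidate_mapping_pages; the writeback chain above is reached only in the less common case where iput drops the last reference to an inode. Unlike BDI writeback, all of this runs immediately, without waiting for the periodic flush or for the inode to expire.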

3. Freeing the slab cache

The echo 2 case is somewhat more involved:

    static void drop_slab(void)
    {
    	int nr_objects;
    	struct shrink_control shrink = {
    		.gfp_mask = GFP_KERNEL,
    	};
    	// Run another pass whenever the previous one freed more than 10 objects.
    	// The loop therefore ends only after a pass that frees 10 or fewer objects.
    	do {
    		nr_objects = shrink_slab(&shrink, 1000, 1000);
    	} while (nr_objects > 10);
    }
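
The termination condition depends on how much the last pass freed, not on how much is left. The toy model below illustrates this in user space; struct toy_cache, toy_shrink_pass and toy_drop_slab are hypothetical names, and the fixed per-pass limit is an assumption of the sketch rather than how shrink_slab budgets its work.

```c
/* Toy stand-in for all slab caches combined (hypothetical). */
struct toy_cache {
    long freeable; /* objects a pass is able to free */
    long in_use;   /* objects still referenced; never freed here */
};

/* One reclaim pass: frees at most max_per_pass objects, returns the count. */
static long toy_shrink_pass(struct toy_cache *c, long max_per_pass)
{
    long n = c->freeable < max_per_pass ? c->freeable : max_per_pass;

    c->freeable -= n;
    return n;
}

/* Same loop shape as drop_slab(): repeat while a pass freed more than 10. */
static int toy_drop_slab(struct toy_cache *c, long max_per_pass)
{
    int passes = 0;
    long freed;

    do {
        freed = toy_shrink_pass(c, max_per_pass);
        passes++;
    } while (freed > 10);

    return passes;
}
```

With 4000 freeable objects and a limit of 1024 per pass, the loop runs five times: four passes that free objects and a final pass that frees nothing and ends the loop. If a pass can only free 5 objects, the loop stops after the first pass even though freeable objects remain.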
          

    unsigned long shrink_slab(struct shrink_control *shrink,
    			  unsigned long nr_pages_scanned,
    			  unsigned long lru_pages)
    {
    	struct shrinker *shrinker;
    	unsigned long ret = 0;
    	...
    	// Walk every registered shrinker and reclaim free objects from each cache
    	list_for_each_entry(shrinker, &shrinker_list, list) {
    		unsigned long long delta;
    		long total_scan;
    		long max_pass;
    		int shrink_ret = 0;
    		long nr;
    		long new_nr;
    		// Batch size: 128 (SHRINK_BATCH) by default, 1024 for the superblock shrinker
    		long batch_size = shrinker->batch ? shrinker->batch
    						  : SHRINK_BATCH;
    		// Number of objects this cache could reclaim
    		max_pass = do_shrinker_shrink(shrinker, shrink, 0);
    		if (max_pass <= 0)
    			continue;

    		nr = atomic_long_xchg(&shrinker->nr_in_batch, 0);

    		total_scan = nr;
    		// Compute this cache's scan budget for the current call.
    		// For a manual drop_caches it comes out at roughly 2 * max_pass,
    		// i.e. free as much as possible.
    		// On the kswapd and other reclaim paths it generally stays within 1 * max_pass.
    		delta = (4 * nr_pages_scanned) / shrinker->seeks;
    		delta *= max_pass;
    		do_div(delta, lru_pages + 1);
    		total_scan += delta;
    		if (total_scan < 0) {
    			printk(KERN_ERR "shrink_slab: %pF negative objects to "
    			       "delete nr=%ld\n",
    			       shrinker->shrink, total_scan);
    			total_scan = max_pass;
    		}
    		// A small delta means the system has few inactive cache pages,
    		// so the scan budget must not be set too high either
    		if (delta < max_pass / 4)
    			total_scan = min(total_scan, max_pass / 2);

    		// Cap the total budget to avoid looping forever
    		if (total_scan > max_pass * 2)
    			total_scan = max_pass * 2;

    		trace_mm_shrink_slab_start(shrinker, shrink, nr,
    					nr_pages_scanned, lru_pages,
    					max_pass, delta, total_scan);
    		// Reclaim in batches
    		while (total_scan >= batch_size) {
    			int nr_before;
    			// Object count before this batch
    			nr_before = do_shrinker_shrink(shrinker, shrink, 0);
    			// Object count after reclaiming one batch
    			shrink_ret = do_shrinker_shrink(shrinker, shrink,
    							batch_size);
    			if (shrink_ret == -1)
    				break;
    			// Account for the objects freed by this batch
    			if (shrink_ret < nr_before)
    				ret += nr_before - shrink_ret;
    			count_vm_events(SLABS_SCANNED, batch_size);
    			// Deduct the batch from the remaining budget
    			total_scan -= batch_size;

    			cond_resched();
    		}
    		// A remainder smaller than batch_size is saved in nr_in_batch
    		// and carried over to the next call
    		if (total_scan > 0)
    			new_nr = atomic_long_add_return(total_scan,
    					&shrinker->nr_in_batch);
    		else
    			new_nr = atomic_long_read(&shrinker->nr_in_batch);

    		trace_mm_shrink_slab_end(shrinker, shrink_ret, nr, new_nr);
    	}
    	up_read(&shrinker_rwsem);
    out:
    	cond_resched();
    	return ret;
    }
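
To see why a manual drop ends up with a budget of about twice max_pass, the arithmetic can be replayed in user space. The sketch below copies only the budget computation from the listing; toy_total_scan is a hypothetical name, plain division stands in for do_div(), and the seeks value of 2 used in the examples is assumed to match the kernel's DEFAULT_SEEKS.

```c
/*
 * Hypothetical user-space replay of the scan-budget arithmetic in
 * shrink_slab(). nr_deferred plays the role of the value read from
 * shrinker->nr_in_batch.
 */
static long toy_total_scan(long nr_deferred,
                           unsigned long nr_pages_scanned,
                           unsigned long lru_pages,
                           int seeks, long max_pass)
{
    unsigned long long delta;
    long total_scan = nr_deferred;

    delta = (4ULL * nr_pages_scanned) / (unsigned long long)seeks;
    delta *= (unsigned long long)max_pass;
    delta /= lru_pages + 1;            /* do_div() in the kernel */
    total_scan += (long)delta;

    if (total_scan < 0)                /* overflow guard, as in the listing */
        total_scan = max_pass;

    /* Small delta: limit the budget to half of the reclaimable objects. */
    if (delta < (unsigned long long)(max_pass / 4))
        total_scan = total_scan < max_pass / 2 ? total_scan : max_pass / 2;

    /* Never scan more than twice the reclaimable objects. */
    if (total_scan > max_pass * 2)
        total_scan = max_pass * 2;

    return total_scan;
}
```

With the arguments drop_slab passes (1000 and 1000) and max_pass = 10000, the result is 19980, just under 2 * max_pass; at a batch size of 128 that is 156 full batches with 12 left over for nr_in_batch. Adding a carried-over remainder of 100 hits the cap at 20000. By contrast, scanning 32 pages out of 100000 LRU pages gives a budget of only 6.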

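Both counting and reclaiming free slab objects happen in do_shrinker_shrink. It simply invokes a function pointer: each cache defines its own shrink function. When the third argument, nr_to_scan, is 0, the call only counts the reclaimable objects; when it is nonzero, it gives the number of objects to scan and reclaim.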

    static inline int do_shrinker_shrink(struct shrinker *shrinker,
    				     struct shrink_control *sc,
    				     unsigned long nr_to_scan)
    {
    	int objects;
    	sc->nr_to_scan = nr_to_scan;
    	objects = (*shrinker->shrink)(shrinker, sc);

    	if (objects < -1)
    		return INT_MAX;

    	return objects;
    }
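
The two modes of the callback are easier to see with a toy cache. The sketch below is a user-space model: the toy_ types and functions are hypothetical and only mimic the shape of the kernel's struct shrinker and struct shrink_control. The callback returns the number of objects remaining in the cache, so a caller can derive the number freed by comparing counts before and after, as shrink_slab does.

```c
#include <limits.h>

struct toy_shrink_control {
    unsigned long nr_to_scan;
};

struct toy_shrinker {
    int (*shrink)(struct toy_shrinker *s, struct toy_shrink_control *sc);
    int nr_cached; /* objects currently held by this toy cache */
};

/* Toy callback: count when nr_to_scan is 0, otherwise free up to nr_to_scan. */
static int toy_cache_shrink(struct toy_shrinker *s,
                            struct toy_shrink_control *sc)
{
    if (sc->nr_to_scan) {
        int n = sc->nr_to_scan < (unsigned long)s->nr_cached
                    ? (int)sc->nr_to_scan : s->nr_cached;
        s->nr_cached -= n;
    }
    return s->nr_cached; /* objects remaining in the cache */
}

/* Same wrapper logic as do_shrinker_shrink() in the listing above. */
static int toy_do_shrinker_shrink(struct toy_shrinker *s,
                                  struct toy_shrink_control *sc,
                                  unsigned long nr_to_scan)
{
    int objects;

    sc->nr_to_scan = nr_to_scan;
    objects = s->shrink(s, sc);

    if (objects < -1) /* an overflowed count is treated as "very many" */
        return INT_MAX;

    return objects;
}
```

Starting from 300 cached objects, a call with nr_to_scan = 0 reports 300 and changes nothing, while successive calls with nr_to_scan = 128 leave 172, 44 and then 0 objects.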

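In short, drop_slab invokes the shrink function that each cache registers: it first counts the reclaimable slab objects, then works out how many to scan, and finally calls the shrink function again to scan and reclaim them.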
