Transparent Huge Pages (THP) are enabled by default in RHEL 6 for all applications. The kernel attempts to allocate hugepages whenever possible and any Linux process will receive 2MB pages if the mmap region is 2MB naturally aligned. The main kernel address space itself is mapped with hugepages, reducing TLB pressure from kernel code. For general information on Hugepages, see: What are Huge Pages and what are the advantages of using them?
The kernel will always attempt to satisfy a memory allocation using hugepages. If no hugepages are available (due to non availability of physically continuous memory for example) the kernel will fall back to the regular 4KB pages. THP are also swappable (unlike hugetlbfs). This is achieved by breaking the huge page to smaller 4KB pages, which are then swapped out normally.
But to use hugepages effectively, the kernel must find physically continuous areas of memory big enough to satisfy the request, and also properly aligned. For this, a khugepaged kernel thread has been added. This thread will occasionally attempt to substitute smaller pages being used currently with a hugepage allocation, thus maximizing THP usage.
In userland, no modifications to the applications are necessary (hence transparent). But there are ways to optimize its use. For applications that want to use hugepages, use of posix_memalign() can also help ensure that large allocations are aligned to huge page (2MB) boundaries.
Also, THP is only enabled for anonymous memory regions. There are plans to add support for tmpfs and page cache. THP tunables are found in the /sys tree under /sys/kernel/mm/redhat_transparent_hugepage.
一般而言,内存管理的最小块级单位叫做page,一个page是4096bytes,1M的内存会有256个page,1GB的话就会有256,000个page。CPU通过内置的内存管理单元维护着page表记录。
正常来说,有两种方式来增加内存可以管理的内存大小:
1.增大硬件内存管理单元的大小。
2.增大page的大小。
第一个方法不是很现实,现代的硬件内存管理单元最多只支持数百到上千的page表记录,并且,对于数百万page表记录的维护算法必将与目前的数百条记录的维护算法大不相同才能保证性能,目前的解决办法是,如果一个程序所需内存page数量超过了内存管理单元的处理大小,操作系统会采用软件管理的内存管理单元,但这会使程序运行的速度变慢。
从redhat 6(centos,sl,ol)开始,操作系统开始支持 Huge Pages,也就是大页。
简单来说, Huge Pages就是大小为2M到1GB的内存page,主要用于管理数千兆的内存,比如1GB的page对于1TB的内存来说是相对比较合适的。
THP(Transparent Huge Pages)是一个使管理Huge Pages自动化的抽象层。
目前需要注意的是,由于实现方式问题,THP会造成内存锁影响性能,尤其是在程序不是专门为大内内存页开发的时候,简单介绍如下:
操作系统后台有一个叫做khugepaged的进程,它会一直扫描所有进程占用的内存,在可能的情况下会把4kpage交换为Huge Pages,在这个过程中,对于操作的内存的各种分配活动都需要各种内存锁,直接影响程序的内存访问性能,并且,这个过程对于应用是透明的,在应用层面不可控制,对于专门为4k page优化的程序来说,可能会造成随机的性能下降现象。
小知识点:
1:从RedHat 6, OEL 6, SLES 11 and UEK2 kernels 开始,系统缺省会启用 Transparent HugePages :用来提高内存管理的性能透明大页(Transparent HugePages )和之前版本中的大页功能上类似。主要的区别是:Transparent HugePages 可以实时配置,不需要重启才能生效配置;
2:Transparent Huge Pages在32位的RHEL 6中是不支持的。
Transparent Huge Pages are not available on the 32-bit version of RHEL 6.
3: ORACLE官方不建议我们使用RedHat 6, OEL 6, SLES 11 and UEK2 kernels 时的开启透明大页(Transparent HugePages ), 因为透明大页(Transparent HugePages ) 存在一些问题:
1.在RAC环境下 透明大页(Transparent HugePages )会导致异常节点重启,和性能问题;
2.在单机环境中,透明大页(Transparent HugePages ) 也会导致一些异常的性能问题;
Transparent HugePages memory is enabled by default with Red Hat Enterprise Linux 6, SUSE Linux Enterprise Server 11, and Oracle Linux 6 with earlier releases of Oracle Linux Unbreakable Enterprise Kernel 2 (UEK2) kernels. Transparent HugePages memory is disabled in later releases of Oracle Linux UEK2 kernels.Transparent HugePages can cause memory allocation delays during runtime. To avoid performance issues, Oracle recommends that you disable Transparent HugePages on all Oracle Database servers. Oracle recommends that you instead use standard HugePages for enhanced performance.Transparent HugePages memory differs from standard HugePages memory because the kernel khugepaged thread allocates memory dynamically during runtime. Standard HugePages memory is pre-allocated at startup, and does not change during runtime.
Starting with RedHat 6, OEL 6, SLES 11 and UEK2 kernels, Transparent HugePages are implemented and enabled (default) in an attempt to improve the memory management. Transparent HugePages are similar to the HugePages that have been available in previous Linux releases. The main difference is that the Transparent HugePages are set up dynamically at run time by the khugepaged thread in kernel while the regular HugePages had to be preallocated at the boot up time. Because Transparent HugePages are known to cause unexpected node reboots and performance problems with RAC, Oracle strongly advises to disable the use of Transparent HugePages. In addition, Transparent Hugepages may cause problems even in a single-instance database environment with unexpected performance problems or delays. As such, Oracle recommends disabling Transparent HugePages on all Database servers running Oracle.
4:安装Vertica Analytic Database时也必须关闭透明大页功能。
RHEL6优化了内存申请的效率,而且在某些场景下对KVM的性能有明显提升:http://www.linux-kvm.org/wiki/images/9/9e/2010-forum-thp.pdf。
Hadoop是个高密集型内存运算系统,这个改动似乎给它带来了副作用。理论上运算型Java程序应该更多的使用用户态CPU才对,Cloudera官方也推荐关闭THP。于是参考一些文章作了调整:
http://structureddata.org/2012/06/18/linux-6-transparent-huge-pages-and-hadoop-workloads/
效果很明显,大概12:05分的时候操作的,系统态占用基本消失了。文件Cache使用上升、机器负载下降。
除了手动修改运行时参数之外,还可以修改 /etc/grub.conf 里内核的启动参数,追加“transparent_hugepage=never”(此选项只对 /sys/kernel/mm/redhat_transparent_hugepage/enabled 有效)。