[Repost] TaiShan v110 - Microarchitectures - HiSilicon

https://en.wikichip.org/wiki/hisilicon/microarchitectures/taishan_v110

 


 

TaiShan v110 µarch

General Info
  • Arch Type: CPU
  • Designer: HiSilicon
  • Manufacturer: TSMC
  • Introduction: 2019
  • Process: 7 nm
  • Core Configs: 32, 48, 64

Pipeline
  • Type: Superscalar, Superpipeline
  • OoOE: Yes
  • Speculative: Yes
  • Reg Renaming: Yes
  • Decode: 4-way

Instructions
  • ISA: ARMv8.2-A
  • Extensions: NEON

Cache
  • L1I Cache: 64 KiB/core
  • L1D Cache: 64 KiB/core
  • L2 Cache: 512 KiB/core
  • L3 Cache: 1 MiB/core
 

TaiShan v110 is the successor to the TaiShan v100, a high-performance ARM server microarchitecture designed by HiSilicon for Huawei's own TaiShan servers.

Brands

TaiShan-based CPUs are branded as the Kunpeng 920 series.

Release Dates

Kunpeng 920 CPUs were officially launched in early 2019.

Architecture

Key changes from TaiShan v100

  • TSMC 7 nm HPC process (from 16 nm)
  • 2x core count (64, up from 32)
    • Custom cores (from Cortex-A72)
      • ASIMD
        • double SP Vector throughput (2 inst/cycle, up from 1)
  • Memory
    • 2x memory channels (8, up from 4)
  • I/O
    • PCIe Gen 4 (from Gen 3)


Block Diagram

Entire Chip

Figure: TaiShan v110 SoC block diagram

Memory Hierarchy

  • Cache
    • L1I Cache
      • 64 KiB/core, private
      • 64-byte cache lines
    • L1D Cache
      • 64 KiB/core, private
      • 64-byte cache lines
    • L2 Cache
      • 512 KiB/core, private
    • L3 Cache
      • 1 MiB/core
      • Shared by all cores
  • System DRAM
    • 1 TiB Max Memory / socket
    • 8 Channels
    • DDR4, up to 2933 MT/s
      • 1 DPC and 2 DPC support
    • 8 B/cycle/channel (@ memory clock)
    • ECC, SDDC, DDDC
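
The DRAM figures above imply a theoretical peak bandwidth per socket. The following is a minimal sketch of that arithmetic, not a measured result. It assumes that each channel moves 8 bytes per transfer (a standard 64-bit DDR4 data bus, ECC bits excluded) and that all eight channels run at the maximum 2933 MT/s; the function and constant names are hypothetical.

```python
# Theoretical peak DRAM bandwidth per socket, derived from the listed figures.
CHANNELS = 8                # DDR4 channels per socket
TRANSFER_RATE_MT_S = 2933   # mega-transfers per second per channel (maximum)
BYTES_PER_TRANSFER = 8      # assumption: 64-bit data bus, ECC bits excluded


def peak_dram_bandwidth_gb_s(channels=CHANNELS,
                             mt_s=TRANSFER_RATE_MT_S,
                             bytes_per_transfer=BYTES_PER_TRANSFER):
    """Return the theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    return channels * mt_s * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    print(f"{peak_dram_bandwidth_gb_s():.1f} GB/s per socket")
```

Sustained bandwidth in practice will be lower than this upper bound, and slower DIMM speeds (for example in some 2 DPC configurations) reduce it further.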

Overview

Although HiSilicon has a history of designing Arm processors, the TaiShan v110 core is HiSilicon's first custom, homegrown high-performance ARM core and SoC design. The chip is a multi-chip package that incorporates multiple compute dies and an I/O die. It is fabricated on TSMC's 7-nanometer HPC process and integrates up to 64 cores and up to 64 MiB of last-level cache.

The SoC also incorporates a number of hardware accelerators. A crypto engine supports AES, DES/3DES, MD5, SHA1, SHA2, HMAC, and CMAC with throughputs of up to 100 Gbit/s. A compression engine supports GZIP, LZS, and LZ4 with compression throughputs of up to 40 Gbit/s and decompression throughputs of up to 100 Gbit/s.

Marketed as the Kunpeng 920, this SoC supports up to 4-way multiprocessing through HiSilicon's Hydra interface. To keep the cores fed, each socket incorporates eight DDR4 memory channels. To make the chip an easy platform for accelerators, each socket also provides 40 PCIe Gen 4 lanes with CCIX support, enabling cache coherency.

Core

Each core is a 4-way out-of-order superscalar that implements the ARMv8.2-A ISA. Huawei stated that the core supports almost all the ARMv8.4 features with a few exceptions, including dot product and the FP16 FML extension. It features private 64 KiB L1 instruction and data caches as well as 512 KiB of private L2. Though light on details, Huawei says that compared to Arm's Cortex cores, their core features an improved memory subsystem, a larger number of execution units, and a better branch predictor.

ASIMD

Each core features a single 128-bit NEON unit. It is capable of executing one double-precision FMA vector instruction per cycle or two single-precision vector instructions per cycle. Operating at 2 GHz, a 64-core chip has a peak compute of 512 GigaFLOPS of double-precision floating point (64 cores × 2 GHz × 2 lanes × 2 FLOPs per FMA). Compared to the TaiShan v100, the throughput for single-precision vector instructions has been doubled from 1 to 2 instructions per cycle.
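
The 512 GigaFLOPS figure can be reproduced with a short calculation. The sketch below is illustrative only: the function name and parameters are hypothetical, and the single-precision result additionally assumes that both single-precision instructions issued per cycle are FMAs, which the text does not state.

```python
VECTOR_BITS = 128  # width of the single NEON unit per core


def peak_gflops(cores, ghz, element_bits, insts_per_cycle, flops_per_lane=2):
    """Peak GFLOPS = cores x GHz x lanes x FLOPs per lane x instructions per cycle.

    flops_per_lane=2 models a fused multiply-add (one multiply plus one add).
    """
    lanes = VECTOR_BITS // element_bits
    return cores * ghz * lanes * flops_per_lane * insts_per_cycle


# Double precision: 2 lanes of 64 bits, one FMA vector instruction per cycle.
dp_gflops = peak_gflops(cores=64, ghz=2.0, element_bits=64, insts_per_cycle=1)
print(dp_gflops)  # 512.0

# Assumption (not stated in the text): both SP instructions per cycle are FMAs.
sp_gflops_if_fma = peak_gflops(cores=64, ghz=2.0, element_bits=32, insts_per_cycle=2)
```

The same function can be applied to the 2.6 GHz parts by changing `ghz`, keeping in mind that these are theoretical peaks rather than sustained rates.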

MCP physical design

The SoC itself comprises three dies: two Super CPU Cluster (SCCL) compute dies and a Super IO Cluster (SICL) die. Each SCCL compute die contains eight CPU Clusters (CCLs), memory controllers, and the L3 cache block. With eight CCLs on each of the two SCCL dies, the chip totals 64 cores. Each CCL is a quadplex of TaiShan v110 cores along with its partition of the L3 cache tags. The SICL includes the various I/O peripherals, including PCIe Gen 4, SAS, the network interface controllers, and the Hydra links.
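
The core count follows from the package hierarchy: two compute dies, eight clusters per die, four cores per cluster. The sketch below models that hierarchy for the full 64-core part. The class and field names are hypothetical, and how the 32- and 48-core models are derived from this layout is not stated here, so it is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Hypothetical model of the 64-core multi-chip package."""
    compute_dies: int = 2       # SCCL dies
    clusters_per_die: int = 8   # CCLs per SCCL die
    cores_per_cluster: int = 4  # each CCL is a quadplex
    l3_mib_per_core: int = 1    # from the memory hierarchy section

    @property
    def cores(self) -> int:
        return self.compute_dies * self.clusters_per_die * self.cores_per_cluster

    @property
    def l3_mib(self) -> int:
        return self.cores * self.l3_mib_per_core


pkg = Package()
print(pkg.cores, pkg.l3_mib)  # 64 64
```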

Figure: TaiShan v110 SoC details

Scalability

See also: Hydra Interface

Each chip incorporates three Hydra interface ports. The Hydra interface facilitates cache coherency between the dies on the chip. Every link supports 240 Gb/s (30 GB/s) of peak bandwidth, for a total aggregate bandwidth of 720 Gb/s (90 GB/s) across all three links in a 2-way symmetric multiprocessing configuration.
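
The link figures above are related by simple unit arithmetic, sketched below. The constant and function names are hypothetical; the values are the peak rates quoted in the text.

```python
LINK_GBIT_S = 240   # peak bandwidth per Hydra link, in Gb/s
PORTS_PER_CHIP = 3  # Hydra ports per chip


def gbit_to_gbyte(gbit_s):
    """Convert Gb/s to GB/s (8 bits per byte)."""
    return gbit_s / 8


aggregate_gbit_s = LINK_GBIT_S * PORTS_PER_CHIP
print(gbit_to_gbyte(LINK_GBIT_S))       # 30.0 GB/s per link
print(aggregate_gbit_s)                 # 720 Gb/s with all three links
print(gbit_to_gbyte(aggregate_gbit_s))  # 90.0 GB/s aggregate
```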

Figure: Kunpeng 920 2-way SMP

The three links also allow for 4-way SMP. In this configuration, each socket connects one link to each of the other three sockets, forming an all-to-all connection.
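
A fully connected topology of this kind needs one link per pair of sockets, which is why three ports per chip are enough for four sockets. The sketch below enumerates the socket pairs. The function names are hypothetical, and the socket numbering is arbitrary.

```python
from itertools import combinations


def full_mesh(sockets):
    """Return every socket pair that needs a direct link in an all-to-all topology."""
    return list(combinations(range(sockets), 2))


def links_per_socket(sockets):
    """Count how many links each socket terminates in the full mesh."""
    counts = {s: 0 for s in range(sockets)}
    for a, b in full_mesh(sockets):
        counts[a] += 1
        counts[b] += 1
    return counts


print(len(full_mesh(4)))    # 6 point-to-point links in total
print(links_per_socket(4))  # every socket uses 3 ports
```

Under this layout, any two sockets in a 4-way system share a single 240 Gb/s link, whereas a 2-way system can dedicate all three links to the one socket pair.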

 

Figure: Kunpeng 920 4-way SMP

Chipset

Along with the Hi1620 SoC, HiSilicon developed a number of integrated circuits as part of the chipset platform.

Chip     Description
Hi1620   CPU, Kunpeng 920 series chip
Hi1503   CPU interconnect chip, supports scaling up to 32 sockets
Hi1812   SSD storage controller, for read/write I/O acceleration
Hi1822   Network controller chip, DC high-speed flexible interconnect
Hi1710   BMC management chip + enhanced RAS features chip

Figure: Hi1620 chipset

Die

  • TSMC 7 nm HPC
  • 20,000,000,000 transistors
    • 3-4 dies

All TaiShan v110 Chips

List of TaiShan v110-based Processors

Model      Launched         Cores  Arch          Frequency  L3      TDP
920-3226   26 April 2019    32     TaiShan v110  2.6 GHz    32 MiB  120 W
920-4826   26 April 2019    48     TaiShan v110  2.6 GHz    48 MiB  158 W
920-6426   7 January 2019   64     TaiShan v110  2.6 GHz    64 MiB  195 W
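
The table can be cross-checked against the 1 MiB/core L3 figure and used to derive a rough TDP per core. This is a sketch over the three listed models only; the `MODELS` structure and `summarize` function are hypothetical, and TDP per core is a derived ratio that ignores the uncore and I/O share of the power budget.

```python
MODELS = [
    # (model, cores, frequency in GHz, L3 in MiB, TDP in W)
    ("920-3226", 32, 2.6, 32, 120),
    ("920-4826", 48, 2.6, 48, 158),
    ("920-6426", 64, 2.6, 64, 195),
]


def summarize(models=MODELS):
    """Return per-core L3 (MiB) and per-core TDP (W) for each model."""
    return {
        name: {
            "l3_mib_per_core": l3 / cores,
            "tdp_w_per_core": tdp / cores,
        }
        for name, cores, _ghz, l3, tdp in models
    }


for name, stats in summarize().items():
    print(f"{name}: {stats['l3_mib_per_core']:.0f} MiB L3/core, "
          f"{stats['tdp_w_per_core']:.2f} W/core")
```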

 
