[Repost] PostgreSQL 10.0 preview feature enhancement: internationalization improvements with ICU (International Components for Unicode) support


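Editor's summary

**ICU support**

Starting with version 10.0, PostgreSQL supports ICU (International Components for Unicode), a mature, widely used globalization library that gives the same results on every platform it supports.

**Advantages of ICU**

* Tracks the Unicode standard closely
* Released under a nonrestrictive open source license, so it can be used in commercial software
* Covers Unicode conversion, collation, date formatting, time zone calculations, regular expressions, and more

**ICU features**

* Code page conversion: converts text between Unicode and other character sets
* Collation: compares strings according to the rules of a given language
* Formatting: formats numbers, dates, times, and currency amounts
* Time calculations: provides several calendars and time zone calculation APIs
* Unicode support: gives access to Unicode character properties, normalization, and case folding
* Regular expressions: fully Unicode-aware
* Bidi support: handles text that mixes left-to-right and right-to-left scripts
* Text boundaries: locates word, sentence, paragraph, and line-break positions

**ICU support in PostgreSQL**

* A new `collprovider` column in `pg_collation` records which library (libc or ICU) provides the collation data.
* A new `collversion` column in `pg_collation` records the collation version at creation time, and it is checked at run time.
* `initdb` adds all available ICU locales as collations with the suffix "-x-icu".

**Example**

```sql
CREATE TABLE collate_test1 (
    a INT,
    b TEXT COLLATE "en-x-icu" NOT NULL
);
```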

Main text

https://developer.aliyun.com/article/72935

 

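Tags

PostgreSQL , 10.0 , International Components for Unicode , ICU , collate , internationalization


Background

ICU is a mature, widely used globalization library that behaves consistently across platforms. It is released under a nonrestrictive open source license, so both commercial and open source software can use it freely.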

ICU is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications.   
ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.  
  
ICU is released under a nonrestrictive open source license that is suitable for use with both commercial software and with other open source or free software.  

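ICU's strengths are that it tracks the Unicode standard most closely and that software using it behaves consistently across platforms (as long as ICU supports the platform).

ICU provides the features listed below, including conversion between Unicode and other text encodings, locale-aware collation, date and time formatting, time zone calculations, and Unicode-aware regular expressions.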

Code Page Conversion: Convert text data to or from Unicode and nearly any other character set or encoding. ICU's conversion tables are based on charset data collected by IBM over the course of many decades, and is the most complete available anywhere.  
  
Collation: Compare strings according to the conventions and standards of a particular language, region or country. ICU's collation is based on the Unicode Collation Algorithm plus locale-specific comparison rules from the Common Locale Data Repository, a comprehensive source for this type of data.  
  
Formatting: Format numbers, dates, times and currency amounts according to the conventions of a chosen locale. This includes translating month and day names into the selected language, choosing appropriate abbreviations, ordering fields correctly, etc. This data also comes from the Common Locale Data Repository.  
  
Time Calculations: Multiple types of calendars are provided beyond the traditional Gregorian calendar. A thorough set of timezone calculation APIs are provided.  
  
Unicode Support: ICU closely tracks the Unicode standard, providing easy access to all of the many Unicode character properties, Unicode Normalization, Case Folding and other fundamental operations as specified by the Unicode Standard.  
  
Regular Expression: ICU's regular expressions fully support Unicode while providing very competitive performance.  
  
Bidi: support for handling text containing a mixture of left to right (English) and right to left (Arabic or Hebrew) data.  
  
Text Boundaries: Locate the positions of words, sentences, paragraphs within a range of text, or identify locations that would be suitable for line wrapping when displaying the text.  

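Previously, PostgreSQL relied on the operating system's C library (glibc on Linux) for globalization support. Results therefore depended on the C library and its version, and moving between platforms (for example Windows, Linux, and FreeBSD) could change sort order or other locale-dependent results.

Starting with 10.0, PostgreSQL supports ICU. Install the ICU library on the machine where PostgreSQL is built and installed, pass --with-icu to configure, and ICU4C becomes available.

pg_collation gains a new column, collprovider, which indicates whether a collation comes from libc or ICU. A second new column, collversion, records the collation version in use when the collation was created; it is checked at run time to make sure the version still matches.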
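
As a rough illustration of the two catalog columns described above, the query below lists a few ICU-provided collations. It is a sketch that assumes a PostgreSQL 10 server built with --with-icu; the single-character provider codes shown in the comment are the values stored in collprovider.

```sql
-- List some ICU-provided collations together with their recorded version.
-- Assumes PostgreSQL 10 or later built with --with-icu.
SELECT collname, collprovider, collversion
FROM pg_collation
WHERE collprovider = 'i'   -- 'i' = icu, 'c' = libc, 'd' = default
ORDER BY collname
LIMIT 10;
```

On a server built without ICU, this query should simply return no rows.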

ICU support  
  
Add a column collprovider to pg_collation that determines which library  
provides the collation data.  The existing choices are default and libc,  
and this adds an icu choice, which uses the ICU4C library.  
  
The pg_locale_t type is changed to a union that contains the  
provider-specific locale handles.  Users of locale information are  
changed to look into that struct for the appropriate handle to use.  
  
Also add a collversion column that records the version of the collation  
when it is created, and check at run time whether it is still the same.  
This detects potentially incompatible library upgrades that can corrupt  
indexes and other structures.  This is currently only supported by  
ICU-provided collations.  
  
initdb initializes the default collation set as before from the   
`locale -a` output but also adds all available ICU locales with a "-x-icu"  
appended.  
  
Currently, ICU-provided collations can only be explicitly named  
collations.  The global database locales are still always libc-provided.  
  
ICU support is enabled by configure --with-icu.  
  
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>  
Reviewed-by: Andreas Karlsson <andreas@proxel.se>  
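
The commit message above says that ICU collations must be named explicitly and that the recorded version is checked at run time. A minimal sketch of both points follows, assuming PostgreSQL 10 built with ICU. The collation name german_phonebook and the table name icu_demo are made up for illustration, and the locale string 'de-u-co-phonebk' assumes a reasonably recent ICU release (older releases may need a different locale syntax).

```sql
-- Create an explicitly named ICU collation (the name is hypothetical).
CREATE COLLATION german_phonebook (provider = icu, locale = 'de-u-co-phonebk');

-- Use it on a column; the database default locale stays libc-provided.
CREATE TABLE icu_demo (
    name text COLLATE german_phonebook
);

-- Compare the version recorded at creation time with the version
-- reported by the ICU library currently loaded.
SELECT collname,
       collversion AS recorded_version,
       pg_collation_actual_version(oid) AS actual_version
FROM pg_collation
WHERE collname = 'german_phonebook';

-- After an ICU upgrade, and once affected indexes have been rebuilt,
-- the recorded version can be brought up to date.
ALTER COLLATION german_phonebook REFRESH VERSION;
```

If the two version columns differ, indexes that depend on the collation may no longer be ordered correctly, which is the situation the run-time check is meant to flag.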

Examples

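    -- Column b uses the ICU-provided English collation
    CREATE TABLE collate_test1 (
        a int,
        b text COLLATE "en-x-icu" NOT NULL
    );

    -- psql meta-command: show the table definition, including the collation
    \d collate_test1

    -- Expected to fail: no collation with this name exists
    CREATE TABLE collate_test_fail (
        a int,
        b text COLLATE "ja_JP.eucjp-x-icu"
    );

    -- Expected to fail: no collation with this name exists
    CREATE TABLE collate_test_fail (
        a int,
        b text COLLATE "foo-x-icu"
    );

    -- Expected to fail: collations are not supported by type integer
    CREATE TABLE collate_test_fail (
        a int COLLATE "en-x-icu",
        b text
    );

    -- LIKE copies the column definitions, including the collation
    CREATE TABLE collate_test_like (
        LIKE collate_test1
    );

    -- (intermediate statements omitted in the original excerpt)

    -- constant expression folding
    -- The column aliases give the expected result: English sorts 'ä' with 'a',
    -- Swedish sorts 'ä' after 'z'
    SELECT 'bbc' COLLATE "en-x-icu" > 'äbc' COLLATE "en-x-icu" AS "true";
    SELECT 'bbc' COLLATE "sv-x-icu" > 'äbc' COLLATE "sv-x-icu" AS "false";

    -- upper/lower

    CREATE TABLE collate_test10 (
        a int,
        x text COLLATE "en-x-icu",
        y text COLLATE "tr-x-icu"
    );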
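
To see the effect of the collations used in the tests above, the following sketch sorts the same values under English and Swedish rules and then applies upper/lower to the English and Turkish columns of collate_test10. It assumes a UTF8 database on a server built with ICU and that collate_test10 has been created as shown; the inline values and inserted rows are sample data chosen for illustration.

```sql
-- Same values, two collations: Swedish places 'ä' after 'z',
-- English treats it as a variant of 'a'.
SELECT w FROM (VALUES ('abc'), ('äbc'), ('bbc'), ('zzz')) AS t(w)
ORDER BY w COLLATE "en-x-icu";

SELECT w FROM (VALUES ('abc'), ('äbc'), ('bbc'), ('zzz')) AS t(w)
ORDER BY w COLLATE "sv-x-icu";

-- Case mapping follows each column's collation: under Turkish rules,
-- 'i' and 'I' map to their dotted and dotless counterparts.
INSERT INTO collate_test10 VALUES (1, 'hij', 'hij'), (2, 'HIJ', 'HIJ');
SELECT a, lower(x), lower(y), upper(x), upper(y) FROM collate_test10;
```

Comparing the x and y columns of the last query side by side makes the locale-specific case rules visible without changing the database default locale.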

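For the discussion of this patch, see the mailing list threads reachable from the URLs at the end of this article.

The PostgreSQL community works very rigorously: a patch may be discussed on the mailing list for months or even years and revised repeatedly in response to feedback, so by the time it is merged into master it is already mature. This is one reason PostgreSQL is well known for its stability.

References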

https://wiki.postgresql.org/wiki/Todo:ICU

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=eccfef81e1f73ee41f1d8bfe4fa4e80576945048

http://site.icu-project.org/
