对于 AWS Cloudwatch, 主要在于 3 种不同的认证方式:
现在推荐的是使用 IAM Role 的认证方式,避免了密钥泄露的风险。
但是特别要注意的是,要读取 CloudWatch 指标和 EC2 标签 (tags)、实例、区域和告警,你必须通过 IAM 授予 Grafana 权限。你可以将这些权限附加到你在 AWS 认证中配置的 IAM role 或 IAM 用户。
IAM policy 示例如下:
Metrics-only:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowReadingMetricsFromCloudWatch",
"Effect": "Allow",
"Action": [
"cloudwatch:DescribeAlarmsForMetric",
"cloudwatch:DescribeAlarmHistory",
"cloudwatch:DescribeAlarms",
"cloudwatch:ListMetrics",
"cloudwatch:GetMetricData",
"cloudwatch:GetInsightRuleReport"
],
"Resource": "*"
},
{
"Sid": "AllowReadingTagsInstancesRegionsFromEC2",
"Effect": "Allow",
"Action": ["ec2:DescribeTags", "ec2:DescribeInstances", "ec2:DescribeRegions"],
"Resource": "*"
},
{
"Sid": "AllowReadingResourcesForTags",
"Effect": "Allow",
"Action": "tag:GetResources",
"Resource": "*"
}
]
}
Logs-only:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowReadingLogsFromCloudWatch",
"Effect": "Allow",
"Action": [
"logs:DescribeLogGroups",
"logs:GetLogGroupFields",
"logs:StartQuery",
"logs:StopQuery",
"logs:GetQueryResults",
"logs:GetLogEvents"
],
"Resource": "*"
},
{
"Sid": "AllowReadingTagsInstancesRegionsFromEC2",
"Effect": "Allow",
"Action": ["ec2:DescribeTags", "ec2:DescribeInstances", "ec2:DescribeRegions"],
"Resource": "*"
},
{
"Sid": "AllowReadingResourcesForTags",
"Effect": "Allow",
"Action": "tag:GetResources",
"Resource": "*"
}
]
}
Metrics and Logs:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowReadingMetricsFromCloudWatch",
"Effect": "Allow",
"Action": [
"cloudwatch:DescribeAlarmsForMetric",
"cloudwatch:DescribeAlarmHistory",
"cloudwatch:DescribeAlarms",
"cloudwatch:ListMetrics",
"cloudwatch:GetMetricData",
"cloudwatch:GetInsightRuleReport"
],
"Resource": "*"
},
{
"Sid": "AllowReadingLogsFromCloudWatch",
"Effect": "Allow",
"Action": [
"logs:DescribeLogGroups",
"logs:GetLogGroupFields",
"logs:StartQuery",
"logs:StopQuery",
"logs:GetQueryResults",
"logs:GetLogEvents"
],
"Resource": "*"
},
{
"Sid": "AllowReadingTagsInstancesRegionsFromEC2",
"Effect": "Allow",
"Action": ["ec2:DescribeTags", "ec2:DescribeInstances", "ec2:DescribeRegions"],
"Resource": "*"
},
{
"Sid": "AllowReadingResourcesForTags",
"Effect": "Allow",
"Action": "tag:GetResources",
"Resource": "*"
}
]
}
跨账号可观测性:
{
"Version": "2012-10-17",
"Statement": [
{
"Action": ["oam:ListSinks", "oam:ListAttachedLinks"],
"Effect": "Allow",
"Resource": "*"
}
]
}
几种认证方式的 AWS CLoudwatch 配置示例如下:
AWS SDK(default):
apiVersion: 1
datasources:
- name: CloudWatch
type: cloudwatch
jsonData:
authType: default
defaultRegion: eu-west-2
使用 Credentials 配置文件:
apiVersion: 1
datasources:
- name: CloudWatch
type: cloudwatch
jsonData:
authType: credentials
defaultRegion: eu-west-2
customMetricsNamespaces: 'CWAgent,CustomNameSpace'
profile: secondary
使用 AK&SK:
apiVersion: 1
datasources:
- name: CloudWatch
type: cloudwatch
jsonData:
authType: keys
defaultRegion: eu-west-2
secureJsonData:
accessKey: '<your access key>'
secretKey: '<your secret key>'
使用 AWS SDK Default 和 IAM Role 的 ARM 来 Assume:
apiVersion: 1
datasources:
- name: CloudWatch
type: cloudwatch
jsonData:
authType: default
assumeRoleArn: arn:aws:iam::123456789012:root
defaultRegion: eu-west-2
Cloudwatch 自带的几个仪表板都不太好用,建议使用 monitoringartist/grafana-aws-cloudwatch-dashboards 替代。
告警需要返回 numeric 数据的查询,而 CloudWatch Logs 支持这种查询。例如,你可以通过使用 stats
命令来启用告警。
这也是一个有效的查询,用于对包括文本 "Exception" 的消息发出告警:
filter @message like /Exception/
| stats count(*) as exceptionCount by bin(1h)
| sort exceptionCount desc
CloudWatch 插件使您能够跨区域账户监控应用程序并排除故障。利用跨账户的可观察性,你可以无缝地搜索、可视化和分析指标和日志,而不必担心账户的界限。
要使用这个功能,请在 AWS 控制台的 Cloudwatch 设置下,配置一个 monitoring 和 source 账户,然后按照上文所述添加必要的 IAM 权限。