[Repost] Observability | Traffic | Logs | Monitoring | Tracing | User Experience

Editor's summary

**Generating Content with Observability**

**Introduction**

Observability is a vital property of an application and its supporting infrastructure. It allows organizations to gain deep insights into their environments, identify potential issues, and proactively improve performance.

**The Three Pillars of Observability**

1. Logs
2. Metrics
3. Distributed Tracing

**Extending Observability Capabilities**

Open-source solutions, such as OpenTelemetry, provide a de facto standard for collecting telemetry data in cloud settings. Real user monitoring allows organizations to gain real-time visibility into the user experience.

**Benefits of Observability**

- Improved end-user experience
- Enhanced business outcomes
- Proactive troubleshooting of issues
- Reduced recovery time from failures
- Better understanding of user experience

**Generating Observability Content**

- Logs: Collect structured or unstructured text records of discrete events.
- Metrics: Values represented as counts or measures that are often calculated or aggregated over a period of time.
- Distributed Tracing: Displays activity of a transaction or request as it flows through applications.

**Conclusion**

Observability is a critical property that enables organizations to gain deep insights into their environments and make informed improvements. By extending the three pillars of observability, organizations can enhance their understanding of system health and proactively address issues.

Article

https://cloud.tencent.com/developer/article/2019401?areaSource=105001.8&traceId=ySuPzDMCMO5dVSJSYsUT9

What is observability?

In IT and cloud computing, observability is the ability to measure a system’s current state based on the data it generates, such as logs, metrics, and traces.

Observability relies on telemetry derived from instrumentation that comes from the endpoints and services in your multi-cloud computing environments. In these modern environments, every hardware, software, and cloud infrastructure component and every container, open-source tool, and microservice generates records of every activity. The goal of observability is to understand what’s happening across all these environments and among the technologies, so you can detect and resolve issues to keep your systems efficient and reliable and your customers happy.

Organizations usually implement observability using a combination of instrumentation methods including open-source instrumentation tools, such as OpenTelemetry.

Many organizations also adopt an observability solution to help them detect and analyze the significance of events to their operations, software development life cycles, application security, and end-user experiences.

Observability has become more critical in recent years, as cloud-native environments have gotten more complex and the potential root causes for a failure or anomaly have become more difficult to pinpoint. As teams begin collecting and working with observability data, they are also realizing its benefits to the business, not just IT.

Because cloud services rely on a uniquely distributed and dynamic architecture, observability may also sometimes refer to the specific software tools and practices businesses use to interpret cloud performance data. Although some people may think of observability as a buzzword for sophisticated application performance monitoring (APM), there are a few key distinctions to keep in mind when comparing observability and monitoring.

What is the difference between monitoring and observability?

Is observability really monitoring by another name? In short, no. While observability and monitoring are related — and can complement one another — they are actually different concepts.

In a monitoring scenario, you typically preconfigure dashboards that are meant to alert you to performance issues you expect to see later. However, these dashboards rely on the key assumption that you’re able to predict what kinds of problems you’ll encounter before they occur.
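
As a rough illustration of that assumption, the sketch below checks one metrics sample against a fixed set of threshold rules. The rule type, the metric names (`cpu_percent`, `error_rate`, `queue_depth`), and the limits are hypothetical and not taken from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdRule:
    metric: str   # metric name, e.g. "cpu_percent" (hypothetical)
    limit: float  # alert when the observed value exceeds this limit


def evaluate_rules(sample: dict[str, float], rules: list[ThresholdRule]) -> list[str]:
    """Return one alert message per rule whose threshold is exceeded."""
    alerts = []
    for rule in rules:
        value = sample.get(rule.metric)
        # A metric without a rule, or a rule without data, never alerts:
        # the check only covers failure modes someone predicted in advance.
        if value is not None and value > rule.limit:
            alerts.append(f"{rule.metric}={value} exceeds {rule.limit}")
    return alerts


# Rules configured ahead of time, as on a preconfigured dashboard.
RULES = [ThresholdRule("cpu_percent", 90.0), ThresholdRule("error_rate", 0.05)]

if __name__ == "__main__":
    sample = {"cpu_percent": 97.5, "error_rate": 0.01, "queue_depth": 12000}
    for alert in evaluate_rules(sample, RULES):
        print(alert)
```

In this sample only the CPU rule fires. The very large `queue_depth` goes unnoticed because nobody wrote a rule for it in advance, which is exactly the limitation of relying on predicted problems.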

Cloud-native environments don’t lend themselves well to this type of monitoring because they are dynamic and complex, which means you have no way of knowing in advance what kinds of problems might arise.

In an observability scenario, where an environment has been fully instrumented to provide complete observability data, you can flexibly explore what’s going on and quickly figure out the root cause of issues you may not have been able to anticipate.
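
A minimal sketch of that kind of exploration, assuming each request is recorded as a structured event with arbitrary attributes. The event fields and the `error_rate_by` helper are hypothetical; a real observability backend would run comparable queries over far larger data sets.

```python
from collections import defaultdict


def error_rate_by(events, attribute):
    """Group events by any attribute and return {attribute value: error rate}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for event in events:
        key = event.get(attribute, "unknown")
        totals[key] += 1
        if event["status"] >= 500:  # treat HTTP 5xx responses as errors
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


# Hypothetical request events; the attributes were not chosen with any
# specific failure in mind.
EVENTS = [
    {"service": "checkout", "region": "eu", "version": "1.4.0", "status": 200},
    {"service": "checkout", "region": "eu", "version": "1.4.1", "status": 500},
    {"service": "checkout", "region": "us", "version": "1.4.1", "status": 503},
    {"service": "checkout", "region": "us", "version": "1.4.0", "status": 200},
    {"service": "search", "region": "eu", "version": "2.0.0", "status": 200},
    {"service": "search", "region": "us", "version": "2.0.0", "status": 200},
]

if __name__ == "__main__":
    # Ask new questions of the same data, one dimension at a time.
    for attribute in ("region", "service", "version"):
        print(attribute, error_rate_by(EVENTS, attribute))
```

Slicing by region shows the same error rate everywhere, slicing by service narrows the problem to checkout, and slicing by version isolates the failures to 1.4.1. None of these questions had to be anticipated when the events were recorded.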

Monitoring vs. Observability

| Monitoring | Observability |
| --- | --- |
| Tracks metrics and logs. The focus is on gathering metrics and log data, with alerts when set thresholds are exceeded. | Delivers actionable information. Intelligence is applied to telemetry data, producing actionable feedback loops and enabling automated changes and optimizations to infrastructure and application runtime deployment. |
| Collects data. Infrastructure monitoring collects valuable metrics such as CPU, memory, response time, error rates, and latency. | Correlates metrics. Observability brings together metrics from disparate systems and identifies specific problems, so you can quickly understand how they relate to one another. |
| Watches defined systems. Monitoring keeps track of the health of important systems. | Interprets data from complex systems. Observability allows for granular insights and debugging, enabling teams to correct problems as they happen. |

Why is observability important?

In enterprise environments, observability helps cross-functional teams understand and answer specific questions about what’s happening in highly distributed systems. Observability enables you to understand what is slow or broken and what needs to be done to improve performance. With an observability solution in place, teams can receive alerts about issues and proactively resolve them before they impact users.

Because modern cloud environments are dynamic and constantly changing in scale and complexity, most problems are neither known nor monitored. Observability addresses this common issue of “unknown unknowns,” enabling you to continuously and automatically understand new types of problems as they arise.

Observability is also a critical capability of artificial intelligence for IT operations (AIOps). As more organizations adopt cloud-native architectures, they are also looking for ways to implement AIOps, harnessing AI as a way to automate more processes throughout the DevSecOps life cycle. By bringing AI to everything — from gathering telemetry to analyzing what’s happening across the full technology stack — your organization can have the reliable answers essential for automating application monitoring, testing, continuous delivery, application security, and incident response.

The value of observability doesn’t stop at IT use cases. Once you begin collecting and analyzing observability data, you have an invaluable window into the business impact of your digital services. This visibility enables you to optimize conversions, validate that software releases meet business goals, measure the outcomes of your user experience SLOs, and prioritize business decisions based on what matters most.
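
To make the SLO measurement concrete, here is a small sketch of the usual arithmetic, assuming a user-experience SLO expressed as the required fraction of "good" events (for example, page loads faster than some chosen limit). The function name, the returned field names, and the numbers are illustrative assumptions.

```python
def slo_report(good_events: int, total_events: int, target: float) -> dict[str, float]:
    """Summarize SLO attainment and how much error budget is left.

    target is the required fraction of good events, e.g. 0.99 for 99%.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")

    attainment = good_events / total_events
    allowed_bad = (1.0 - target) * total_events  # error budget, in events
    actual_bad = total_events - good_events
    return {
        "attainment": attainment,
        "target": target,
        # Fraction of the budget still unspent; negative once it is overspent.
        "budget_remaining": 1.0 - actual_bad / allowed_bad,
    }


if __name__ == "__main__":
    print(slo_report(good_events=995, total_events=1000, target=0.99))
```

With 995 good page loads out of 1,000 against a 99% target, attainment is 99.5% and roughly half of the error budget remains. A release that pushes the remaining budget below zero is not meeting the goal it was measured against.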

When an observability solution also analyzes user experience data using synthetic and real-user monitoring, you can discover problems before your users do and design better user experiences based on real, immediate feedback.

Benefits of observability

Observability delivers powerful benefits to IT teams, organizations, and end-users alike. Here are some of the use cases observability facilitates:

  1. Application performance monitoring: Full end-to-end observability enables organizations to get to the bottom of application performance issues much faster, including issues that arise from cloud-native and microservices environments. An advanced observability solution can also be used to automate more processes, increasing efficiency and innovation among Ops and Apps teams.
  2. DevSecOps and SRE: Observability is not just the result of implementing advanced tools, but a foundational property of an application and its supporting infrastructure. The architects and developers who create the software must design it to be observed. Then DevSecOps and SRE teams can leverage and interpret the observable data during the software delivery life cycle to build better, more secure, more resilient applications.
  3. Infrastructure, cloud, and Kubernetes monitoring: Infrastructure and operations (I&O) teams can leverage the enhanced context an observability solution offers to improve application uptime and performance, cut down the time required to pinpoint and resolve issues, detect cloud latency issues and optimize cloud resource utilization, and improve administration of their Kubernetes environments and modern cloud architectures.
  4. End-user experience: A good user experience can enhance a company’s reputation and increase revenue, delivering an enviable edge over the competition. By spotting and resolving issues well before the end-user notices and making an improvement before it’s even requested, an organization can boost customer satisfaction and retention. It’s also possible to optimize the user experience through real-time playback, gaining a window directly into the end-user’s experience exactly as they see it, so everyone can quickly agree on where to make improvements.
  5. Business analytics: Organizations can combine business context with full stack application analytics and performance to understand real-time business impact, improve conversion optimization, ensure that software releases meet expected business goals, and confirm that the organization is adhering to internal and external SLAs.

DevSecOps teams can tap observability to get more insights into the apps they develop, and automate testing and CI/CD processes so they can release better quality code faster. This means organizations waste less time on war rooms and finger-pointing. Not only is this a benefit from a productivity standpoint, but it also strengthens the positive working relationships that are essential for effective collaboration.

These organizational improvements open the door to further innovation and digital transformation. And, more importantly, the end-user ultimately benefits in the form of a high-quality user experience.

How do you make a system observable?

If you’ve read about observability, you likely know that logs, metrics, and distributed traces are its three key pillars. However, observing raw telemetry from back-end applications alone does not provide the full picture of how your systems are behaving.

Neglecting the front-end perspective potentially skews or even misrepresents the full picture of how your applications and infrastructure are performing in the real world for real users. Extending the three-pillars approach, IT teams must augment telemetry collection with user-experience data to eliminate blind spots:

  1. Logs: These are structured or unstructured text records of discrete events that occurred at a specific time.
  2. Metrics: These are values represented as counts or measures that are often calculated or aggregated over a period of time. Metrics can originate from a variety of sources, including infrastructure, hosts, services, cloud platforms, and external sources.
  3. Distributed tracing: This displays activity of a transaction or request as it flows through applications and shows how services connect, including code-level details.
  4. User experience: This extends traditional observability telemetry by adding the outside-in user perspective of a specific digital experience on an application, even in pre-production environments.
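
The first two items in this list can be sketched with the standard library alone. The example below is a simplified illustration, not a real telemetry pipeline: `make_log_record` and `count_per_window` are hypothetical helpers, and the field names are assumptions.

```python
import json
import time
from collections import Counter


def make_log_record(message, level="INFO", timestamp=None, **fields):
    """A log: a timestamped, structured record of one discrete event."""
    record = {
        "ts": timestamp if timestamp is not None else time.time(),
        "level": level,
        "message": message,
        **fields,  # free-form context such as an order or user identifier
    }
    return json.dumps(record, sort_keys=True)


def count_per_window(timestamps, window_seconds):
    """A metric: event counts aggregated over fixed time windows.

    Returns {window start (seconds): number of events in that window}.
    """
    counts = Counter(int(ts // window_seconds) * window_seconds for ts in timestamps)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(make_log_record("payment failed", level="ERROR", order_id="A1"))
    # Request arrival times in seconds, rolled up into one-minute counts.
    print(count_per_window([0, 12, 59, 60, 61, 130], window_seconds=60))
```

The log keeps the full detail of a single event, while the metric discards that detail in exchange for a compact number per time window. That trade-off is why the two are usually collected side by side.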

Why the three pillars of observability aren’t enough

Data collection is only the start. Simply having access to the right logs, metrics, and traces isn’t enough to gain true observability into your environment. Only when you can use that telemetry data to improve end-user experience and business outcomes can you really say you’ve achieved the purpose of observability.

There are other observability capabilities organizations can use to observe their environments. Open-source solutions, such as OpenTelemetry, provide a de facto standard for collecting telemetry data in cloud settings. These open-source solutions enhance observability for cloud-native applications and make it easier for developers and operations teams to achieve a consistent understanding of application health across multiple environments.

Organizations can also use real user monitoring to gain real-time visibility into the user experience, tracking the path of a single request and gaining insight into every interaction it has with every service along the way. This experience can be observed by synthetic monitoring or even a recording of the actual session. These capabilities extend telemetry by adding in data for APIs, third-party services, errors occurring in the browser, user demographics, and application performance from the user perspective. This gives IT, DevSecOps, and SRE teams the ability not only to see the complete end-to-end journey of a request but also to access real-time insight into system health. From there, they can proactively troubleshoot areas of degrading health before they impact application performance. They can also more easily recover from failures and gain a more granular understanding of the user experience.
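
The idea of following a single request through every service it touches can be sketched as a walk over span records. This is a simplified model under stated assumptions: the `Span` fields are loosely based on the trace, span, and parent identifiers commonly used in tracing data models, and the services and timings are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    trace_id: str             # shared by every span of one request
    span_id: str
    parent_id: Optional[str]  # None marks the entry point of the request
    service: str
    start_ms: int
    duration_ms: int


def request_path(spans, trace_id):
    """Return (depth, service, duration_ms) tuples in call order for one trace."""
    children = {}
    for span in spans:
        if span.trace_id == trace_id:
            children.setdefault(span.parent_id, []).append(span)

    path = []

    def visit(parent_id, depth):
        # Walk child spans in start order so the output follows the request.
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            path.append((depth, span.service, span.duration_ms))
            visit(span.span_id, depth + 1)

    visit(None, 0)
    return path


# Hypothetical spans from two different requests.
SPANS = [
    Span("t1", "a", None, "frontend", 0, 120),
    Span("t1", "b", "a", "checkout", 10, 100),
    Span("t1", "c", "b", "payments", 20, 70),
    Span("t1", "d", "b", "inventory", 95, 10),
    Span("t2", "e", None, "frontend", 5, 30),
]

if __name__ == "__main__":
    for depth, service, duration_ms in request_path(SPANS, "t1"):
        print("  " * depth + f"{service} ({duration_ms} ms)")
```

Printed as an indented tree, the output shows that most of the time in request `t1` is spent in the payments call, which is the kind of end-to-end view described above.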

While IT organizations have the best of intentions and strategy, they often overestimate the ability of already overburdened teams to constantly observe, understand, and act upon an impossibly overwhelming amount of data and insights. Although there are many complex challenges associated with observability, the organizations that overcome these challenges will find it worth their while.
