https://cloud.tencent.com/developer/article/2019401?areaSource=105001.8&traceId=ySuPzDMCMO5dVSJSYsUT9
In IT and cloud computing, observability is the ability to measure a system’s current state based on the data it generates, such as logs, metrics, and traces.
Observability relies on telemetry derived from instrumentation that comes from the endpoints and services in your multi-cloud computing environments. In these modern environments, every hardware, software, and cloud infrastructure component and every container, open-source tool, and microservice generates records of every activity. The goal of observability is to understand what’s happening across all these environments and among the technologies, so you can detect and resolve issues to keep your systems efficient and reliable and your customers happy.
Organizations usually implement observability using a combination of instrumentation methods, including open-source instrumentation tools such as OpenTelemetry.
Many organizations also adopt an observability solution to help them detect and analyze the significance of events to their operations, software development life cycles, application security, and end-user experiences.
Observability has become more critical in recent years, as cloud-native environments have gotten more complex and the potential root causes for a failure or anomaly have become more difficult to pinpoint. As teams begin collecting and working with observability data, they are also realizing its benefits to the business, not just IT.
Because cloud services rely on a uniquely distributed and dynamic architecture, observability may also sometimes refer to the specific software tools and practices businesses use to interpret cloud performance data. Although some people may think of observability as a buzzword for sophisticated application performance monitoring (APM), there are a few key distinctions to keep in mind when comparing observability and monitoring.
Is observability really monitoring by another name? In short, no. While observability and monitoring are related, and can complement one another, they are different concepts.
In a monitoring scenario, you typically preconfigure dashboards that are meant to alert you to performance issues you expect to see later. However, these dashboards rely on the key assumption that you’re able to predict what kinds of problems you’ll encounter before they occur.
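As a minimal sketch of this kind of preconfigured, threshold-based monitoring, consider the Python below. The metric names, limits, and the `dns_retry_count` field are hypothetical and chosen only for illustration; the point is that a rule set can only flag the problems someone thought to write a rule for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdRule:
    metric: str   # name of the metric to watch
    limit: float  # alert when the observed value exceeds this limit


def evaluate(rules, sample):
    """Return one alert message per rule whose metric exceeds its limit."""
    alerts = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and value > rule.limit:
            alerts.append(f"{rule.metric}={value} exceeds {rule.limit}")
    return alerts


# Rules written in advance, based on the failures the team expects to see.
RULES = [ThresholdRule("cpu_percent", 90.0), ThresholdRule("error_rate", 0.05)]

# A sample where the real problem is a metric nobody wrote a rule for.
sample = {"cpu_percent": 42.0, "error_rate": 0.01, "dns_retry_count": 870}

alerts = evaluate(RULES, sample)
print(alerts)  # no alerts fire, even though dns_retry_count looks unhealthy
```

Here the dashboard stays green because the anticipated metrics are within limits, which is exactly the assumption the paragraph above calls out.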
Cloud-native environments don’t lend themselves well to this type of monitoring because they are dynamic and complex, which means you have no way of knowing in advance what kinds of problems might arise.
In an observability scenario, where an environment has been fully instrumented to provide complete observability data, you can flexibly explore what’s going on and quickly figure out the root cause of issues you may not have been able to anticipate.
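One way to picture this kind of flexible exploration is slicing the same telemetry by whichever dimension turns out to matter. The sketch below is illustrative only: the event fields (`service`, `region`, `version`, `error`) and their values are assumptions, not a schema from any particular tool.

```python
from collections import defaultdict

# Hypothetical structured events emitted by instrumented services.
EVENTS = [
    {"service": "checkout", "region": "eu-west", "version": "1.4.2", "error": False},
    {"service": "checkout", "region": "eu-west", "version": "1.5.0", "error": True},
    {"service": "search", "region": "us-east", "version": "1.4.2", "error": False},
    {"service": "search", "region": "us-east", "version": "1.5.0", "error": True},
]


def error_rate_by(events, dimension):
    """Group events by an arbitrary field and return the error rate per group."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for event in events:
        key = event[dimension]
        totals[key] += 1
        errors[key] += int(event["error"])
    return {key: errors[key] / totals[key] for key in totals}


# The question is chosen after the fact, not configured in advance.
for dimension in ("service", "region", "version"):
    print(dimension, error_rate_by(EVENTS, dimension))
```

In this made-up data set, grouping by service or region shows the same error rate everywhere, while grouping by version isolates the failing release. No dashboard had to anticipate that question.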
MONITORING | OBSERVABILITY |
---|---|
Tracks metrics and logs. The focus is on gathering metrics and log data, with alerts when set thresholds are exceeded. | Delivers actionable information. Intelligence is applied to telemetry data, producing actionable feedback loops and enabling automated changes and optimizations to infrastructure and application runtime deployment. |
Collects data. Infrastructure monitoring collects valuable metrics such as CPU, memory, response time, error rates, and latency. | Correlates metrics. Observability brings together metrics from disparate systems, identifying specific problems, so you can quickly understand how they relate to one another. |
Watches defined systems. Monitoring keeps track of the health of important systems. | Interprets data from complex systems. Observability allows for granular insights and debugging, enabling teams to correct problems as they happen. |
In enterprise environments, observability helps cross-functional teams understand and answer specific questions about what’s happening in highly distributed systems. Observability enables you to understand what is slow or broken and what needs to be done to improve performance. With an observability solution in place, teams can receive alerts about issues and proactively resolve them before they impact users.
Because modern cloud environments are dynamic and constantly changing in scale and complexity, most problems are neither known nor monitored. Observability addresses this common issue of “unknown unknowns,” enabling you to continuously and automatically understand new types of problems as they arise.
Observability is also a critical capability of artificial intelligence for IT operations (AIOps). As more organizations adopt cloud-native architectures, they are also looking for ways to implement AIOps, harnessing AI as a way to automate more processes throughout the DevSecOps life cycle. By bringing AI to everything, from gathering telemetry to analyzing what’s happening across the full technology stack, your organization can have the reliable answers essential for automating application monitoring, testing, continuous delivery, application security, and incident response.
The value of observability doesn’t stop at IT use cases. Once you begin collecting and analyzing observability data, you have an invaluable window into the business impact of your digital services. This visibility enables you to optimize conversions, validate that software releases meet business goals, measure the outcomes of your user experience SLOs, and prioritize business decisions based on what matters most.
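As a rough illustration of measuring a user-experience SLO from collected telemetry, the sketch below computes the share of page loads that met a latency threshold. The 300 ms threshold, the 80% target, and the sample latencies are all assumptions made up for the example.

```python
def slo_attainment(latencies_ms, threshold_ms, target):
    """Return (attained_fraction, met) for a latency SLO.

    A request counts as "good" when its latency is at or below threshold_ms.
    """
    if not latencies_ms:
        raise ValueError("need at least one measurement")
    good = sum(1 for value in latencies_ms if value <= threshold_ms)
    attained = good / len(latencies_ms)
    return attained, attained >= target


# Hypothetical page-load latencies from real-user monitoring, in milliseconds.
page_loads_ms = [120, 180, 250, 90, 400, 150, 130, 110, 95, 210]

attained, met = slo_attainment(page_loads_ms, threshold_ms=300, target=0.8)
print(f"{attained:.0%} of page loads were fast enough; SLO met: {met}")
```

A real deployment would evaluate this over a rolling window and per user segment; the calculation itself stays this simple.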
When an observability solution also analyzes user experience data using synthetic and real-user monitoring, you can discover problems before your users do and design better user experiences based on real, immediate feedback.
Observability delivers powerful benefits to IT teams, organizations, and end users alike, and it supports a range of use cases.
DevSecOps teams can tap observability to get more insights into the apps they develop, and automate testing and CI/CD processes so they can release better quality code faster. This means organizations waste less time on war rooms and finger-pointing. Not only is this a benefit from a productivity standpoint, but it also strengthens the positive working relationships that are essential for effective collaboration.
These organizational improvements open the door to further innovation and digital transformation. And, more importantly, the end-user ultimately benefits in the form of a high-quality user experience.
If you’ve read about observability, you likely know that logs, metrics, and distributed traces are its three key pillars. However, observing raw telemetry from back-end applications alone does not provide the full picture of how your systems are behaving.
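To make the three signal types concrete, here is a minimal standard-library sketch in which one request emits a log record, a metric increment, and a trace span that share a trace ID. The helper names (`span`, `log`, `handle_request`) and the in-memory stores are hypothetical stand-ins for what an instrumentation library and a telemetry backend would provide.

```python
import time
import uuid
from collections import Counter
from contextlib import contextmanager

metrics = Counter()  # metrics: aggregated numeric counters
logs = []            # logs: discrete, timestamped event records
spans = []           # traces: timed operations tied together by a trace ID


@contextmanager
def span(name, trace_id):
    """Record how long the wrapped block took as one span of a trace."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({
            "trace_id": trace_id,
            "name": name,
            "duration_s": time.perf_counter() - start,
        })


def log(message, trace_id, **fields):
    """Append a structured log record that carries the trace ID."""
    logs.append({"ts": time.time(), "trace_id": trace_id, "message": message, **fields})


def handle_request(path):
    """Handle one simulated request and emit all three signal types."""
    trace_id = uuid.uuid4().hex
    with span("handle_request", trace_id):
        metrics["requests_total"] += 1
        log("request received", trace_id, path=path)
    return trace_id


trace_id = handle_request("/cart")
print(metrics["requests_total"], logs[-1]["message"], spans[-1]["name"])
```

Because the log record and the span carry the same `trace_id`, they can be joined later, which is what lets the separate signals be read as one story about a request.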
Neglecting the front-end perspective potentially skews or even misrepresents the full picture of how your applications and infrastructure are performing in the real world for real users. Extending the three-pillars approach, IT teams must augment telemetry collection with user-experience data to eliminate blind spots.
Data collection is only the start. Simply having access to the right logs, metrics, and traces isn’t enough to gain true observability into your environment. Only when you can use that telemetry data to improve end-user experience and business outcomes can you say you’ve achieved the purpose of observability.
There are other observability capabilities organizations can use to observe their environments. Open-source solutions, such as OpenTelemetry, provide a de facto standard for collecting telemetry data in cloud settings. These open-source solutions enhance observability for cloud-native applications and make it easier for developers and operations teams to achieve a consistent understanding of application health across multiple environments.
Organizations can also use real user monitoring to gain real-time visibility into the user experience, tracking the path of a single request and gaining insight into every interaction it has with every service along the way. This experience can be observed by synthetic monitoring or even a recording of the actual session. These capabilities extend telemetry by adding in data for APIs, third-party services, errors occurring in the browser, user demographics, and application performance from the user perspective. This gives IT, DevSecOps, and SRE teams the ability not only to see the complete end-to-end journey of a request but also to access real-time insight into system health. From there, they can proactively troubleshoot areas of degrading health before they impact application performance. They can also more easily recover from failures and gain a more granular understanding of the user experience.
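The idea of following a single request through every service it touches can be sketched as rebuilding a call tree from span records. The span fields (`span_id`, `parent_id`, `service`, `duration_ms`) and the service names below are assumptions for illustration, not the format of any specific tracing product.

```python
from collections import defaultdict

# Hypothetical spans collected for one request; parent_id links caller to callee.
SPANS = [
    {"span_id": "a", "parent_id": None, "service": "frontend", "duration_ms": 240},
    {"span_id": "b", "parent_id": "a", "service": "cart-api", "duration_ms": 200},
    {"span_id": "c", "parent_id": "b", "service": "inventory-db", "duration_ms": 180},
    {"span_id": "d", "parent_id": "a", "service": "auth", "duration_ms": 15},
]


def request_path(spans):
    """Return the request's call tree as indented lines, one per span."""
    children = defaultdict(list)
    for item in spans:
        children[item["parent_id"]].append(item)

    lines = []

    def walk(parent_id, depth):
        for item in children.get(parent_id, []):
            lines.append("  " * depth + f'{item["service"]} ({item["duration_ms"]} ms)')
            walk(item["span_id"], depth + 1)

    walk(None, 0)  # root spans are the ones without a parent
    return lines


for line in request_path(SPANS):
    print(line)
```

Read top to bottom, the tree shows where the request spent its time: in this made-up trace, most of the front-end latency comes from the inventory database call beneath the cart API.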
While IT organizations have the best of intentions and strategy, they often overestimate the ability of already overburdened teams to constantly observe, understand, and act upon an impossibly overwhelming amount of data and insights. Although there are many complex challenges associated with observability, the organizations that overcome these challenges will find it worth their while.