DOCS-613 - AI-Driven Investigations - beta #4976

133 changes: 133 additions & 0 deletions docs/observability/ai-driven-investigations.md
@@ -0,0 +1,133 @@
---
id: ai-driven-investigations
title: AI-Driven Investigations
description: Generate a service map to analyze service coverage and stability.
---

<head>
<meta name="robots" content="noindex" />
</head>

<p><a href="/docs/beta"><span className="beta">Beta</span></a></p>

import useBaseUrl from '@docusaurus/useBaseUrl';

<img src={useBaseUrl('img/observability/service-intelligence-icon.png')} alt="icon" width="50"/>

AI-Driven Investigations uses AI to generate a map of system services, revealing areas that require attention. It surfaces real-time signals for issues such as latency and errors, providing actionable paths for root-cause analysis.

The functionality of AI-Driven Investigations is similar to that of the [Services List and Map](/docs/apm/services-list-map/), but uses AI to generate the list and map, providing deeper insights and more targeted surfacing for areas of concern. No tracing setup is required.

## What is AI-Driven Investigations?

AI-Driven Investigations uses a generative AI context engine to analyze your organization's raw log data and provide dynamic observability of the services in your enterprise. It determines real-time environment status and dynamically builds visualizations and instructions to guide troubleshooters.

Insights are generated from a sample of log data covering the past 30 minutes. Output includes details on the logs that produced signals, and recommended next steps. You can create an investigation for any time in the past 14 days.

AI-Driven Investigations provides developers, security, and IT operations teams with a single source of truth and remediation recommendations in natural language. Engineers who are responsible for maintaining the uptime of a service benefit most from AI-Driven Investigations. Depending on your company and how you manage your operations, these could be development operations (DevOps) engineers, site reliability engineers (SREs), or information technology operations (ITOps) engineers.

## View AI-Driven Investigations

To open AI-Driven Investigations, in the main Sumo Logic menu of the [**New UI**](/docs/get-started/sumo-logic-ui), select **AI-Driven Investigations**.

When you first open AI-Driven Investigations, the display of an investigation will include a list of services to the left, and a map of the services to the right. These services are automatically identified by AI analysis of logs in your environment.

The service list and map visually represent your application environment, giving you a greater understanding of your application architecture, hierarchy, and dependencies between microservices. The health of each microservice is reflected in the color of its node in the map to help you spot potential problems and bottlenecks in your application infrastructure.

<img src={useBaseUrl('img/observability/service-intelligence-initial-map.png')} alt="AI-Driven Investigations list and map" style={{border: '1px solid gray'}} width="800" />

### View a summary of all service activity

AI-Driven Investigations provides an AI-generated summary of all detected service signals and steps to remediate issues.

1. Click the summary button to the right of the services map.<br/><img src={useBaseUrl('img/observability/summary-button.png')} alt="Summary button" style={{border: '1px solid gray'}} width="150" />
1. The **Summary** panel will open, displaying observations, root cause analysis, and recommended next steps.<br/><img src={useBaseUrl('img/observability/service-intelligence-signals-summary.png')} alt="Signals summary" style={{border: '1px solid gray'}} width="300" />
1. Carefully review the summary to learn actions you can take to address the findings.

### View the summary for a single service

When you select a service in the list or a node in the map that has signals, a summary panel displays to the right. The panel contains findings about the service with log examples and steps you can take to remediate problems with the service.

1. To find services to investigate, at the top of the screen use the following controls:<br/><img src={useBaseUrl('img/observability/filter-results.png')} alt="Filter results" style={{border: '1px solid gray'}} width="800" />
   * **Search Services**. Enter keywords to search service names.
   * **Signal Type**. Select the signal types to search for:
      * **Latency**. An increase or decrease in latency of the requests to the system.
      * **Errors**. Error signals detected.
   * **Services with Signals**. Show only services with signals. Normally operating services will not be shown.
1. Click a service in the list, or a node in the map. The **App Service Summary** panel displays to the right showing signals of interest and recommended next steps.<br/><img src={useBaseUrl('img/observability/service-intelligence-service-example.png')} alt="Service summary example" style={{border: '1px solid gray'}} width="800" />
1. Click an expand arrow on a signal to view an example of a log entry that illustrates the issue. You can use this information to query for logs with similar content.<br/><img src={useBaseUrl('img/observability/service-intelligence-log-example.png')} alt="Log example" style={{border: '1px solid gray'}} width="300" />
1. Scroll to the bottom of the **App Service Summary** pane to the **Next Steps** section. This section describes concrete steps you can take to remediate issues identified in the selected service.<br/><img src={useBaseUrl('img/observability/service-intelligence-next-steps.png')} alt="AI-Driven Investigations next steps" style={{border: '1px solid gray'}} width="300" />
1. Click <img src={useBaseUrl('img/observability/copilot-logo.png')} alt="Copilot logo" width="20" /> to [open Copilot](/docs/search/copilot/#step-1-open-copilot). Use Copilot to investigate further.
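
The three filter controls described in the first step combine as independent conditions. The following is a minimal sketch of that logic, not the product's implementation: the `Service` class, the `filter_services` function, and the signal names `"latency"` and `"errors"` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Service:
    # Hypothetical model of one entry in the services list.
    name: str
    signals: frozenset = frozenset()  # e.g. frozenset({"latency", "errors"})


def filter_services(services, keyword="", signal_types=None, with_signals_only=False):
    """Apply the Search Services, Signal Type, and Services with Signals controls."""
    wanted = set(signal_types or [])
    result = []
    for svc in services:
        # Search Services: case-insensitive keyword match on the service name.
        if keyword and keyword.lower() not in svc.name.lower():
            continue
        # Signal Type: keep services that have at least one selected signal type.
        if wanted and not (svc.signals & wanted):
            continue
        # Services with Signals: hide normally operating services.
        if with_signals_only and not svc.signals:
            continue
        result.append(svc)
    return result
```

For example, calling `filter_services(services, signal_types=["errors"])` on such a list would keep only the services that carry an error signal.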

### Explore the AI-Driven Investigations UI

Perform the following steps to explore the UI:
1. The legend at the bottom of the screen shows that gold nodes in the map represent services with signals, while blue nodes indicate normal, expected activity. Use the zoom buttons to resize the map.<br/><img src={useBaseUrl('img/observability/service-intelligence-size-buttons.png')} alt="Resize buttons" style={{border: '1px solid gray'}} width="200" />
1. Click **Generate Latest** to refresh the services list and map with the most current version.<br/><img src={useBaseUrl('img/observability/service-intelligence-refresh.png')} alt="Refresh button" style={{border: '1px solid gray'}} width="150" />
1. Click **History** to browse previously generated versions.<br/><img src={useBaseUrl('img/observability/service-intelligence-history.png')} alt="History button" style={{border: '1px solid gray'}} width="150" /><br/><img src={useBaseUrl('img/observability/service-intelligence-history-dialog.png')} alt="Services history" style={{border: '1px solid gray'}} width="250" />
1. To open a version, select a previous version directly from the list. When you select a version, the time and date of the version appears at the top of the page:<br/><img src={useBaseUrl('img/observability/service-intelligence-historical-view-time.png')} alt="Timestamp" style={{border: '1px solid gray'}} width="250" />
1. Click **Select Date** and **Select Time** to find a specific version. You can search for an investigation at any time in the past 14 days.
1. If an investigation does not exist for a specific date and time, you can generate it.<br/><img src={useBaseUrl('img/observability/service-intelligence-generate-button.png')} alt="Generate an investigation" style={{border: '1px solid gray'}} width="250" />
1. Click the buttons on the right to regenerate the services list and map or provide feedback:<br/><img src={useBaseUrl('img/observability/service-intelligence-regenerate-and-feedback-buttons.png')} alt="Regenerate and feedback buttons" width="50" />
   * Click the **Regenerate** <img src={useBaseUrl('img/observability/service-intelligence-regenerate-button.png')} alt="Regenerate button" width="20" /> button to display this dialog:<br/><img src={useBaseUrl('img/observability/regenerate-map.png')} alt="Regenerate map" style={{border: '1px solid gray'}} width="250" />
      1. Select from **Concerns** to provide feedback about the current view, and select from **Variations** to indicate how different you want the regenerated result to be.
      1. Click **Regenerate** to regenerate the service list and map using the last 15 minutes of data.
      :::warning
      Regenerate only if more than 15 minutes have passed since the last generation. Regenerating multiple times in quick succession does not produce new results.
      :::
   * Click the **Mark as verified** <img src={useBaseUrl('img/observability/service-intelligence-thumbs-up.png')} alt="Mark as verified button" width="20" /> button to provide feedback that the current services list and map are good.
   * Click the **Needs improvement** <img src={useBaseUrl('img/observability/service-intelligence-thumbs-down.png')} alt="Needs improvement button" width="20" /> button to provide feedback that the current services list and map need work. In the provided box, describe what was unclear, inaccurate, or unhelpful.
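
Two time limits appear in the steps above: investigations can be requested for any time in the past 14 days, and regenerating is only worthwhile when more than 15 minutes have passed since the last generation. The sketch below expresses those two rules as simple checks. The function and constant names are assumptions made for illustration and do not correspond to any Sumo Logic API.

```python
from datetime import datetime, timedelta, timezone

# Limits taken from the text above; the names are hypothetical.
LOOKBACK_WINDOW = timedelta(days=14)
MIN_REGENERATE_GAP = timedelta(minutes=15)


def can_investigate(target, now):
    """True if the requested time falls within the past 14 days."""
    return now - LOOKBACK_WINDOW <= target <= now


def should_regenerate(last_generated, now):
    """True if more than 15 minutes have passed since the last generation."""
    return now - last_generated > MIN_REGENERATE_GAP


now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
print(can_investigate(now - timedelta(days=3), now))
print(should_regenerate(now - timedelta(minutes=5), now))
```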

## FAQ

### How is data for AI-Driven Investigations generated?

Data for AI-Driven Investigations is generated by our AI system leveraging Claude on Bedrock, which is specially trained to provide actionable signal intelligence at the system level. The dataset is sampled every 15 to 30 minutes using our proprietary algorithm, which ensures comprehensive log collection from all services, deduplicates logs, and filters out noise to maintain data quality.
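
The sampling algorithm itself is proprietary and is not described here. To illustrate only the general idea of log deduplication, the sketch below collapses log lines that differ only in numeric values into a single template and counts them. This is an assumed, simplified technique for illustration, not the algorithm that AI-Driven Investigations uses, and `normalize` and `deduplicate` are hypothetical names.

```python
import re
from collections import Counter

# Standalone numbers, dotted numbers (such as IP addresses), and hex ids.
_VARIABLE_TOKENS = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+(?:\.\d+)*)\b")


def normalize(line):
    """Replace variable tokens with a placeholder so similar lines collapse."""
    return _VARIABLE_TOKENS.sub("<*>", line).strip()


def deduplicate(lines):
    """Return (template, count, first_example) tuples, most frequent first."""
    counts = Counter()
    examples = {}
    for line in lines:
        key = normalize(line)
        counts[key] += 1
        # Keep the first raw line seen for each template as an example.
        examples.setdefault(key, line)
    return [(key, n, examples[key]) for key, n in counts.most_common()]
```

Keeping one raw example per template mirrors how a summary can still show a representative log entry after duplicates are removed.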

### What is the confidence level of the large language model (LLM) output?

Confidence in the AI output is high, as verified through human evaluation of sample sets. Sumo Logic relies on this human evaluation process internally and follows it up by tuning the dataset or the prompt.

You can also provide feedback, for example by noting that services are missing.

### What information is generated?

Data is generated by passing a sample of data from the past 15 minutes to the LLM along with a prompt. This returns a "service summary" of logs along with log messages related to the signals generated, with a hypothesis of the problem and next steps for further exploration.

Our AI delivers actionable intelligence in near real time via log analysis only. The resulting service map highlights the interconnection of various services and components. The summary and signals highlight which services are functioning normally, and any that might have errors or latency issues. Clicking an individual service lets you view the specific signals generated for that service. Next steps are suggested for how to resolve any issues that might be present. From there, you can view example log lines, and dig deeper into the logs if needed.

### How does regeneration work?

Regeneration takes the same sampled data that was used for the first attempt and sends it to the model again, along with special instructions based on your feedback about whether anything looked incorrect (missing services, wrong connections, service names), and how different you want the result to be (subtly different, or very different).

The idea is that the model gets to try again with a hint about what to focus on. The new result may be the same or different, and it may or may not be more correct than the original result.
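
As a rough illustration of this flow, the sketch below reuses the same sampled logs and appends feedback-derived hints to the original prompt. The request shape, the hint wording, and the names `VARIATION_HINTS` and `build_regeneration_request` are assumptions made for this example; the actual prompts and request format are not documented here.

```python
# Hypothetical hint text for the two variation levels described above.
VARIATION_HINTS = {
    "subtle": "Keep the result close to the previous one.",
    "significant": "Feel free to produce a very different result.",
}


def build_regeneration_request(sampled_logs, base_prompt, concerns, variation):
    """Combine the unchanged log sample with a prompt extended by user feedback."""
    if variation not in VARIATION_HINTS:
        raise ValueError(f"unknown variation: {variation!r}")
    instructions = [base_prompt]
    if concerns:
        # Concerns such as "missing services" or "wrong connections".
        instructions.append("Reported concerns: " + ", ".join(concerns) + ".")
    instructions.append(VARIATION_HINTS[variation])
    # The sampled logs are passed through as-is; only the prompt changes.
    return {"logs": list(sampled_logs), "prompt": "\n".join(instructions)}
```

Because the log sample is unchanged, any difference in the regenerated result comes from the added instructions and from the model itself.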

### What are the signal types?

The signal types are:
* **Latency**. An increase or decrease in latency of the requests to the system.
* **Errors**. Error signals detected.

<!-- Add these back when they reappear:
* **Traffic**. An increase or decrease in traffic requests to the service.
* **Saturation**. A change in resource usage of the system.
* **Security**. Logs that indicate security concerns.
-->

These signals are extracted based on the logs observed for the last 15 minutes.

### Is customer data used to train the model?

No customer data is used to train the model. We use a foundation model (FM) with custom prompts; customer data is not used for training.

### What sort of instrumentation is required?

No instrumentation is required. You can send your structured and unstructured data to Sumo Logic through normal data collection to drive AI-Driven Investigations.

### Are there log format limitations?

There is no limitation on log formats.

### Can you choose which logs to use in AI-Driven Investigations?

No, you can't choose which logs to use in AI-Driven Investigations. AI-Driven Investigations will sample all logs in your environment for the desired timeframe.
Binary file added static/img/observability/copilot-logo.png
Binary file added static/img/observability/filter-results.png
Binary file added static/img/observability/regenerate-map.png
Binary file added static/img/observability/summary-button.png