Welcome to the Elven Platform – your complete solution for Monitoring, Incident Management and Status Pages. This guide is designed to ensure that you not only join the platform, but have an exceptional experience from the very first moment. Get ready to start a journey that will exceed all your expectations!

1. Registration and Initial Configuration

Organization registration:

Access Sign Up
Fill in your basic information using a corporate email and click “Sign Up”
Name your organization, accept the Terms and Conditions and click “Start Now”
Confirm your email to activate your account.
Note: add the sender domain @elven.works to your trusted senders so you don’t miss any notification or alert.

Initial Organization Setup:

Log in to your new account
Complete your registration including your phone number
Invite your team to join you, ensuring smooth onboarding from the start
Assign a permission role to each member of your team

2. Exploring the main modules

The Elven Platform offers three main modules, each designed to meet your organization’s unique needs. Check out each of them below and, if you have any questions, our team is here to help you choose the module that best suits your specific goals and challenges. Click here and let’s talk and find the ideal solution for you!

Monitoring Module:

Proactive, customizable solution for tracking the health and performance of systems and applications.

Start by choosing the type of monitoring and configure it by clicking here
Understand the resource screen by clicking here
Analyze the resource metrics by clicking here

Incident Management and Response Module:

In real time, the Elven Platform receives the alert event or incident, triggers the on-call schedule, notifies through communication channels and records all interactions within the platform until resolution.

Start configuring your team by clicking here
Choose the channels you want to be notified by clicking here
Configure call rotation by clicking here
Understand the incident screen by clicking here
Create a postmortem of the incident by clicking here

Status Pages Module:

Customized Status Pages to keep users informed about the status of services or applications via SMS or Webhook, with availability uptime and incident history.

Start by configuring your Status Page here

3. Subscription

We integrate your consumption with your Azure Marketplace invoice. We will soon have native integration.

4. Support and Continuous Improvement

Explore our extensive knowledge base to find answers to your most common questions by clicking here
Contact our dedicated support team at any time via email at support@elvenworks.atlassian.net or through the Ticket Opening Tool
Send us your feedback and suggestions to contact@elven.works for continuous improvements

Now that you’re all set, get ready to embark on a journey of success and reliability with the Elven Platform. We are honored that you chose our platform and are here to support you every step of the way.

Elven Observability: Operator Collector (Automatic Instrumentation)

In order to use auto-instrumentation, we need to install the collector operator.

Dependencies:
cert-manager

Install the operator

Important points: auto-instrumentation will not always be sufficient; in some scenarios, manual instrumentation will be required. Tests performed for .NET showed that versions 6, 7, and 8 work well, while automatic instrumentation is not supported on .NET 5. For more information, consult: https://opentelemetry.io/docs/languages/

				
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

If you are going to instrument Golang applications:

				
kubectl -n opentelemetry-operator-system patch deployment opentelemetry-operator-controller-manager \
  --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--enable-go-instrumentation=true"}]'

Install the collector

It is a recommended practice to send telemetry from containers to an OpenTelemetry Collector rather than directly to a backend. The Collector helps simplify secrets management, decouples data export issues (such as the need for retries) from your applications, and allows you to add additional data to your telemetry, such as with the k8sattributesprocessor component. If you choose not to use a Collector, you can skip to the next section.

https://opentelemetry.io/docs/kubernetes/operator/automatic/

Configuration example

				
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  config: 
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 15
      batch:
        send_batch_size: 10000
        timeout: 10s
    exporters:
      otlp:
        endpoint: "tempo-distributor.domain.io:443"
        tls:
          insecure: false
          insecure_skip_verify: true
      prometheusremotewrite:
        endpoint: https://mimir-distributor.domain.io/api/v1/push
        headers:
          X-Scope-OrgID: <TENANT>          
    service:
      pipelines:
        metrics:
          receivers: [otlp] 
          processors: [batch]
          exporters: [prometheusremotewrite]
        traces:
          receivers: [otlp]
          processors: []
          exporters: [otlp]

Apply the collector

				
kubectl apply -k otel-collector-operator/

OpenTelemetry Instrumentation

To manage auto-instrumentation, the Operator needs to be configured to know which pods to instrument and which auto-instrumentation to use for those pods. This is done through the Instrumentation CRD.

Ex. Java

				
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: instrumentation-sample
spec:
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: parentbased_traceidratio
    argument: "1"
  env:
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: http://otel-collector.monitoring:4318
  java:
    env:
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://otel-collector.monitoring:4317

Ex. DOTNET

				
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: instrumentation-sample
spec:
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: parentbased_traceidratio
    argument: "1"
  env:
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: http://otel-collector.default:4318
  dotnet:
    env:
      - name: OTEL_DOTNET_AUTO_METRICS_CONSOLE_EXPORTER_ENABLED
        value: "false"
      - name: OTEL_DOTNET_AUTO_TRACES_CONSOLE_EXPORTER_ENABLED
        value: "false"
      - name: OTEL_DOTNET_AUTO_LOGS_CONSOLE_EXPORTER_ENABLED
        value: "false"  
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://otel-collector.monitoring:4318
      - name: OTEL_TRACES_EXPORTER
        value: "otlp"
      - name: OTEL_METRICS_EXPORTER
        value: "otlp"

Add annotations to existing deployments

The final step is to opt in your services for auto-instrumentation. This is done by updating your service’s spec.template.metadata.annotations to include a language-specific annotation:

.NET: instrumentation.opentelemetry.io/inject-dotnet: "true"
Go: instrumentation.opentelemetry.io/inject-go: "true"
Java: instrumentation.opentelemetry.io/inject-java: "true"
Node.js: instrumentation.opentelemetry.io/inject-nodejs: "true"
Python: instrumentation.opentelemetry.io/inject-python: "true"
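For a Deployment that already exists, the annotation can also be added with a patch instead of editing the manifest by hand. The following is a minimal sketch, not an official procedure: inject_patch is an illustrative helper name, and java-sample is an assumed Deployment name in the current namespace. Changing the pod template this way also triggers a new rollout.

```shell
# Hypothetical helper: prints a merge patch that sets the
# inject-<language> annotation on the pod template.
inject_patch() {
  local lang="$1"   # one of: dotnet, go, java, nodejs, python
  printf '{"spec":{"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-%s":"true"}}}}}' "$lang"
}

# The cluster is only touched when the script is run with --apply,
# so the patch can be inspected first without cluster access.
if [ "${1:-}" = "--apply" ]; then
  # "java-sample" is an assumed Deployment name; replace it with your own.
  kubectl patch deployment java-sample --type=merge --patch "$(inject_patch java)"
fi
```

Calling inject_patch java on its own prints the patch, which is a convenient way to review it before applying.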

Test application using auto-instrumentation for Java
https://opentelemetry.io/docs/kubernetes/operator/automatic/

				
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-sample
spec:
  replicas: 1
  selector:
    matchLabels:
      app: java-sample
  template:
    metadata:
      labels:
        app: java-sample
      annotations:
        instrumentation.opentelemetry.io/inject-java: "true"
    spec:
      containers:
      - name: java-sample
        image: emr001/java-app
        ports:
        - containerPort: 8080
        env:
          - name: OTEL_SERVICE_NAME
            value: "java-demo"
---
apiVersion: v1
kind: Service
metadata:
  name: java-app
spec:
  selector:
    app: java-sample
  ports:
    - port: 8080
      protocol: TCP
      targetPort: 8080
---
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: instrumentation-sample
spec:
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: parentbased_traceidratio
    argument: "1"
  env:
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: http://otel-collector.monitoring:4318
  java:
    env:
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://otel-collector.monitoring:4317
  dotnet:
    env:
      - name: OTEL_DOTNET_AUTO_METRICS_CONSOLE_EXPORTER_ENABLED
        value: "false"
      - name: OTEL_DOTNET_AUTO_TRACES_CONSOLE_EXPORTER_ENABLED
        value: "false"
      - name: OTEL_DOTNET_AUTO_LOGS_CONSOLE_EXPORTER_ENABLED
        value: "false"  
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://otel-collector.monitoring:4318
      - name: OTEL_TRACES_EXPORTER
        value: "otlp"
      - name: OTEL_METRICS_EXPORTER
        value: "otlp"
EOF

Restart the deployment for the operator to inject the agent via an init-container

				
kubectl rollout restart deployment java-sample

Perform a port-forward on port 8080 and use the loop below to generate traces, metrics, and logs.

kubectl port-forward svc/java-app 8080:8080

In another terminal, run a loop to send several requests.

while true; do curl http://localhost:8080/api/hello && echo "" && sleep 1; done

Tracing Grafana

Test application using auto-instrumentation for .NET

				
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dotnet-sample
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dotnet-sample
  template:
    metadata:
      labels:
        app: dotnet-sample
      annotations:
        instrumentation.opentelemetry.io/inject-dotnet: "true"
    spec:
      containers:
      - name: dotnet-sample
        image: emr001/dotnet8-app:v1
        ports:
        - containerPort: 8080
        env:
          - name: OTEL_SERVICE_NAME
            value: "dotnet-demo"
---
apiVersion: v1
kind: Service
metadata:
  name: dotnet-app
spec:
  selector:
    app: dotnet-sample
  ports:
    - port: 80
      protocol: TCP
      targetPort: 8080
EOF

Restart the deployment for the operator to inject the agent via an init-container

				
kubectl rollout restart deployment dotnet-sample

Perform a port-forward on port 80 and use the loops below to send requests.

kubectl port-forward svc/dotnet-app 80:80

In another terminal, run a loop to send several requests.

while true; do curl http://localhost/weatherforecast && echo "" && sleep 1; done

while true; do curl http://localhost/error && echo "" && sleep 1; done

Dashboard Grafana

Elven Observability: Lambda Instrumentation

Prerequisites

OTel Collector installed on an EC2 instance
Documentation
Organization registered in Elven Observability
Lambda Function
Allow traffic from Lambda to the EC2 running the Collector if it is not publicly accessible: documentation

without serverless framework
with serverless framework

Instrumentation

Without Serverless Framework
Instrumentation is performed via the AWS console.

Configuring Environment Variables

1. Go to the Lambda function, scroll down, click on the Configuration tab, select the Environment variables section, and click Edit.

2. Add the following environment variables by clicking on Add Environment Variables.

				
  AWS_LAMBDA_EXEC_WRAPPER="/opt/otel-handler"
  OTEL_TRACES_SAMPLER="always_on"
  OTEL_TRACES_EXPORTER="otlp"
  OTEL_METRICS_EXPORTER="otlp"
  OTEL_LOG_LEVEL="DEBUG"
  OTEL_PROPAGATORS="tracecontext,baggage,xray"
  OTEL_LAMBDA_TRACE_MODE="capture"

  OTEL_EXPORTER_OTLP_ENDPOINT="<EC2_URL_HERE>"
  OTEL_SERVICE_NAME="<LAMBDA_NAME>"
  OTEL_RESOURCE_ATTRIBUTES="service.name=<LAMBDA_NAME>,environment=<ENVIRONMENT>"

Only the last three variables need to be modified:

EC2_URL_HERE: The URL of the EC2 instance running the Collector, using its IP address or its DNS name if one was configured
LAMBDA_NAME: The name of the service that will appear in traces and metrics
ENVIRONMENT: dev, hml, or prd, depending on the environment being instrumented

Configuring the Instrumentation Layer

1. In the Code tab, scroll down to the Layers section and click on Add a Layer.

2. Choose the Specify an ARN option and paste this ARN:
arn:aws:lambda:us-east-1:184161586896:layer:opentelemetry-nodejs-0_9_0:4

3. Finally, click Verify, then Add.
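The same two console steps (environment variables and layer) can also be scripted with the AWS CLI. The sketch below is an assumption-laden illustration rather than an official procedure: lambda_env_json is a hypothetical helper, and FUNCTION_NAME and COLLECTOR_URL are assumed environment variables you would set yourself. The environment is passed as JSON because two of the values contain commas.

```shell
# Hypothetical helper: prints the --environment JSON for one function.
# Arguments: service name, Collector URL, environment (dev, hml or prd).
lambda_env_json() {
  local name="$1" endpoint="$2" environment="$3"
  cat <<EOF
{"Variables":{
"AWS_LAMBDA_EXEC_WRAPPER":"/opt/otel-handler",
"OTEL_TRACES_SAMPLER":"always_on",
"OTEL_TRACES_EXPORTER":"otlp",
"OTEL_METRICS_EXPORTER":"otlp",
"OTEL_LOG_LEVEL":"DEBUG",
"OTEL_PROPAGATORS":"tracecontext,baggage,xray",
"OTEL_LAMBDA_TRACE_MODE":"capture",
"OTEL_EXPORTER_OTLP_ENDPOINT":"${endpoint}",
"OTEL_SERVICE_NAME":"${name}",
"OTEL_RESOURCE_ATTRIBUTES":"service.name=${name},environment=${environment}"
}}
EOF
}

# AWS is only called when the script is run with --apply.
# FUNCTION_NAME and COLLECTOR_URL are assumed to be set beforehand.
if [ "${1:-}" = "--apply" ]; then
  aws lambda update-function-configuration \
    --function-name "$FUNCTION_NAME" \
    --layers "arn:aws:lambda:us-east-1:184161586896:layer:opentelemetry-nodejs-0_9_0:4" \
    --environment "$(lambda_env_json "$FUNCTION_NAME" "$COLLECTOR_URL" dev)"
fi
```

Note that update-function-configuration replaces the function's existing environment variables and layer list with the values given, so any variables or layers the function already relies on need to be included as well.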

With Serverless Framework
This is done in the serverless.yaml file where the Lambdas are configured, as shown in the example below:

It’s important to note that this configuration must be done for each function!

The configurations are the same as those performed in Configuring Environment Variables and Configuring the Instrumentation Layer.

				
org: <ORG_NAME>
app: <APP_NAME> 
service: <SERVICE_NAME> 

provider: 
  name: aws 
  runtime: nodejs20.x 

functions: 
  <LAMBDA_NAME_1>:
    handler: <HANDLER> 
    
    layers: 
      - arn:aws:lambda:us-east-1:184161586896:layer:opentelemetry-nodejs-0_9_0:4

    environment: 
      AWS_LAMBDA_EXEC_WRAPPER: "/opt/otel-handler" 
      OTEL_TRACES_SAMPLER: "always_on" 
      OTEL_TRACES_EXPORTER: "otlp" 
      OTEL_METRICS_EXPORTER: "otlp" 
      OTEL_LOG_LEVEL: "DEBUG" 
      OTEL_LAMBDA_TRACE_MODE: "capture" 
      OTEL_PROPAGATORS: "tracecontext,baggage,xray"

      OTEL_EXPORTER_OTLP_ENDPOINT: "<EC2_URL_HERE>" 
      OTEL_SERVICE_NAME: "<LAMBDA_NAME>" 
      OTEL_RESOURCE_ATTRIBUTES: "service.name=<LAMBDA_NAME>,environment=<ENVIRONMENT>" 
  <LAMBDA_NAME_2>:
    handler: <HANDLER> 
    
    layers: 
      - arn:aws:lambda:us-east-1:184161586896:layer:opentelemetry-nodejs-0_9_0:4

    environment: 
      AWS_LAMBDA_EXEC_WRAPPER: "/opt/otel-handler" 
      OTEL_TRACES_SAMPLER: "always_on" 
      OTEL_TRACES_EXPORTER: "otlp" 
      OTEL_METRICS_EXPORTER: "otlp" 
      OTEL_LOG_LEVEL: "DEBUG" 
      OTEL_LAMBDA_TRACE_MODE: "capture" 
      OTEL_PROPAGATORS: "tracecontext,baggage,xray"

      OTEL_EXPORTER_OTLP_ENDPOINT: "<EC2_URL_HERE>" 
      OTEL_SERVICE_NAME: "<LAMBDA_NAME>" 
      OTEL_RESOURCE_ATTRIBUTES: "service.name=<LAMBDA_NAME>,environment=<ENVIRONMENT>"

Once done, simply deploy the changes.

Elven Observability: Installing OpenTelemetry Collector on an EC2

Prerequisites

AWS Account
Organization registered in Elven Observability

Configure the EC2

If you need to configure an EC2 instance, refer to the AWS documentation. However, before completing the creation process, ensure the following:

Set up User Data as described in the section Configuring EC2 User Data.
Enable inbound traffic for TCP 4318 and TCP 4317 in the security group.
Enable the public IP.

Configuring EC2 User Data

To automate the configuration of the OpenTelemetry Collector (i.e., avoiding the need to manually connect to the machine and perform each step), you can use the “User Data” field, which allows you to run a script when the machine starts.

Create a copy of the user_data.sh script and edit it: replace <TENANT_ID> with your organization’s name (the value sent in the X-Scope-OrgID header) and <your-token> with your token (the value sent after Bearer in the Authorization header). Both are provided by the Elven Works team when your organization is registered.

Paste the edited script into the User Data field in the Advanced Details section during EC2 creation.

				
#!/bin/bash

# Update packages and install Docker
apt-get update -y
apt-get install docker.io -y

# Add the ubuntu user to the docker group (User Data runs as root, so the user is named explicitly)
usermod -aG docker ubuntu

# Create the otel-config.yaml file
cat <<EOF > /home/ubuntu/otel-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"

exporters:
  otlphttp:
    endpoint: https://tempo.elvenobservability.com/http
    headers:
      X-Scope-OrgID: "<TENANT_ID>"
      Authorization: "Bearer <your-token>"

  prometheusremotewrite:
    endpoint: https://mimir.elvenobservability.com/api/v1/push
    headers:
      X-Scope-OrgID: "<TENANT_ID>"
      Authorization: "Bearer <your-token>"

  loki:
    endpoint: "http://loki.elvenobservability.com/loki/api/v1/push"
    default_labels_enabled:
      exporter: false
      job: true
    headers:
      X-Scope-OrgID: "<TENANT_ID>"
      Authorization: "Bearer <your-token>"

processors:
  batch: {}
  resource:
    attributes:
    - action: insert
      key: loki.tenant
      value: host.name
  filter:
    metrics:
      exclude:
        match_type: regexp
        metric_names:
          - "go_.*"
          - "scrape_.*"
          - "otlp_.*"
          - "promhttp_.*"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch, filter]
      exporters: [prometheusremotewrite]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [loki]
EOF


# Run Otel Collector Contrib using Docker; the restart policy brings the container back up after a reboot
docker run -d --name otel-collector-contrib \
  --restart unless-stopped \
  -p 4317:4317 -p 4318:4318 \
  -v /home/ubuntu/otel-config.yaml:/etc/otel-config.yaml:ro \
  otel/opentelemetry-collector-contrib:latest \
  --config /etc/otel-config.yaml

# Configure Docker to automatically start on boot
sudo systemctl enable docker
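
Once the instance is up, one way to check that the Collector's OTLP/HTTP receiver is reachable is to post an empty trace payload to it. This is a rough sketch under stated assumptions: otlp_traces_url is an illustrative helper, the host argument is whatever public IP or DNS name your instance has, and /v1/traces is the standard OTLP/HTTP path for traces on port 4318.

```shell
# Hypothetical helper: builds the OTLP/HTTP traces URL for a given host.
otlp_traces_url() {
  printf 'http://%s:4318/v1/traces' "$1"
}

# Usage: bash check-collector.sh --check <EC2_HOST>
# The request is only sent when --check is passed; the HTTP status code is printed.
if [ "${1:-}" = "--check" ]; then
  curl --silent --show-error --output /dev/null --write-out '%{http_code}\n' \
    --header 'Content-Type: application/json' \
    --data '{"resourceSpans":[]}' \
    "$(otlp_traces_url "${2:?usage: --check <EC2_HOST>}")"
fi
```

A 2xx status suggests the receiver is reachable through the security group; a timeout usually points to the inbound rules for TCP 4318.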

Synthetic Monitoring runs scripted checks that continuously test the availability of an HTTP endpoint and alert you as soon as a failure occurs. Following the rules you define, One Platform detects problems in your digital product before they affect your customers, helping ensure its availability and reliability.

Configuring Synthetic Monitoring

To configure Synthetic Monitoring, click on Synthetic in the platform’s left side menu.

This opens the Synthetic Monitoring Center, which lists all your synthetic monitoring. Click on “New Synthetic Monitoring” to add a monitoring.

First, you need to:

1) Choose the name of the monitoring and the Environment. If you want to create an environment from this page, click on “+ Environment” and you will be directed to the creation screen. When you finish creating it, click on the reload symbol for the environment to appear in the list.

2) After configuring the environment, choose the check interval in Interval check.

3) In Response Template, if you know what the Step response will be, you can paste that response in the field to generate variables. By clicking on “Generate Keys”, the platform will generate variables that can be used in requests in any of the steps. Clicking on a variable copies it to the clipboard.

Configuring the steps

After this initial setup, configure the customizable steps that define how your page or Web APIs will be monitored.

To do this, in the step settings:

1) Choose the name of each step, the Healthcheck URL and the Timeout.

Note: For security reasons, it is not permitted to enter an IP in the healthcheck field. To monitor an IP, you need to store it in a secret and use the secret in the healthcheck field.

2) Choose the method, GET or POST. Select Skip SSL Validation to skip validation of the SSL certificate, and choose TLS Renegotiation if the security protocol requires it. For the POST method, select the Post-Body Type from the Raw or “application/x-www-form-urlencoded” options. For Raw, select the Raw Type (JSON, Text, Javascript, XML or HTML) and fill in the Post Body according to the chosen Raw Type. For “application/x-www-form-urlencoded”, fill in the key and value; to configure more than one key, click the plus (+) button next to it.

3) Click on “Optional request” to configure a Header, where you can set the Validation String, the Headers and the Values. If you want to configure more than one Header, click the plus (+) button next to it.

Assertions

Here you define what the request is expected to return.

1) In the Source field, choose the source: JSON Body, Status Code or Text Body (fill in the Property field if you choose JSON Body).

2) Choose one of the options available in Comparison.

3) Define the Target Value.

Assertions configuration examples:

The variables generated by the Response Template can be used in the URL, Headers or Post Body/application/x-www-form-urlencoded.

1) To use a variable in a Value field, add it to the field exactly as it was generated.

2) To use a variable in the URL, add it to the URL that will be checked.

3) To use a variable in the Post Body, simply add it as a value.

If you want more than two monitoring steps, click on the “Add Step” button at the end of the steps configuration section.

After configuring the steps, you can optionally configure automatic incident opening. Select the incident severity, the check interval in the Check Interval in seconds (…) field used to open and close incidents, and the number of failures and hits required to open and close an incident.

After configuring the incident, choose which team will be notified and enable the “Enable to set up automatic incidents opening” switch to be notified when there is an incident.
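
To make the idea of failures and hits more concrete, the sketch below replays a series of check results and reports whether an incident would be open at the end. It encodes one plausible reading only: an incident opens after a given number of consecutive failures and closes after a given number of consecutive hits. This is an assumption for illustration; the platform's actual evaluation may differ, and incident_state is a hypothetical name.

```shell
# incident_state FAILURES_TO_OPEN HITS_TO_CLOSE RESULT...
# Replays check results ("ok" or "fail") and prints "open" or "closed".
# Assumption: thresholds count consecutive results.
incident_state() {
  local to_open="$1" to_close="$2" state=closed fails=0 hits=0 result
  shift 2
  for result in "$@"; do
    if [ "$result" = fail ]; then
      fails=$((fails + 1))
      hits=0
      if [ "$fails" -ge "$to_open" ]; then
        state=open
      fi
    else
      hits=$((hits + 1))
      fails=0
      if [ "$state" = open ] && [ "$hits" -ge "$to_close" ]; then
        state=closed
      fi
    fi
  done
  echo "$state"
}

incident_state 3 2 ok fail fail fail ok      # prints "open": one hit is not enough to close
incident_state 3 2 ok fail fail fail ok ok   # prints "closed": two hits in a row
```

Higher thresholds reduce noise from isolated failures at the cost of opening the incident later.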

After all these configurations, click on “Create monitoring” to create the synthetic monitoring.

To edit or delete a monitoring, click on the button with three dots and choose one of the two actions.

With a custom external service, our API (Application Programming Interface) receives data from your application. We generate a CURL that sends us either “alarmed” (opens an incident on the platform) or “resolved” (closes the incident on the platform). This way, our platform can process the data and contact your team if your application has an error. To configure a custom integration, request the CURL from our team.

Creating an API Token

To create an API Token on the platform:

1 – Click on Organization Settings in the bottom left corner

2 – In the API tab, click on the “+” button to create a new API Token

3 – Select the Api Token type and fill in the Name field, then click Generate Integration Token

Creating a Custom External Service

1 – Enter the Service Hub, located on the left side menu

2 – Choose one of the options: to open an alert, select Alert Custom; to open an incident, select Incident Custom

3 – In the form, fill in the External service name and the Responders who will receive notifications from this service, then click on CREATE

4 – Further below, your External Service information will appear, asking you to select an API token; select the one created previously

5 – After selecting the API Token, the information required to configure the CURL is complete

6 – Once created, your External Services will appear in the External services monitoring center, sorted by status (those in alarm before operational ones)

Below is an example of a CURL for custom integration:

curl --request POST \
  --url '<Elven API URL>' \
  --header 'Content-Type: application/json' \
  --header 'User-Agent: 1PcustomAuth/1.0' \
  --data '{
  "title": "<incident title>",
  "description": "<incident description>",
  "external_aggregate_key": "001",
  "action": "alarmed",
  "organization": "<org_uid provided by Elven>",
  "severity": "critical"
}'

“--url” = API_URL generated when creating the External Service;
“title” = The title that will appear in the incident opened in 1P;
“description” = The description of the incident; it will appear under “cause” in the incident opened in 1P;
“external_aggregate_key” = An identifier used to “open” and “close” the incident; the closing request must carry the same external_aggregate_key as the request that opened the incident;
“action” = The action to be executed, which can be “alarmed” (opens the incident) or “resolved” (closes the incident);
“organization” = This value is provided by the Elven team at the time of the request;
“severity” = The severity associated with the incident, which can be informational, low, moderate, high or critical.

Note: don’t forget to add the headers:

--header 'Content-Type: application/json' \

--header 'User-Agent: 1PcustomAuth/1.0' \

Sending this CURL opens or closes incidents on the platform, so you can manage them and be notified.
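
Since an incident is closed by sending “resolved” with the same external_aggregate_key that opened it, the open/close pair can be wrapped in a small script. This is a minimal sketch under stated assumptions: incident_payload and send_event are hypothetical helper names, and ELVEN_API_URL and ELVEN_ORG_UID are assumed environment variables holding the API_URL and organization values you received. The sample title and description are invented.

```shell
# Hypothetical helper: prints the JSON body for an "alarmed" or "resolved" event.
# Values are inserted as-is, so they must not contain double quotes or backslashes.
incident_payload() {
  local action="$1" key="$2" title="$3" description="$4"
  printf '{"title":"%s","description":"%s","external_aggregate_key":"%s","action":"%s","organization":"%s","severity":"critical"}' \
    "$title" "$description" "$key" "$action" "${ELVEN_ORG_UID:-<org_uid>}"
}

# Sends one event using the same headers as the CURL described in this article.
send_event() {
  curl --request POST \
    --url "$ELVEN_API_URL" \
    --header 'Content-Type: application/json' \
    --header 'User-Agent: 1PcustomAuth/1.0' \
    --data "$(incident_payload "$@")"
}

# Requests are only sent when the script is run with --send.
if [ "${1:-}" = "--send" ]; then
  send_event alarmed 001 "Checkout API down" "Health check failed"
  # Closing uses the same key ("001") as the event that opened the incident.
  send_event resolved 001 "Checkout API down" "Health check recovered"
fi
```

Keeping the key in one place avoids the most likely mistake, which is closing with a different external_aggregate_key than the one used to open.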

In the digital age we live in, companies are subject to a variety of incidents that can disrupt their operations and compromise their security. From cyberattacks to infrastructure failures, these incidents can have significant consequences if not managed properly. This is where incident management comes in – a set of practices and procedures designed to effectively detect, respond to and resolve incidents. In this article, we’ll explore what incident management is and why it’s so crucial to business continuity.

What is Incident Management?

Incident management is a structured process for dealing with events that may disrupt an organization’s normal operations. The goal of incident management is to minimize the impact of these events, restore normality as quickly as possible, and learn from experiences to avoid similar incidents in the future.

Incident Management Components:

Detection and Reporting: The first step of incident management is to detect and report the incident. This can be done through proactive systems monitoring, employee reporting, or security alerts.
Analysis and Assessment: Once an incident is detected, it needs to be analyzed and evaluated to determine its severity and potential impact on company operations.
Response and Mitigation: Based on the analysis of the incident, an appropriate response is developed and implemented to mitigate its negative effects. This may include measures such as isolating compromised systems, patching security vulnerabilities, and communicating with relevant stakeholders.
Recovery and Resolution: After initial mitigation, the focus shifts to incident recovery and resolution. This involves restoring affected systems, reversing any damage caused, and returning to normal operations as quickly as possible.
Post-Incident Analysis and Learning: Once the incident has been resolved, it is essential to perform a post-incident analysis to understand the underlying causes and identify areas for future improvements. This allows the organization to learn from experience and strengthen its security posture.

Why is Incident Management Important?

Incident management plays a key role in the protection and resilience of organizations in the face of a wide range of threats and challenges. Here are some reasons why it’s so important:

Minimize Downtime: Unmanaged incidents can result in significant downtime, hampering productivity and causing financial losses. A quick and effective response can help minimize this downtime and reduce the impact on operations.
Protect Assets and Data: Incident management helps protect the organization’s critical assets and data against internal and external threats. This includes confidential customer information, intellectual property and critical IT systems.
Preserve Brand Reputation: Cybersecurity incidents and other adverse events can have a significant impact on a company’s brand reputation. An effective response can help mitigate reputational damage and maintain trust with customers and stakeholders.
Ensure Regulatory Compliance: In many industries, organizations are required by law to protect sensitive data and ensure business continuity. Incident management plays an essential role in ensuring regulatory compliance and mitigating legal risks.
Enhance Organizational Resilience: By learning from past incidents and implementing continuous improvements to processes and systems, organizations can strengthen their resilience and ability to deal with future challenges.

In an increasingly complex and interconnected business environment, incident management is essential to protect assets, ensure business continuity and preserve brand reputation. By adopting effective incident management practices, organizations can face challenges with confidence and better prepare for the digital future.

Main Incident Management features of the Elven Platform

Call Rotation
Centralization of alerts
Incident centralization
Manual incident opening
Incident update
Unlimited intelligent duty roster
Post-mortem by incident
Dashboard with key metrics
Notifications on communication channels (Slack, Discord, WhatsApp, among others)
War-room by incident (Slack)
Integration with ITSM tools (ServiceNow, Jira)

Insights provides a comprehensive view of an organization’s historical data, allowing leadership to make informed decisions to enhance operational maturity. Found in the sidebar menu, Insights consists of the following tabs:

General
Incidents
Responders

General

The General tab enables tracking and understanding the performance of monitoring conducted by the One Platform over the last 30 days. It offers a view of performance-related data, allowing users to quickly identify trends, patterns, and anomalies.

Uptime: The period during which a monitored resource remains operational without interruptions or failures. Downtime is the period during which it experienced failures.
MTTR (Mean Time to Resolve): The average time from the trigger of a failure to its resolution.
MTTA (Mean Time to Acknowledge): The average time to acknowledge a failure.
MTBF (Mean Time Between Failures): The average time between failures.

Incidents

Incidents provides an overview of the response effort over time for each incident that occurred in an organization. You can filter incidents by date range, severity, and source of the incident.

Total Incidents: The total number of incidents the organization faced within a given period.
Total Response Effort: The total time spent on an incident, measured from acknowledgment to resolution.
MTTA (Mean Time to Acknowledge): The average time to acknowledge an incident.
MTTR (Mean Time to Resolve): The average time from the trigger of an incident to its resolution.
Time Cluster: The group corresponding to the period when the incident occurred.
Business Hour Interruptions: Interruptions that occurred on weekdays between 8 AM and 6 PM.
Off Hour Interruptions: Interruptions that occurred on weekdays between 6 PM and 10 PM, or during weekends between 6 PM and 10 PM.
Sleep Hour Interruptions: Interruptions that occurred any day of the week between 10 PM and 8 AM.
TTA (Time to Acknowledge): The amount of time between the incident trigger and its acknowledgment.
TTR (Time to Resolve): The time from the incident trigger to its resolution.
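
The relationship between the per-incident values (TTA, TTR, response effort) and the averages (MTTA, MTTR) can be illustrated with a small calculation. The three incidents below are made-up sample data, and the record layout (trigger, acknowledgment and resolution as Unix timestamps in seconds) is an assumption for illustration, not a platform export format.

```shell
# Each record: "<trigger> <acknowledged> <resolved>" as Unix timestamps in seconds.
incidents="1000 1120 1900
5000 5060 5600
9000 9300 11400"

# Reads records from standard input and prints the aggregate metrics.
summarize() {
  local n=0 tta_sum=0 ttr_sum=0 effort_sum=0 trig ack res
  while read -r trig ack res; do
    if [ -z "$trig" ]; then
      continue
    fi
    n=$((n + 1))
    tta_sum=$((tta_sum + ack - trig))        # TTA: trigger to acknowledgment
    ttr_sum=$((ttr_sum + res - trig))        # TTR: trigger to resolution
    effort_sum=$((effort_sum + res - ack))   # response effort: acknowledgment to resolution
  done
  if [ "$n" -eq 0 ]; then
    echo "incidents=0"
    return
  fi
  echo "incidents=$n mtta=$((tta_sum / n))s mttr=$((ttr_sum / n))s effort=${effort_sum}s"
}

printf '%s\n' "$incidents" | summarize
# prints: incidents=3 mtta=160s mttr=1300s effort=3420s
```

Here the individual TTAs are 120, 60 and 300 seconds, so MTTA is 160 seconds; the TTRs are 900, 600 and 2400 seconds, so MTTR is 1300 seconds.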

Responders

The Responders tab provides insights into the impact of incidents on responders and includes data on how the incidents were resolved. It also features an individual list of incidents for each responder. You can filter this dashboard by date range, severity, responders, time cluster, and MTTR.

Total Incidents: The total number of incidents the organization faced within a given period.
Total Response Effort: The total sum of the time responders were involved with incidents, measured from the moment a responder acknowledges the incident until it is resolved.
MTTA (Mean Time to Acknowledge): The average time a responder took to acknowledge an incident.
MTTR (Mean Time to Resolve): The average time from the trigger of an incident to its resolution by a responder.

Configuring the steps

After the initial setup, configure the customizable steps that define how your page or Web API is monitored.

To do this, in the step settings:

1) Choose the name of each step, the Healthcheck URL and the Timeout.

Note: For security reasons, entering an IP address in the healthcheck field is not permitted. To monitor an IP, store it in a secret and use that secret in the healthcheck.

3) Click on “Optional request” to configure a Header where you can place the Validation String, the Headers and Values. To configure more than one Header, click the plus (+) button next to it.
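
To make the step settings more concrete, here is a minimal sketch of what a single healthcheck step amounts to: a request to the Healthcheck URL with a Timeout, optional Headers, and a check for the Validation String in the response. It uses only the Python standard library. The Step class, its field names, and the helper functions are assumptions made for illustration; they are not the platform's API, and the secret-based workaround for IPs is not modeled.

```python
import ipaddress
import urllib.request
from dataclasses import dataclass, field
from urllib.parse import urlparse

@dataclass
class Step:
    """Hypothetical container mirroring the step settings described above."""
    name: str
    healthcheck_url: str
    timeout_seconds: float
    headers: dict = field(default_factory=dict)
    validation_string: str = ""

def host_is_ip(url: str) -> bool:
    """Return True when the URL host is a literal IP address."""
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False

def response_is_healthy(step: Step, status: int, body: str) -> bool:
    """Healthy means a 2xx status and the validation string present in the body."""
    return 200 <= status < 300 and step.validation_string in body

def run_step(step: Step) -> bool:
    """Perform the request for one step and evaluate the response."""
    if host_is_ip(step.healthcheck_url):
        raise ValueError("IP addresses are not permitted in the healthcheck URL")
    request = urllib.request.Request(step.healthcheck_url, headers=step.headers)
    try:
        with urllib.request.urlopen(request, timeout=step.timeout_seconds) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return response_is_healthy(step, resp.status, body)
    except OSError:
        # Covers connection failures, HTTP error statuses, and timeouts.
        return False

step = Step(
    name="Homepage",
    healthcheck_url="https://example.com/health",   # placeholder URL
    timeout_seconds=5,
    headers={"Accept": "application/json"},
    validation_string="ok",
)
```

Calling run_step(step) performs the request. The pure helpers host_is_ip and response_is_healthy can be exercised without network access.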

Assertions

Creating an API Token

To create an API Token on the platform:

1 – Click on Organization Settings in the bottom left corner

2 – In the API tab, click on the “+” button to create a new API Token

3 – Select the API Token type and fill in the Name field, then click Generate Integration Token

Creating a Custom External Service

Below is an example of a cURL command for a custom integration:
