
Databricks Splunk Integration

Features | Architecture | Documentation References | Compatibility | Log Ingestion Examples | Feedback | Legal Information

The Splunk Integration project is an unsupported, bidirectional connector consisting of three main components, as depicted in the architecture diagram:

  1. The Databricks Add-on for Splunk, an app that allows Splunk Enterprise and Splunk Cloud users to run queries and execute actions, such as running notebooks and jobs, in Databricks
  2. Splunk SQL database extension (Splunk DB Connect) configuration for Databricks connectivity
  3. Notebooks to push and pull events and alerts between Splunk and Databricks (see the pull sketch after this list)
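
To make the pull path in item 3 concrete, here is a minimal sketch of querying Splunk's REST search API from a Databricks notebook. The host, token, and search string are placeholders, and this is a generic Splunk REST pattern under the assumption that token authentication is enabled, not this project's exact notebook code.

```python
import requests

# Placeholder Splunk connection details -- replace with your own.
SPLUNK_HOST = "https://splunk.example.com:8089"
SPLUNK_TOKEN = "<splunk-auth-token>"

# Run an export search against Splunk's REST API and stream the
# results back as JSON lines (one result object per line).
resp = requests.post(
    f"{SPLUNK_HOST}/services/search/jobs/export",
    headers={"Authorization": f"Bearer {SPLUNK_TOKEN}"},
    data={
        "search": "search index=main sourcetype=syslog earliest=-1h",
        "output_mode": "json",
    },
    verify=False,  # self-signed certs are common on Splunk management ports
    stream=True,
)
resp.raise_for_status()

events = [line for line in resp.iter_lines() if line]
print(f"Pulled {len(events)} result lines from Splunk")
```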

We also provide extensive documentation for log collection to ingest, store, and process logs on the economical and performant Delta Lake.

Features

  • Run Databricks SQL queries right from the Splunk search bar and see the results in the Splunk UI (Fig 1)
  • Execute actions in Databricks, such as notebook runs and jobs, from Splunk (Fig 2 & Fig 3)
  • Use the Splunk SQL database extension to integrate Databricks information with Splunk queries and reports (Fig 4 & Fig 5)
  • Push events, summaries, and alerts from Databricks to Splunk (Fig 6 & Fig 7; see the sketch after this list)
  • Pull events and alerts data from Splunk into Databricks (Fig 8)
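
As an illustration of the push direction, the sketch below sends an event from a Databricks notebook to Splunk's HTTP Event Collector (HEC). The HEC URL, token, and sourcetype are placeholders, and the project's notebooks may structure this differently; this is only a minimal example of the underlying pattern.

```python
import json
import requests

# Placeholder HEC endpoint and token -- replace with your own.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "<hec-token>"

def push_event_to_splunk(event: dict, sourcetype: str = "databricks:alert") -> None:
    """Send a single JSON event to Splunk over the HTTP Event Collector."""
    payload = {"event": event, "sourcetype": sourcetype}
    resp = requests.post(
        HEC_URL,
        headers={"Authorization": f"Splunk {HEC_TOKEN}"},
        data=json.dumps(payload),
        verify=False,  # adjust for your TLS setup
    )
    resp.raise_for_status()

push_event_to_splunk({"alert": "suspicious_login", "count": 42})
```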

Fig 1: Run Databricks SQL queries right from the Splunk search bar and see the results in the Splunk UI

Fig 2 & Fig 3: Execute actions in Databricks, such as notebook runs and jobs, from Splunk

Fig 4 & Fig 5: Use the Splunk SQL database extension to integrate Databricks information with Splunk queries and reports

Fig 6 & Fig 7: Push events, summaries, and alerts from Databricks to Splunk

Fig 8: Pull events and alerts data from Splunk into Databricks

Architecture

Documentation

Compatibility

The Databricks Add-on for Splunk, notebooks, and documentation provided in this project are compatible with:

  • Splunk Enterprise versions: 8.1.x and 8.2.x
  • Databricks REST API versions 1.2 and 2.0 (see the sketch after this list), on:
    • Azure Databricks
    • AWS SaaS, E2, and PVC deployments
    • GCP
  • OS: platform independent
  • Browsers: Safari, Chrome, and Firefox
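
For reference, the add-on communicates with Databricks over these REST API versions. Below is a minimal sketch of a 2.0 call, listing clusters with a personal access token; the workspace URL and token are placeholders, and the add-on itself uses its own configured connection rather than this snippet.

```python
import requests

# Placeholder workspace URL and personal access token.
DATABRICKS_HOST = "https://<workspace>.cloud.databricks.com"
DATABRICKS_TOKEN = "<personal-access-token>"

# List clusters via the Databricks REST API 2.0.
resp = requests.get(
    f"{DATABRICKS_HOST}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
)
resp.raise_for_status()
for cluster in resp.json().get("clusters", []):
    print(cluster["cluster_id"], cluster["state"])
```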

Log ingestion

This project also provides documentation and notebooks that show how to use Databricks to collect various logs (a comprehensive list is provided below) via streaming and batch ingest, using Databricks Auto Loader and Spark Structured Streaming, into cloud data lakes for durable storage (for example, on S3). The included documentation and notebooks also cover, for each log type, parsing, schematizing, ETL/aggregation, and storing in Delta format to make the data available for analytics.
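
As a rough sketch of the stream-ingest pattern described above (not the exact project notebooks), Auto Loader can continuously load raw logs from cloud storage into a Delta table. The paths, log format, and table name below are placeholders.

```python
# Runs inside a Databricks notebook, where `spark` is predefined.
raw_logs = (
    spark.readStream.format("cloudFiles")          # Databricks Auto Loader
    .option("cloudFiles.format", "json")           # raw log format (placeholder)
    .option("cloudFiles.schemaLocation", "s3://bucket/schemas/syslog/")
    .load("s3://bucket/raw-logs/syslog/")          # landing zone (placeholder)
)

(
    raw_logs.writeStream.format("delta")
    .option("checkpointLocation", "s3://bucket/checkpoints/syslog/")
    .trigger(availableNow=True)                    # incremental, batch-style run
    .toTable("logs.syslog_bronze")                 # Delta table for analytics
)
```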

Notebooks and documentation are included for the following data collection sources:

Feedback

Issues with the application? Found a bug? Have a great idea for an addition? Feel free to file an issue or submit a pull request.

Legal Information

This software is provided as-is and is not officially supported by Databricks through customer technical support channels. Support, questions, help, and feature requests can be communicated via email at cybersecurity@databricks.com or through the Issues page of this repo.