Skip to content

rashmigottipati/snap-plugin-collector-pysmart

 
 

Repository files navigation

Snap collector plugin - Smartmon in python

This Snap plugin collects metrics from the Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.) leveraging the pySMART library. The purpose of S.M.A.R.T. is to monitor the reliability of the hard drive and predict drive failures, and to carry out different types of drive self-tests.

It's used in the Snap framework.

  1. Getting Started
  1. Documentation
  1. Community Support
  2. Contributing
  3. License
  4. Acknowledgements

Getting Started

System Requirements

Operating systems

All OSs currently supported by snap:

  • Linux/amd64
  • Darwin/amd64

Installation

Download psutil plugin binary:

You can get the pre-built binaries for your OS and architecture under the plugin's release page. For Snap, check here.

To build the plugin binary:

Fork https://github.com/intelsdi-x/snap-plugin-collector-pysmart

Clone repo into $GOPATH/src/github.com/intelsdi-x/:

$ git clone https://github.com/<yourGithubID>/snap-plugin-collector-pysmart.git

Configuration and Usage

Documentation

There are a number of other resources you can review to learn to use this plugin:

Collected Metrics

This plugin will identify all the devices on the node which have S.M.A.R.T. enabled and automatically populate the list of collected metrics based on which are being exposed by the device. This will be different per manufacturer and per device.

  • Ensure that S.M.A.R.T. is enabled on the device

Below is an example of the metrics being gathered by the Intel 3700 SSD

  • Note: $deviceName will be dependent on the path (i.e. /dev/sda1)
Namespace Description (optional)
/intel/smartmon/devices/$deviceName/Reserve_Block_Count available reserved space raw value
/intel/smartmon/devices/$deviceName/Program_Fail_Count shows total count of program fails
/intel/smartmon/devices/$deviceName/Unexpected_Power_Loss_Count reports number of unclean shutdowns, cumulative over the life of the ssd
/intel/smartmon/devices/$deviceName/Power_Loss_Cap_Test last test result as microseconds to discharge cap
/intel/smartmon/devices/$deviceName/SATA_Downshift_Count reports number of times SATA interface selected lower signaling rate due to error
/intel/smartmon/devices/$deviceName/Temperature_Case reports SSD case temperature statistics
/intel/smartmon/devices/$deviceName/Unsafe_Shutdown_Count reports the cumulative number of unsafe (unclean) shutdown events over the life of the device
/intel/smartmon/devices/$deviceName/Temperature_Internal reports internal temperature of the SSD in degrees Celsius
/intel/smartmon/devices/$deviceName/CRC_Error_Count shows total number of encountered SATA interface cyclic redundancy check (CRC) errors
/intel/smartmon/devices/$deviceName/Host_Writes_32mb reports total number of sectors written by the host system
/intel/smartmon/devices/$deviceName/Timed_Workload_Host_ReadWrite_Ratio shows the percentage of I/O operations that are read operations
/intel/smartmon/devices/$deviceName/Timed_Workload_Timer measures the elapsed time, number of minutes since starting this workload timer
/intel/smartmon/devices/$deviceName/Thermal_Throttle reports Percent Throttle Status and Count of events
/intel/smartmon/devices/$deviceName/Host_Writes_32mb_Total_LBAs_Written reports the total number of sectors written by the host system
/intel/smartmon/devices/$deviceName/Host_Reads_32mb_Total_LBAs_Read reports the total number of sectors read by the host system
/intel/smartmon/devices/$deviceName/NAND_Writes_32mb reports the total number of sectors writen by the host system

Examples

This is an example running psutil and writing data to a file. It is assumed that you are using the latest Snap binary and plugins.

The example is run from a directory which includes snaptel, snapteld, along with the plugins and task file.

In one terminal window, open the Snap daemon:

$ snapteld -l 1 -t 0

The option "-l 1" is for setting the debugging log level and "-t 0" is for disabling plugin signing.

In another terminal window: Load pysmart plugin

$ snaptel plugin load snap_pysmart/plugin.py

See available metrics for your system. Note The * in the metric list name indicates a dynamic metric which will update depending on the device names and attribute names

$ snaptel metric list

Get file plugin for publishing and load it:

$ wget  http://snap.ci.snap-telemetry.io/plugins/snap-plugin-publisher-file/latest/linux/x86_64/snap-plugin-publisher-file
$ chmod 755 snap-plugin-publisher-file

$ snaptel plugin load snap-plugin-publisher-file

Create a task file. For example, task-smart.json:

Creating a task manifest file.

{
    "version": 1,
    "schedule": {
        "type": "simple",
        "interval": "1s"
    },
    "workflow": {
        "collect": {
            "metrics": {
                "/intel/smartmon/devices/*/*/threshold": {},
                "/intel/smartmon/devices/*/*/value": {},
                "/intel/smartmon/devices/*/*/whenfailed": {},
                "/intel/smartmon/devices/*/*/worst": {},
                "/intel/smartmon/devices/*/*/type": {},
                "/intel/smartmon/devices/*/*/updated": {},
                "/intel/smartmon/devices/*/*/raw": {},
                "/intel/smartmon/devices/*/*/num": {}
            },
            "publish": [
                {
                    "plugin_name": "file",
                    "config": {
                        "file": "/tmp/published_pysmart"
                    }
                }
            ]
        }
    }
}

See exemplary task manifest

Start task:

$ snaptel task create -t task-smart.json
Using task manifest to create task
Task created
ID: c6d095a6-733d-40cf-a986-9c82aa64b4e2
Name: Task-c6d095a6-733d-40cf-a986-9c82aa64b4e2
State: Running

See the pysmart plugin task

$ snaptel task list
ID 					 NAME 						 STATE 		 HIT 	 MISS 	 FAIL 	 CREATED 		 LAST FAILURE
c6d095a6-733d-40cf-a986-9c82aa64b4e2 	 Task-c6d095a6-733d-40cf-a986-9c82aa64b4e2 	 Running 	 9 	 0 	 0 	 10:39AM 2-23-2017

Watch the collection of the metrics

$ snaptel task watch c6d095a6-733d-40cf-a986-9c82aa64b4e2

See std output stream as the metrics are collected

|intel|smartmon|devices|IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/RP06@1C,5/IOPP/SSD0@0/AppleAHCI/PRT0@0/IOAHCIDevice@0/AppleAHCIDiskDriver/IOAHCIBlockStorageDevice|Power-Off_Retract_Count|threshold 	 000 	 2017-02-23 10:45:44.698632001 -0800 PST
|intel|smartmon|devices|IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/RP06@1C,5/IOPP/SSD0@0/AppleAHCI/PRT0@0/IOAHCIDevice@0/AppleAHCIDiskDriver/IOAHCIBlockStorageDevice|Power-Off_Retract_Count|value 	 099 	 2017-02-23 10:45:44.698632001 -0800 PST
|intel|smartmon|devices|IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/RP06@1C,5/IOPP/SSD0@0/AppleAHCI/PRT0@0/IOAHCIDevice@0/AppleAHCIDiskDriver/IOAHCIBlockStorageDevice|Power_Cycle_Count|threshold 		 000 	 2017-02-23 10:45:44.698632001 -0800 PST
|intel|smartmon|devices|IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/RP06@1C,5/IOPP/SSD0@0/AppleAHCI/PRT0@0/IOAHCIDevice@0/AppleAHCIDiskDriver/IOAHCIBlockStorageDevice|Power_Cycle_Count|value 		 094 	 2017-02-23 10:45:44.698632001 -0800 PST
|intel|smartmon|devices|IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/RP06@1C,5/IOPP/SSD0@0/AppleAHCI/PRT0@0/IOAHCIDevice@0/AppleAHCIDiskDriver/IOAHCIBlockStorageDevice|Power_On_Hours|threshold 		 000 	 2017-02-23 10:45:44.698632001 -0800 PST
|intel|smartmon|devices|IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/RP06@1C,5/IOPP/SSD0@0/AppleAHCI/PRT0@0/IOAHCIDevice@0/AppleAHCIDiskDriver/IOAHCIBlockStorageDevice|Power_On_Hours|value 			 099 	 2017-02-23 10:45:44.698632001 -0800 PST

Stop task:

$ snaptel task stop c6d095a6-733d-40cf-a986-9c82aa64b4e2
Task stopped:
ID: c6d095a6-733d-40cf-a986-9c82aa64b4e2

Roadmap

There isn't a current roadmap for this plugin, but it is in active development. As we launch this plugin, we do not have any outstanding requirements for the next release. If you have a feature request, please add it as an issue and/or submit a pull request.

Community Support

This repository is one of many plugins in Snap, a powerful telemetry framework. See the full project at http://github.com/intelsdi-x/snap To reach out to other users, head to the main framework

Contributing

We love contributions!

There's more than one way to give back, from examples to blogs to code updates. See our recommended process in CONTRIBUTING.md.

License

Snap, along with this plugin, is an Open Source software released under the Apache 2.0 License.

Acknowledgements

And thank you! Your contribution, through code and participation, is incredibly important to us.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 94.0%
  • Shell 5.2%
  • Makefile 0.8%