Earthquake is a programmable fuzzy scheduler for testing real implementations of distributed systems (such as ZooKeeper).
Blog: http://osrg.github.io/earthquake/
Earthquake permutes C/Java function calls, Ethernet packets, filesystem events, and injected faults in various orders to find implementation-level bugs in distributed systems. When Earthquake finds a bug, it automatically records the event history and helps you analyze which permutation of events triggers the bug. Earthquake also collects branch patterns for deeper analysis.
By default, Earthquake permutes events in a random order, but you can write your own state exploration policy (in Go) to find deep bugs efficiently.
- ZooKeeper:
- Found ZOOKEEPER-2212 (race): blog article (repro code)
- Reproduced ZOOKEEPER-2080 (race): blog article (repro code)
- etcd:
- Found an etcd command line client (etcdctl) bug #3517 (timing specification), fixed in #3530: (repro code). The fix also resulted in a hint for #3611.
- YARN:
- Found YARN-4301 (fault tolerance): (repro code)
The following instructions show how to start Earthquake Container, the simplified CLI for Earthquake. (For the full-stack Earthquake environment, please refer to doc/how-to-setup-env-full.md.)
$ sudo apt-get install libzmq3-dev libnetfilter-queue-dev
$ go get github.com/osrg/earthquake/earthquake-container
$ sudo earthquake-container run -it --rm --eq-config config.toml ubuntu bash
A typical configuration file (config.toml) is as follows:
explorePolicy = "random"
[explorePolicyParam]
minInterval = "80ms"
maxInterval = "3000ms"
In Earthquake Container, you can run any command that might be flaky. JUnit tests are interesting to try.
earthquake-container$ git clone something
earthquake-container$ cd something
earthquake-container$ for f in $(seq 1 1000);do mvn test; done
- Earthquake was presented at the poster session of ACM Symposium on Cloud Computing (SoCC). (August 27-29, 2015, Hawaii)
We welcome your contributions to Earthquake. Please feel free to send pull requests on GitHub!
Copyright (C) 2015 Nippon Telegraph and Telephone Corporation.
Released under Apache License 2.0.
// implements earthquake/explorepolicy/ExplorePolicy interface
type MyPolicy struct {
	actionCh chan Action
}

func (p *MyPolicy) GetNextActionChan() chan Action {
	return p.actionCh
}

func (p *MyPolicy) QueueNextEvent(event Event) {
	// Possible events:
	//  - JavaFunctionEvent (byteman)
	//  - PacketEvent (Netfilter, Openflow)
	//  - FilesystemEvent (FUSE)
	//  - LogEvent (syslog)
	fmt.Printf("Event: %s\n", event)
	// You can also inject fault actions
	//  - PacketFaultAction
	//  - FilesystemFaultAction
	//  - ShellAction
	action, err := event.DefaultAction()
	if err != nil {
		panic(err)
	}
	// send in a goroutine so as to make the function non-blocking.
	// (Note that earthquake/util/queue/TimeBoundedQueue provides
	// better semantics and determinism, this is just an example.)
	go func() {
		fmt.Printf("Action ready: %s\n", action)
		p.actionCh <- action
		fmt.Printf("Action passed: %s\n", action)
	}()
}

func NewMyPolicy() ExplorePolicy {
	return &MyPolicy{actionCh: make(chan Action)}
}

func main() {
	RegisterPolicy("mypolicy", NewMyPolicy)
	os.Exit(CLIMain(os.Args))
}
Please refer to example/template for further information.