Skip to content

cmla-psu/checkdp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CheckDP

An automated and integrated tool for proving and disproving Differential Privacy using Programming Language techniques.

Install via Docker

Due to the multiple dependencies of this project, it is much easier to install inside a docker container.

In the project root directory, run:

docker build . -t checkdp
docker run --rm -it checkdp

You will be inside a docker container with everything installed.

Specifically, KLEE is installed at /checkdp/klee, with its own dependencies (miniSAT, stp, llvm-9.0) in /checkdp/klee/deps. Counterexample validator PSI is located at /checkdp/psi.

Usage

Run python3 -m checkdp -h you will see the following help message:

usage: __main__.py [-h] [-k KLEE] [-i INCLUDE] [-c CLANG] [-o OUT]
                   [-l LOGLEVEL] [-p PSI] [-s PSI_SOURCE] [-d]
                   [--transform-only] [--enable-shadow]
                   FILE

positional arguments:
  FILE

optional arguments:
  -h, --help            show this help message and exit
  -k KLEE, --klee KLEE  The klee binary path
  -i INCLUDE, --include INCLUDE
                        The include path for klee
  -c CLANG, --clang CLANG
                        The clang binary path
  -o OUT, --out OUT     The output path for CheckDP
  -l LOGLEVEL, --loglevel LOGLEVEL
                        The log level for the logger, could be one of {debug,
                        info, warning, error}
  -p PSI, --psi PSI     The path for psi binary
  -s PSI_SOURCE, --psisource PSI_SOURCE
                        The source file for psi
  -d, --deterministic   Specifies whether to search deterministically (for
                        reproducibility) or not.
  --transform-only      Only generate the transformed template
  --enable-shadow       Controls whether shadow execution is used or not.

You don't need to specify -k / -i / -c / -p parameters if you're using docker since the default values would be set.

Run python3 -m checkdp [FILE] to start analyzing [FILE], the tool will generate a checkdp-out folder containing all intermediate files:

  • generate-[alignment|input]-[iteration].c - The files sent to KLEE to find concrete values for symbolic [alignment | input], where [iteration] marks the current number of iterations. For better clarity, iteration number will be added 1000 for each restart.

  • generate-[alignment|input]-[iteration].bc - The bytecode file generated by compiling the source file using CLANG (specified by -c, default to bundled clang in container) to be later sent to KLEE.

  • run.log - the log file containing all DEBUG output.

We have provided all examples studied in our paper in examples/ folder and a benchmark script. Simply run

python3 scripts/benchmark.py

to run checkdp on all of them at once, the corresponding results files will be stored in results/<algorithm_name>.

Counterexample Validation

Additionally, you can specify -s PSI_SOURCE for counterexample validation if a counterexample is found by CheckDP.

Refer to psi repository for more details on how to write PSI programs. You may also take a look at PSI programs in examples/ folder. Notably, in order for CheckDP to properly insert corresponding concrete values for inputs and outputs, you should add placeholders such as $q$ and $T$ for the inputs (parameters of the algorithms), as well as $ARRAY(out)$ to mark the output array. Refer to examples/badgapsparsevector.psi for example.

Enable Shadow Execution

As discussed in the paper, shadow execution can be incorporated as an add-on and therefore controlled by the --enable-shadow flag. By default it is turned off for better efficiency since enabling it introduces more overhead. For algorithms NoisyMax / BadNoisyMax / BadNoisyMaxVal it must be turned on.

Writing Your Own Algorithm

Most of the syntax in CheckDP is the same as C (though only a subset of C is supported). Also, we have some requirements of the implemented algorithms:

  1. The first three parameters of the function must be query, size, epsilon with strict order.

  2. The first three lines at the beginning of the function must be C-strings marking the function specifications.

  3. No return commands. Instead, CHECKDP_OUTPUT intrinsic should be used to mimic adding an element to the output array.

  4. No function calls except noise generation, e.g., float eta = Lap(SCALE);. Note that currently only laplace noise Lap(SCALE) is supported.

Let's look at an example sparsevector algorithm in examples/sparsevector.c.

int sparsevector(float q[], int size, float epsilon, float T, int NN)
{
  "TYPES: epsilon: <0, 0>; size: <0, 0>; q: <*, *>; T: <0, 0>; NN: <0, 0>";
  "PRECONDITION: ALL_DIFFER; ASSUME(NN > 0); ASSUME(NN <= size); ASSUME(T >= -10); ASSUME(T <= 10);";
  "CHECK: epsilon";

  int out = 0;
  float eta_1 = Lap(2 / epsilon);
  int T_bar = T + eta_1;
  int count = 0;
  int i = 0;

  while (count < NN && i < size)
  {
    float eta_2 = Lap(4 * NN / epsilon);

    if (q[i] + eta_2 >= T_bar)
    {
      CHECKDP_OUTPUT(1);
      count = count + 1;
    }
    else
    {
      CHECKDP_OUTPUT(0);
    }
    i = i + 1;
  }
}

More details about the function specification annotations:

  • "TYPES: (VARIABLE_NAME: <ALIGNED_DISTANCE, SHADOW_DISTANCE>;)+"

    This specifies the input distance types for the parameters, note that the types for all parameters must be specified otherwise an exception will be thrown. Shadow distances are ignored if --enable-shadow is not specified, however they must be given. If you do not use shadow execution, feel free to mark the shadow distances as *.

  • "PRECONDITION: (ALL_DIFFER|ONE_DIFFER); (ASSUME(EXPRESSION);)*"

    This specifies the preconditions on the parameters, where ALL_DIFFER / ONE_DIFFER is a macro for the query variable, indicating that any or only one of the query answers can differ by at most 1. Additionally, the assumptions for other parameters can be marked manually by ASSUME(expression), where expression can be arbitrary C-style expressions containing the parameters. Adding manual assumptions would reduce the search time and rule out invalid counterexamples.

  • "CHECK: EXPRESSION"

    This specifies the final goal to check. Most of the algorithms satisfies epsilon-differential privacy, however, some algorithms such as SmartSum satisfies 2*epsilon-differential privacy. In that case you should write "CHECK: 2 * epsilon".

A Note on Limitations of KLEE

One thing to notice is that due to the limitations of KLEE (only supporting integers), all float declarations would be converted to int, this may bring imprecisions. CheckDP will issue a warning whenever a conversion happens.

However, you can safely write expressions with divisions in Lap, such as Lap(4 * NN / epsilon) in SparseVector algorithm. CheckDP will properly scale the cost variable calculations to eliminate the floating-point calculations.

License

MIT.

About

Proof-of-Concept Automated Tool for Proving and Disproving Differential Privacy.

Resources

License

Stars

Watchers

Forks