27 December 2019 update

Updated demo package working with current angr: unrolldemo-20191227.tar.gz

19 March 2019 initial posting

Did a quick demo on 19 March 2019 of using angr to unroll various code examples: unrolldemo-20190319.tar.gz

General usage in each directory: make, then ./unroll. Sometimes ./unroll2 or ./unroll3 is better. The 10-ntt directory also supports make test for a random test of the unrolled output against the original code.
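
For orientation, the heart of such an unroll script might look like the following sketch. This is not the demo's actual code; the binary name "example.bin", the function name "f", and the single 64-bit argument are assumptions. The sketch symbolically executes one function on a symbolic input and prints the resulting expression for the return value.

   # Hypothetical sketch: symbolically execute one function with angr and print
   # the symbolic expression for its return value, i.e., an "unrolled" formula.
   import angr, claripy

   proj = angr.Project("./example.bin", auto_load_libs=False)  # assumed binary name
   f = proj.loader.find_symbol("f")                            # assumed function name

   x = claripy.BVS("x", 64)                                    # symbolic 64-bit input
   state = proj.factory.call_state(f.rebased_addr, x)          # call f(x) with the default calling convention
   simgr = proj.factory.simulation_manager(state)
   simgr.run()                                                 # run until the call returns

   for final in simgr.deadended:
       print(final.regs.rax)                                   # symbolic expression for the result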

Prepared demo at the last minute, so no quality guarantees. Patched angr enough to make the 5-unpack and 6-pshufb examples work, but the shuffle patch should be generalized to handle symbolic indices. The 7-broadcast32 example was intentionally left non-functional to show what unsupported instructions look like. Started a range analyzer for 10-ntt, but so far it handles only parsing.

20 March 2019 update

Added a demo 10-ntt-20190320.tar.gz showing full functional verification (including range verification and algebraic verification) for a fast asm NTT from https://github.com/pq-crystals/kyber/blob/master/avx2/ntt.s. To run everything in the directory, simply make (after installing angr; see installation instructions below). The most interesting output files are

all of which are also included in the tarball for easier reference.
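
One ingredient mentioned above is range verification. The following is a sketch of that idea using claripy, angr's constraint layer; it is not the demo's actual code, and the intermediate expression and input bounds are assumptions. Given a symbolic expression standing in for an intermediate value, the solver is asked whether that value can ever leave the signed 16-bit range.

   # Sketch of range verification (not the demo's actual code): given a symbolic
   # expression for an intermediate value and assumed bounds on the inputs, ask a
   # solver whether the value can leave the signed 16-bit range.
   import claripy

   x = claripy.BVS("x", 32)
   y = claripy.BVS("y", 32)
   t = x * 3 + y                                            # stand-in for an intermediate expression

   lo, hi = claripy.BVV(-3328, 32), claripy.BVV(3328, 32)   # assumed input ranges
   s = claripy.Solver()
   s.add([claripy.SGE(x, lo), claripy.SLE(x, hi),
          claripy.SGE(y, lo), claripy.SLE(y, hi)])

   overflow = claripy.Or(claripy.SGT(t, claripy.BVV(32767, 32)),
                         claripy.SLT(t, claripy.BVV(-32768, 32)))
   print("possible overflow" if s.satisfiable(extra_constraints=[overflow]) else "in range")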

Tweaks to the NTT approach will sometimes require extra code in the verifier, but a verifier that's only slightly larger than this should be able to handle a much larger corpus of platform-optimized NTT implementations. The point is that this reduces the large problem of eliminating bugs in all those implementations to the much smaller problem of eliminating bugs in the verifier.

Notes on formal verification

What this verifier is meant to be doing is called "equivalence checking", specifically checking that a short NTT program (expntt) always produces the same outputs modulo q as the asm implementation. The verifier is meant to be "sound": if a sound verifier says that two programs are equivalent, then the programs are equivalent. The verifier is not meant to be, and is not, "complete": it can fail to detect the equivalence of equivalent programs. However, the verifier is sufficiently complete to handle this interesting example of NTT code, and (I claim) many more examples.
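
As a toy illustration of the equivalence-checking idea (this is not the demo's verifier), the following sketch again uses claripy to ask a solver whether two stand-in output expressions can ever differ. The functions, the 4-coefficient size, and the exact-equality check (rather than equality modulo q) are simplifications introduced here.

   # Toy equivalence check (not the demo's verifier). ref_expr and asm_expr stand in
   # for output expressions recovered from the reference program and from the asm.
   # For brevity this checks exact equality; the real check is equality modulo q.
   import claripy

   x = [claripy.BVS("x%d" % i, 16) for i in range(4)]   # made-up size; the real NTT has 256 coefficients

   def ref_expr(x):          # stand-in for the short reference program
       return [xi + xi for xi in x]

   def asm_expr(x):          # stand-in for expressions unrolled from the asm
       return [xi * 2 for xi in x]

   s = claripy.Solver()
   for r, a in zip(ref_expr(x), asm_expr(x)):
       if s.satisfiable(extra_constraints=[r != a]):    # any input where the outputs differ?
           print("not equivalent")
           break
   else:
       print("equivalent")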

Formalizing equivalence requires specifying the languages that the supposedly equivalent programs use, or at least large enough subsets of the languages to handle the programs and the relevant input-output behavior. Formalizing soundness also requires specifying the language that the verifier uses. Language specifications typically have errors (compared to what was intended), but one can imagine a future in which (1) the errors are fixed, (2) bugs in the language implementations are fixed, and (3) the language implementations are verified to match the specifications, assuming properties of the underlying hardware. (In the meantime, perhaps discrepancies between the language specifications and the language implementations do not affect the correctness of the verifier.)

Proving soundness is obviously impossible if the verifier is not sound, i.e., if the verifier has bugs. The verifier isn't doing anything particularly tricky, so there are no obvious obstacles to a proof once any bugs are fixed; but saying this is much easier than doing it, especially if one requires the proof to be checked by a computer. Proofs that aren't checked by a computer often have errors. Also, computerized proof checkers have had bugs allowing bogus proofs. But one can imagine a future in which proof checkers are themselves sound. (In the meantime, perhaps bugs in the proof checkers do not affect any proofs that we actually care about.)

From a security perspective, each layer here is creating another way to let the computing base scale while the trusted computing base stays small. For example, instead of asking security reviewers to verify a huge pile of asm implementations, we can have an equivalence checker verify all the implementations, and merely ask security reviewers to verify the equivalence checker. Even if the equivalence checker is domain-specific (e.g., an equivalence checker only for linear functions modulo small constants), this can be a huge win even for a small number of asm implementations (depending on how simple the checker is), and each new asm implementation makes the equivalence checker even more attractive. As another example, instead of asking security reviewers to verify many equivalence checkers, we can have a proof checker check proofs of correctness of the equivalence checkers, and merely ask security reviewers to verify the proof checker.

Installing angr

1. Create a virtual machine (I used xen-create-image) running the current "unstable" distribution of Debian.

2. Within the virtual machine, as root, install the following libraries and development packages:

   dpkg --add-architecture i386
   apt update
   apt dist-upgrade -y
   apt install \
     man wget strace psmisc screen vim time git tig \
     build-essential clang gdb valgrind nasm cmake \
     libffi-dev libssl-dev libxml2-dev libxslt1-dev \
     libreadline-dev libpixman-1-dev libqt4-dev \
     qtdeclarative5-dev openjdk-8-jdk \
     libglib2.0-dev python-dev python2.7-dev pypy \
     python3 python-pip virtualenvwrapper pypy-dev \
     python-pyparsing python3-pyparsing \
     python3-pip \
     libtool debootstrap debian-archive-keyring \
     binutils-multiarch libc6:i386 libgcc1:i386 \
     libstdc++6:i386 libtinfo5:i386 zlib1g:i386 \
     multiarch-support gcc-multilib \
     -y

3. Within the virtual machine, as root, create a user for experiments with angr.

4. Within the virtual machine, as the new user, install the latest version of angr from source (takes about 10 minutes):

   cd
   git clone https://github.com/angr/angr-dev.git
   cd angr-dev
   mkvirtualenv -p /usr/bin/pypy angr
   workon angr
   setsid -w sh -c ./setup.sh < /dev/null
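
As a quick check that the install worked (this step is not part of the original instructions), a few lines of Python run inside the angr virtualenv (workon angr) can load a binary and print angr's version together with the detected architecture:

   # Post-install sanity check (not part of the original instructions):
   # load /bin/true and print angr's version, architecture, and entry point.
   import angr
   print(angr.__version__)
   proj = angr.Project("/bin/true", auto_load_libs=False)
   print(proj.arch.name, hex(proj.entry))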