What I Learned This Week: Progress Sensitive Security, Erasure

WILTW #2

Let’s keep this going! The goal of this is again, to write down some of my thoughts and what I did. This is not necessarily going to be accurate, but maybe it will be useful! Contact me if you find mistakes :).

The past week I have managed to get through another chunk of papers on the following topics:

Removing information leaks through the progress covert channel
Ensuring data erasure in programs which use untrusted data stores

Additionally, I have started to research real number libraries in Coq for use in a project…

Attacker Knowledge

Each of the papers I read deal with the notion of attacker knowledge. In these papers attacker knowledge is defined as the set of all possible initial memories that the attacker can determine based on the observed output of the program, knowing the program text. So essentially if you have something like this, where \(x, y \in \{0, 1\}\), and let’s say that these are actually read off of the disk so that an attacker does not know what they are based on the text of the program:

x = 1; y = 0;
output(x);

Then an attacker observing the output would no longer have any possible value for x in its set of knowledge. The attacker would be able to determine that x was initialized to 1. So the attacker’s knowledge is something like:

\[\{x = 1 \wedge y = 0, x = 1 \wedge y = 1\}\]

The attacker knows exactly what x is, but is unable to narrow down y in the above case. However, consider something like:

x = 1; y = 0;
output(x || y);

In this case the attacker would be able to determine a knowledge set which looks something like this, if they observed an output of 1:

\[\{x = 1 \wedge y = 0, x = 1 \wedge y = 1, x = 0 \wedge y = 1\}\]

The attacker does not know if x = 0 or x = 1 in this case, since there are ways for the output to be 1 regardless (if y = 1 then x can be 0). However, the attacker would know that both x and y can not be 0, since that would lead to an output of 0 instead.

This is a fairly intuitive definition of attacker knowledge, since it classifies all possible things which the attacker is able to determine, but it does seem a bit strange at first glance since the smaller the attacker knowledge set is, the more information the attacker knows. A smaller set means that the attacker is able to remove more possibilities that it knows are not true, giving it more specific information.

Progress-Sensitive Security

This week I went through the Precise Enforcement of Progress-Sensitive Security paper. I think this nicely compliments the paper that I went through last week on timing-sensitive garbage collection, where I expressed some concerns over how one might be able to leak information when at blocks crash.

The basic idea is to use a termination checker (in this paper, a runtime termination checker) to see if you can determine whether or not a loop will terminate or diverge based only on low security information. If this can be determined just with low security information, then any attacker at the low security level will not gain any information about high data based on whether or not the loop terminates. If this can’t be determined, then you just terminate the program. It seems like this could be slightly extended, since there may be situations where you can determine that the divergence / termination of the loop only depends upon low information, but when encountering the loop at runtime the termination checker can’t tell which it would be. This might not matter much at all, though, since this check is only needed for loops which handle high security data anyway. A loop which already deals only with low data is already going to have a low security level.

This approach could obviously be applied directly to the timing sensitive garbage collector I read about last week, as you could use a termination checker on the at command in theory.

Erasure

Additionally I went through this paper: Cryptographic Enforcement of Language-Based Information Erasure.

The gist of this paper was using techniques for ensuring information erasure, i.e., that data is erased after a certain point of execution such that an attacker is unable to collect that information if it observes the program after the data should have been erased. The paper combined this with cryptographic techniques for erasing data from an untrusted data store. Motivating example is a Snapchat-like application, which should delete messages after a certain point in time.

The paper talked about applications using untrusted data stores, where you can’t necessarily guarantee that the data is removed. A common example of this is something like cloud storage. In theory if you’re putting a bunch of information in Amazon or Dropbox, they probably keep backups of that data. Maybe when you delete the data they retain it anyway, maybe the data is recoverable by the next person to use that portion of the disk… You don’t know. Obviously the solution is to encrypt the data sent to this data store prior, in which case it’s essentially just a collection of garbage random bits. What I had not thought about before was that your own personal filesystems on your drives might be considered untrusted as well. One of the motivating examples was that a bug in the android filesystem left these snapchats available for later retrieval. Modern disks can also duplicate data and move it around unknown to you. However, by encrypting the information prior to storing it to disk you can ensure that everything on that disk is essentially useless to anybody without the private key to decrypt that data. In a sense it’s already deleted. Then you only need to ensure that the private key is thoroughly and securely deleted, and not gigs and gigs of data.

An interesting part of this paper was the classification of “equivalence up to formal randomness”. If an attacker is able to determine when cyphertexts are distinct, but not the contents, then it’s very easy to leak information unintentionally.

For instance:

u = encrypt(pk, 0);
output(u);
if secret then v = u; else v = encrypt(pk, 0);
output(v);

In this case the attacker could not determine the value of u or v, but they could determine the value of secret by checking whether or not the outputs are the same. If secret is false then the second encryption will use different random values to encrypt the value 0 than the first, so the ciphertext will appear different. The possible output sequences are said to not be equivalent up to formal randomness, since the attacker can tell that one looks like uu and the other looks like uv where v is distinct from u.

u = encrypt(pk, 0);
v = encrypt(pk, 0);
if secret then {output(u); output(v);} else {output(v); output(u);}

In this case we say the possible output sequences uv and vu are equivalent up to formal randomness. Since u and v are just random garbage the attacker can’t determine the values of them, only that they’re different. So it’s possible to think of a permutation of these random values as being equivalent. If you remap u to v and v to u then you get the other possible output sequence. There’s no way for the attacker to tell which branch was actually taken by looking at the outputs.

Coq and Real Numbers

There seems to be multiple real number libraries for Coq. I have been directed at a few possibilities.

My understanding is that C-CoRN uses the math classes library behind the scenes. However, it seems math-classes does not actually have real numbers defined in it, so that’s a bust.

I have heard mixed things about the built in real number library. It obviously introduces a bunch of axioms for real numbers, and some people have groaned about it. My initial dive into this has been a bit odd. For instance the reals library seems to be a bit oddly spotty. No functions for logarithms, but hey limits are defined on arbitrary metric spaces! Can the limits diverge? Doesn’t seem like that’s possible… Which is kind of unfortunate. Can you have a limit that tends towards infinity? Nope.

It sounds like C-CoRN / the math classes library have things set up in a fairly principled way, and you should be able to substitute different real number implementations, so this is probably the way to go…

The main concern I have is that C-CoRN seems to still need Coq 8.5.2, and we’re on Coq 8.7. The math classes library works on 8.7, but as previously mentioned, it lacks real numbers, making this a bit of a moot point.

Productivity

Orgzly has proven to be useful to have on my phone, so that I can see all of my org-mode stuff on the go, and possibly make small changes.

A number of things have been very annoying with it, but it’s still a net win! What I find could be improved at the moment:

No built in git sync.
- Synchronizing with git manually on a phone is so painful!
Checkboxes don’t work, though they will in the next major version!

I have attempted to set up beeminder.el to have org-mode automatically able to submit data for my beeminder goals. I find a couple of details don’t work well for me — it sets the deadline of the task to when you would derail from the goal in Beeminder. This is a good feature, but often I like having deadlines separate from that! Additionally, if I want to use Orgzly on my phone to submit tasks, then it won’t update Beeminder… So, it’s probably best to just manually record a lot of data for now.

In typical agile buzzwords fashion I have started doing a biweekly review of the tasks left in my org files. Currently I still have a bunch of tasks which I have no idea how to sort out off the top of my head. However, I have begun to organize them using org’s handy M-<up> and M-<down>, which lets me reorder headings very quickly.

The increased scheduling that I have been doing has been very helpful as well! I have been slowly going through the Slack logs left over from the DeepSpec Summer School in 2017 to make a resource of all of the great questions and answers that have come out of that. Previously progress had been more or less halted, but now that I have a scheduled time to go through it every week it’s getting done! Soon we’ll have another resource of a bunch of answers to beginner questions for Coq!

I have also managed to get a bit ahead on some of my Beeminder goals. Previously I have been skirting the edge with these goals, but I am finding that Beeminder works much better with the extra buffer. For those interested in Beeminder, this is an important piece of advice. Getting ahead is significantly less stressful and can grant you much better flexibility if you’re on top of things, which ultimately helps you be even more productive, I think. You can take an hour off of a task one day and then make up that time over a longer period of time, which can be quite helpful. Falling behind with a day buffer can be incredibly stressful, being able to amortize this is immensely valuable.