Wednesday, December 12, 2012

Analog Random Character Generator

I've been giving away little decks of cards with the following write-up since 2007. I figured it was time to make them publicly and freely available for non-commercial use. This presentation as well as a Microsoft Word document that will print on Avery 5877 business card stock are checked into my project on GitHub.

Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.
- John von Neumann 1951

Computers are deterministic state machines - totally incapable of producing anything random. They employ pseudorandom number generators: functions that appear to produce random output with respect to their input. Good pseudorandom number generators (PRNGs) can produce thousands of values a second with a surprising degree of entropy. But even the best PRNG is limited by the variability of its “seed” or input values, and none have perfectly even distribution or an absence of patterns in their output.

Cards are much slower than PRNGs, but a deck can be shuffled in such a way that no one can predict the order. This deck contains 52 letters (26 uppercase and 26 lowercase), 10 numbers (0-9), and 33 other symbols for a total of 95 cards. It was designed to create truly random passwords, keys, and other security tokens for use with computers.

Instructions

Always shuffle well before dealing a new key to ensure a highly random deck. Shuffling between each draw will yield the most random results, but a very high degree of entropy can be obtained by shuffling after every 5 to 9 characters. Vary the number of characters between shuffles to avoid dividing your key into n-character blocks and providing a (small) foothold for attackers. It’s a good idea to shuffle at least once mid-key to allow the possibility of a repeated character. If you occasionally deal yourself a real word, place, or person, throw it out and start over. When you are done, shuffle your deck so that you don’t leave the top or bottom cards in the order of your password.

Remember to keep your keys electronically encrypted and/or physically secured. Remember that the 33 symbols in this deck can cause problems in certain contexts (the * at the command prompt, for instance), so make sure to encode or discard any characters that would be troublesome for your application.
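Purely as a software point of comparison (this sketch is not part of the original deck write-up, and the class name DeckDemo is made up for illustration), here is the same 95-character alphabet drawn with Java's SecureRandom, the closest thing the JDK offers to "shuffling before every draw":

```java
import java.security.SecureRandom;

public class DeckDemo {
    public static void main(String[] args) {
        // SecureRandom is a CSPRNG seeded from OS entropy - better than a
        // clock-seeded PRNG, but still not a physically shuffled deck.
        SecureRandom rnd = new SecureRandom();
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < 12; i++) {
            // Draw from the full set of 95 printable ASCII characters
            // (space, 0x20, through tilde, 0x7E) on every iteration.
            key.append((char) (' ' + rnd.nextInt(95)));
        }
        System.out.println(key);
    }
}
```

Note that the alphabet includes the space character, just as the deck does; discard any characters that would be troublesome for your application.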

Bits of Entropy

In cryptography, key size, measured in bits (powers of 2), represents the number of keys which must be tested to break an encrypted message by brute force. But encryption algorithms are generally "lossy" because, being deterministic, they have some degree of predictability and/or collisions of different keys producing the same output values. Some algorithms are "lossier" than others, and there are many statements in the literature to the effect that a 1024-bit key using one algorithm is just as secure as an 80-bit key using another. That means one algorithm is 2^944 (about 1.49×10^284) times more secure than the other! Despite common misuse in advertising claims, bits are the "standard" unit of entropy. In this writing, they represent a mathematically derived number of equally likely possible combinations, expressed in powers of 2 - not the computer space used to store a value generated with a PRNG.

Using the 95 standard ASCII characters printed on these cards, there are 95! ≈ 10^148 possible deck orderings (that's 492 bits of precision). By shuffling every few characters, any desired level of entropy can be achieved. There are 79 bits of entropy in a 12-character key (that's 95^12 ≈ 5.40×10^23 possible keys), which is about 33,000 times the precision of the most theoretically random 64-bit value a computer could possibly generate. Even a 5-character key has 95^5 ≈ 7.73×10^9 possible values, which is almost twice the precision of a 32-bit number.
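The bit counts above follow directly from log2(95^n) = n × log2(95). A quick sketch to check them (the class name EntropyBits is just for illustration):

```java
public class EntropyBits {
    // Bits of entropy for an n-character key over an alphabet of the
    // given size: log2(alphabet^n) = n * log2(alphabet).
    static double bits(int alphabet, int n) {
        return n * (Math.log(alphabet) / Math.log(2));
    }

    public static void main(String[] args) {
        System.out.printf("12-char key over 95 chars: %.0f bits%n", bits(95, 12)); // 79
        System.out.printf(" 5-char key over 95 chars: %.0f bits%n", bits(95, 5));  // 33
    }
}
```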

Dirty Secrets of PRNGs

"We guarantee that each number is random individually, but we don't guarantee that more than one of them is random." Figure that out.
- Press, William H., et al. 1992 referring to RANDU

Given the same initial input, a PRNG will not only produce the same output value, but will produce the same sequence of values every time, and that sequence is generally your cryptographic key, password, or hash. The “moving parts” inside most PRNGs are 64-bit and 32-bit registers and this limits their entropy. But PRNGs are often more limited by the precision of their input or seed. Since the few somewhat random processes in the computer tend to produce clusters of values, they are unsuitable as random number seeds. The only part of the computer that produces an even dispersion of values is the system clock, which measures milliseconds.

There are only 8.64×10^7 milliseconds in a day. That's about 26 bits of entropy. Without shuffling, you can deal yourself a stronger key in only 5 cards! So why not use a longer time frame? There are 3.1×10^11 milliseconds in 10 years, which is only 38 bits of entropy - and who has a password older than 10 years? Most people generate their keys at work, between the hours of 9-5, M-F, which is only ¼ of the hours in the week, so you might be losing about 2 bits of entropy there.

So even if you have a 64-bit random number generator, unless you can get a seed with more than 36 bits of entropy, you only get 36 bits of entropy in your key. That's roughly the number of significant digits in a truly random 6-character key (95^6). How many significant digits of randomness does any computer-generated encryption key contain? No more than the significant digits of entropy used to create it, times the number of PRNG algorithms that could have been used. In 2008, that often means the time and the process ID (a number generally between 1 and 32,767) combined, which yields 9.9×10^15 likely combinations, or about 53 bits of entropy. If process IDs were truly random (they are not), you could deal yourself more entropy in 9 cards than state-of-the-art automatic password generators on consumer operating systems can generate.
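These seed-entropy figures are easy to recompute. A sketch, using the same assumptions as above (a 10-year clock window and process IDs from 1 to 32,767):

```java
public class SeedEntropy {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    public static void main(String[] args) {
        double msPerDay = 24 * 60 * 60 * 1000.0;      // 8.64e7
        System.out.printf("ms per day:    %.3g (%.0f bits)%n",
                msPerDay, log2(msPerDay));             // 26 bits

        double msPerDecade = msPerDay * 365.25 * 10;  // ~3.2e11
        System.out.printf("ms per decade: %.3g (%.0f bits)%n",
                msPerDecade, log2(msPerDecade));       // 38 bits

        // A millisecond clock seed combined with a process ID (1..32767):
        double combos = msPerDecade * 32767;          // ~1.0e16
        System.out.printf("clock x PID:   %.3g (%.0f bits)%n",
                combos, log2(combos));                 // 53 bits
    }
}
```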

PRNGs that accept mouse or keyboard input, hardware PRNGs, and entropy-gathering daemons are significant improvements, but without an exhaustive analysis, it is difficult to judge their true effectiveness. Playing cards have been reliably generating entropy since they were invented in 12th century China.

Analysis: Shuffling Every Card

Random numbers should not be generated with a method chosen at random.
- Donald E. Knuth

Assuming you shuffle between each card, the table below shows the number of possible combinations for each length of key. The column headings have the following meanings:

  • n: length of key
  • 95^n: number of possible keys of the given length (n). Repeated characters are allowed.
  • log2(95^n): an approximate value corresponding to the nearest number of bits of entropy.
  • Crack: the time a 3 GHz processor would take to brute-force generate half the keys of this length. Most encryption algorithms would slow this down significantly, but serious crackers would use much more horsepower. YMMV.
  • Relevant Numbers: real-world examples with the same magnitude as the number of possible keys.
n   95^n         log2(95^n)  Crack          Relevant numbers
1   95           7           0
2   9,025        13          5 ms.
3   857,375      20          ½ sec.
4   81,450,625   26          15 sec.        Number of milliseconds in a day.
5   7.73×10^9    33          30 min         Number of milliseconds in 4 months.
6   7.35×10^11   39          2 days         Trillions (125 Gigabytes). Number of milliseconds in 20 years.
7   6.98×10^13   46          1 year
8   6.63×10^15   53          70 years       2^56 keys in the original DES encryption.
9   6.30×10^17   59          7,000 years    Quintillions. The age of the universe in seconds.
10  5.98×10^19   66          600,000 years
11  5.68×10^21   72                         Number of stars in the universe.
12  5.40×10^23   79                         Septillions. Avogadro's number - the number of atoms in 12 grams of carbon-12.
13  5.13×10^25   85
14  4.87×10^27   92
15  4.63×10^29   99
16  4.40×10^31   105
17  4.18×10^33   112
18  3.97×10^35   118
19  3.77×10^37   125
20  3.58×10^39   131
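The computed columns of the table can be reproduced in a few lines of Java (a sketch; BigInteger avoids overflow in the larger rows):

```java
import java.math.BigInteger;

public class ShuffledTable {
    public static void main(String[] args) {
        BigInteger keys = BigInteger.ONE;
        for (int n = 1; n <= 20; n++) {
            keys = keys.multiply(BigInteger.valueOf(95));   // 95^n
            double bits = n * (Math.log(95) / Math.log(2)); // log2(95^n)
            System.out.printf("%2d  %.2e  %3.0f bits%n",
                    n, keys.doubleValue(), bits);
        }
    }
}
```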

Analysis: Without Shuffling

If you deal a key without shuffling, the number of possible next characters decreases every time one is dealt. The formula for possible combinations of each card dealt is: 95 × (95 - 1) × (95 - 2) × ... × (95 - (n - 1)), where n is the length of the key (number of cards dealt). The table below is meant for comparison against the table above, for the purpose of deciding how often you want to shuffle when you deal a key. The formula column just shows how the number of combinations was derived, since it's a little different for each card dealt. A mathematician might write the formula using capital pi notation to represent the product of a sequence, in this case the numbers from 95 down to 95 - (n - 1) multiplied together:

    ∏ (95 - i), for i = 0 to n - 1

Or, using permutation notation:

    P(95, n) = 95! / (95 - n)!
n   # combinations  log2(# combinations)  Formula
1   95              7                     95
2   8,930           13                    95×94
3   830,490         20                    95×94×93
4   76,405,080      26                    95×94×93×92
5   6.95×10^9       33                    95×94×93×92×91
6   6.26×10^11      39                    95×94×...×90
7   5.57×10^13      46                    95×94×...×89
8   4.90×10^15      52                    95×94×...×88
9   4.26×10^17      59                    95×94×...×87
10  3.67×10^19      65                    95×94×...×86
11  3.12×10^21      71                    95×94×...×85
12  2.62×10^23      78                    95×94×...×84
13  2.17×10^25      84                    95×94×...×83
14  1.78×10^27      91                    95×94×...×82
15  1.44×10^29      97                    95×94×...×81
16  1.15×10^31      103                   95×94×...×80
17  9.12×10^32      109                   95×94×...×79
18  7.12×10^34      116                   95×94×...×78
19  5.48×10^36      122                   95×94×...×77
20  4.16×10^38      128                   95×94×...×76
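The same kind of check works for the unshuffled case. A sketch that builds the falling-factorial product row by row:

```java
import java.math.BigInteger;

public class NoShuffleTable {
    public static void main(String[] args) {
        BigInteger combos = BigInteger.ONE;
        for (int n = 1; n <= 20; n++) {
            // The n-th card is drawn from (95 - n + 1) remaining possibilities:
            combos = combos.multiply(BigInteger.valueOf(95 - (n - 1)));
            double bits = Math.log(combos.doubleValue()) / Math.log(2);
            System.out.printf("%2d  %.2e  %3.0f bits%n",
                    n, combos.doubleValue(), bits);
        }
    }
}
```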

The benefit of adding characters without shuffling drops off dramatically as the number of characters nears 94. For example:

  • By the 12th character, you have lost about one bit of precision (roughly half as many combinations without shuffling as you would have with shuffling). This should not make a difference for most applications.
  • By the 17th character, you have about ¼ as many combinations (2 bits lost). But ¼ of a really big number is still a really big number.
  • By the 20th character, about 3 bits of precision are lost (1/8 the combinations).
  • By the 32nd character, each additional character provides at most 6 bits of entropy instead of nearly 7 (there are only 64, or 2^6, characters left).
  • The 80th character provides only 4 bits of entropy (there are 16 characters left unaccounted for).
  • The 94th character provides only 1 bit of entropy (only 2 possibilities).
  • The 95th character provides no additional entropy (the only card left is totally determined by the 94 that came before it).
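The exact loss can be computed by summing log2(95 / (95 - i)) over the cards dealt; a sketch:

```java
public class BitsLost {
    public static void main(String[] args) {
        double lost = 0;
        for (int n = 1; n <= 20; n++) {
            // Each unshuffled card is drawn from (95 - n + 1) possibilities
            // instead of 95, costing log2(95 / (95 - n + 1)) bits.
            lost += Math.log(95.0 / (95 - n + 1)) / Math.log(2);
            System.out.printf("%2d cards dealt: %.2f bits lost vs. shuffling%n",
                    n, lost); // 12 cards: about 1.05 bits
        }
    }
}
```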

Monday, September 17, 2012

Java Closures and The Start-End Problem

I use the term "Start-End Problem" to describe a resource that needs to be opened and closed, or a header and footer that need to be printed - anything that would tempt you to make a class with start() and end() methods that are meant to be called as a pair with other code in between. Some time around 1997, my mentor Jack Follansbee told me to avoid this pattern whenever practical, because it's too easy to forget to call the end() method. Another problem arises if the code in between performs a return, continue, or break, throws an Exception, or otherwise avoids reaching the end() method.

Java's try-catch-finally block solves the return and Exception problems, but does not ensure that you will remember to call the end() method in the finally block. There are three traditional Java approaches to this problem:

  1. Pre-evaluating the middle code somehow (e.g. with a buffer) and passing it to a procedure which appends to the Start and End of the buffer.
    public void doStartEnd(StringBuilder sB) {
        sB.insert(0, "Start");
        sB.append("End");
    }
    This has limited applications, but is nice when it works.
  2. Dependency Injection: Creating a special-purpose procedure with variables to adapt it to multiple related purposes. If the number of variables is small (or 0), this works very well. But sometimes multiple variables need to be passed to this procedure, and a complex data structure may even have to be created to hold the updated values for the return type. It seems a waste of time and energy to load all the variables into the procedure arguments like passengers onto a bus, then unload their updated values from the return data type one by one.
    public class ReturnObject {
        public int count;
        public boolean showedAnything;
    }
    public ReturnObject doStartEnd(int count, String middle, boolean showedAnything) {
        System.out.println("Start");
        if (!showedAnything) { System.out.println("New stuff"); }
        for (; count < 5; count++) {
           showedAnything = true;
           System.out.println(middle);
        }
        System.out.println("End");
        ReturnObject ro = new ReturnObject();
        ro.count = count;
        ro.showedAnything = showedAnything;
        return ro;
    }
    Real-world examples can get complicated very quickly. Someone on StackExchange: Programmers recently said they routinely found procedures with 10 or 20 parameters where they worked, and that they "died a little bit inside" every time they found one. There are numerous ways to cause bugs with this approach: transposing values, forgetting that Java passes primitives by value so the caller's copies are never updated... just to name a few. The coding effort involved can be impressive.
  3. Create an interface for the middle code being executed and have an object implement that interface. A well-designed object, possibly with a Builder pattern, can mitigate some of the bus-station issues, but since this pattern is just the Dependency Injection pattern turned inside-out, it sometimes just pushes the dependency-injection overhead of the above solution from a procedure to an object. This used to be the most general solution to the Start-End problem. The first example below uses an abstract class and an anonymous implementation, but that is just a minor variation on this technique.
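For comparison, here is a minimal sketch of the interface approach (the names InterfaceStartEnd and Middle are made up for illustration):

```java
public class InterfaceStartEnd {
    // The middle code is supplied as an implementation of an interface.
    interface Middle {
        void run();
    }

    // The start/end logic lives in one place and cannot be forgotten.
    static void doStartEnd(Middle middle) {
        System.out.println("Start");
        middle.run();
        System.out.println("End");
    }

    public static void main(String[] args) {
        doStartEnd(new Middle() {
            @Override
            public void run() { System.out.println("Middle"); }
        });
    }
}
```

Note that if middle.run() throws, End is never printed; a try-finally inside doStartEnd would be needed to guarantee it.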

A functional programmer would say that a lexical closure would be the obvious solution to this problem. It allows an arbitrary amount of code (including the caller's local variables) to be executed in the middle of some other code. No pre-evaluation, no Bus Station, no extra objects or interfaces. The outer "enclosing" code can do the start() and end() logic with the "enclosed" code in a little magic closure envelope in the middle. Java doesn't have closures or function pointers, but an anonymous inner class looks a little like a closure and the following code compiles, works, and even solves the Start-End problem for some special cases, though it might not win many beauty contests:

public class ClosureTest {
    private static abstract class StartEnd {
        public void doStartEnd() {
            System.out.println("Start");
            middle();
            System.out.println("End");
        }
        public abstract void middle();
    }

    public static void main(String[] args) {
        new StartEnd() {
            @Override
            public void middle() {
                System.out.println("Middle");
            }
        }.doStartEnd();
    }
}

Output:

Start
Middle
End

A sad limitation of this technique is that Java tries to prevent you from updating the main() method's local variables inside the anonymous inner class by forcing you to make them final (immutable). The following WILL NOT COMPILE:

public static void main(String[] args) {
    int count = 0;
    new StartEnd() {
        @Override
        public void middle() {
            // ERROR: local variable count is accessed from within
            // inner class; needs to be declared final
            for (; count < 5; count++) {
                System.out.println("Middle");
            }
        }
    }.doStartEnd();
    System.out.println("Total Count " + count);
}
A mutable wrapper class will work around this restriction. Uglier, but it works:

private static class MutableRef<T> {
    public T count;
}
public static void main(String[] args) {
    final MutableRef<Integer> mr = new MutableRef<Integer>();
    mr.count = 0;
    new StartEnd() {
        @Override
        public void middle() {
            for (; mr.count < 5; mr.count++) {
                System.out.println("Middle");
            }
        }
    }.doStartEnd();
    System.out.println("Total Count " + mr.count);
}
Output:
Start
Middle
Middle
Middle
Middle
Middle
End
Total Count 5
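A common lighter-weight alternative (my suggestion, not from the original post) is a final one-element array: the array reference is final, but its contents are mutable. Repeating the StartEnd class from above so the example is self-contained:

```java
public class ClosureTestArray {
    private static abstract class StartEnd {
        public void doStartEnd() {
            System.out.println("Start");
            middle();
            System.out.println("End");
        }
        public abstract void middle();
    }

    public static void main(String[] args) {
        final int[] count = {0}; // final reference, mutable element
        new StartEnd() {
            @Override
            public void middle() {
                for (; count[0] < 5; count[0]++) {
                    System.out.println("Middle");
                }
            }
        }.doStartEnd();
        System.out.println("Total Count " + count[0]);
    }
}
```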

Java 7's try-with-resources feature provides my favorite solution to this particular issue. Not a true general-purpose lexical closure, but sure looks like one for solving this particular problem:

public class ClosureTest {
    private static class StartEnd implements AutoCloseable {
        public StartEnd() { System.out.println("Start"); }
        @Override
        public void close() { System.out.println("End"); }
    }

    public static void main(String[] args) {
        int count = 0;
        try (StartEnd se = new StartEnd()) {
            for (; count < 5; count++) {
                System.out.println("Middle");
            }
        } // end of StartEnd
        System.out.println("Total Count " + count);
    }
}

One more detail... If you compile with -Xlint, it may complain: "warning: [try] auto-closeable resource se is never referenced in body of corresponding try statement." When using this pattern in my own code, I usually do reference the se variable within the corresponding try block. But for the times that I don't, a preventCompilerWarning() method eliminates the warning:

public class ClosureTest {
    private static class StartEnd implements AutoCloseable {
        public StartEnd() { System.out.println("Start"); }
        @Override
        public void close() { System.out.println("End"); }
        public void preventCompilerWarning() { }
    }

    public static void main(String[] args) {
        int count = 0;
        try (StartEnd se = new StartEnd()) {
            se.preventCompilerWarning();
            for (; count < 5; count++) {
                System.out.println("Middle");
            }
        }
        System.out.println("Total Count " + count);
    }
}

Voilà - the Start-End problem solved! Unlike a lexical closure in a functional language, this technique executes the close() method even when a return statement is reached or an Exception is thrown. This may be good or bad depending on your situation.

Notice that the local variable count is being used "inside" the StartEnd block without being visible to that code? No dependency injection, interfaces, objects, etc. This is the essence of a lexical closure - a little bubble of extra variable scope without otherwise violating the privacy of the enclosed code. Hopefully you can imagine some of the versatile coding paradigms which the new try-with-resources block makes available in Java 7.

For a more general solution to the problem of closures in Java before Java 8, check out this post.

Thursday, February 2, 2012

Google's Privacy Policy

I've been building professional web applications for 12 years and everywhere I've worked, we built the most complete profile of our users that we possibly could. There is really no other way to operate. When there is an issue, the first question is, "How many users are affected?" You need to know as much as possible about the user in order to answer this question. Usually, that starts with their browser, operating system, and IP address.

The second question when an error occurs is, "How did this happen?" For that, you need a history of everything every user has done. Web applications are not the only software that collects or uses this information. Your operating system, browser, the domain name system, your ISP, and every hub and router that your communications pass through all have to have some level of logging, auditing, or tracking in order to function correctly (this logging may normally be turned off, but it's there, or can be installed). Unless you break into someone's computer, the web has never been anonymous. Something like Tor can create some anonymity. If you are really concerned, use Tor.

Last night, someone on On Point suggested using Yahoo! mail instead of Google mail, but Yahoo!, Twitter, Amazon, Microsoft and others are trying to collect every bit as much information about you as Google is. The only reason that Google might collect more is that Google is just doing a better job of capturing everything.

I think what really upsets people is that Google has been so successful at using this information to target ads for people. Without the targeted ads, the information collection is not visible to the end-user. All of a sudden, it seems like Google and Facebook are doing this terrible thing, when really, everyone is doing it and you only see the results when you use those products.

Having paid to advertise using Google AdWords, I can say without a doubt that when Google shows you an ad for Type II diabetes, they are not telling the advertisers that you have Type II diabetes. The advertiser does not know anything about who you are - just that someone, somewhere used the word "Diabetes," that Google showed their ad, and whether or not someone clicked on it. Google has a program that sees the word "Diabetes" in your profile, and that fires ads that also have the word "Diabetes" in them. They are not disclosing anything about you to advertisers.

I joined Audubon last year, and within 2 weeks, I had requests for donations in my mailbox from The Sierra Club, The Arbor Day Foundation, and half a dozen other environmental organizations. There is something very ironic about receiving a pulped, dead tree (paper) in the mail from the Arbor Day Foundation. I felt mildly violated, knowing that my gift to Audubon resulted in them giving my address to a bunch of other organizations, whom I then had to call up and ask to be removed from their mailing lists. Where is the outrage against these organizations? Why the outrage against Google and Facebook, who have never given my street address or email address to anyone?

If you share your Google login with someone else, or use your computer where someone can see your screen, how is Google supposed to protect you from that? Tell other family members to get their own profile. If you don't want to see ads for divorce lawyers, don't discuss divorce with friends in email - use the phone or talk in person. People don't read porn in public. Why should they expect to surf web sites in public that they don't want other people to know about?

I cannot imagine any meaningful way to make the web truly anonymous that does not also give hackers and spammers a huge edge over legitimate web sites and e-commerce. The profiling and tracking that every major web presence uses today is necessary to prevent spam from dominating our inboxes and forums, and also to catch hackers when they break in and steal or destroy things. True anonymity on the web, were it possible, might make e-commerce impossible and email unusable due to unstoppable vandalism, theft, and spam.

Most of the time, targeted ads are a good thing. If you have a nun and a gigolo, doesn't it make sense to show ads for habits to the nun and condoms to the gigolo? Do you really want the reverse of that happening? If I'm going to have to look at ads, I'd rather look at ones relevant to me than irrelevant ones.

The targeted ads that are upsetting people so much are what pay for Gmail, YouTube, Facebook, and most of the sites people frequent on the web. If you don't want Google ads, you can pay them $50/year for Google Apps. For free, you get targeted ads. Live with it.

Monday, January 9, 2012

Ubuntu 11.10 "Oneiric" with a Gnome 2 Feel

I love, and am maybe even a little addicted to, the packages Ubuntu includes in their distributions. But like many, I am dissatisfied with the new Unity interface because:

  1. I can't open two copies of KeePassX at once
  2. I hate using text search to find applications
  3. It was too hard to create a custom application launcher
Hopefully all that will change, but until then, it's just not working for me.

I tried the LXDE desktop, but:

  1. I didn't like the default software (Leafpad text editor, Galculator, LX terminal)
  2. Desktop integration is lacking (no screen-shots, limited drag and drop from one application to another)
It's a very new desktop environment and may need a year or two to catch up. Certainly LXDE is worth keeping an eye on, but it's not there yet.

What's currently working for me is a "retro" basic Gnome session based on OMG Ubuntu's article, Make Ubuntu 11.10 Look and Feel Like GNOME 2. Thanks to "DigalMan" for pointing it out to me. That article has a typo though, it's gnome-session-fallback, not gnome-fallback-session.

Also, when they say, "Add 'ppa:jconti/gnome3' to your Software Sources," their instructions don't include how to add the PPA's key, so future updates may not be applied properly (or at all). Fortunately, launchpad.net has instructions. The key lines I needed were:

sudo add-apt-repository http://ppa.launchpad.net/jconti/gnome3/ubuntu
sudo apt-get update

It's not perfect (the top-bar color's not quite right and the icons space themselves wider apart when hovered over), but I'm basically back to the Ubuntu system I have loved for years, with the latest versions of all software. If this works in 12.04, I'll probably stick with it. If not, and LXDE has not improved by then, I guess I'll try Mint.