Google Buzz

Google Buzz launched, and it seems to do some fun stuff, but it also broke a lot of peoples’ privacy assumptions and privacy models. That was bad. But for me, the most disappointing thing was that Google never said “oops, my bad”. Instead, every post on the official Google blogs has been in the chirpy dishonest speech of someone trying to sell us stuff.

Google! You made your reputation on being funky and honest and nerdy! This chirpy too-smooth marketroid-speak does not become you! It makes everyone trust you less, not more. The truth is, that for many people, Buzz shipped broken. It violated their privacy models and exposed their private data to the world without their consent. And you never ever said “oops” or “my bad” or anything. I guess it’s foolish to expect honest and forthright communication signals from a large company, but it’s sad to see it happen to you, big G.

Computer Engineer Barbie

In the words of Tracy: “Oh Internet, is there anything you can’t do?” Thanks to an Internet vote, coming this December you will be able to buy Computer Engineer Barbie! I generally loathe Barbie, but am extremely interested in broadening participation in CS. So, instead of a vague uncaring feeling, I instead feel very very strongly ambivalent, but cautiously happy.

Thanks for the link, Marissa!

Why version control?

You should have version control. For everything. Not because you’ll ever go back and look at old revisions, but so that you’ll know that you can go back and look at old revisions.

Armed with that knowledge, you can now freely throw away bits and pieces, secure in the knowledge that if you actually want them back, they are there in the revision control system. Interestingly, almost nobody actually uses this feature. Revision control systems are not there to save your old work. They are there to give you permission to throw that old work away. You can now throw it out with out fear. And you need never look at it again, because the knowledge that you CAN look at it almost always ensures that you never actually need to.

Error installing Postgresql on Ubuntu?

I just tried installing Postgresql on a pretty standard/untweaked Ubuntu installation, and I got:

Setting up postgresql-8.4 (8.4.1-1) ...
 * Starting PostgreSQL 8.4 database server
 * The PostgreSQL server failed to start. Please check the log output.
                                                                         [fail]
invoke-rc.d: initscript postgresql-8.4, action "start" failed.
dpkg: error processing postgresql-8.4 (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 postgresql-8.4
E: Sub-process /usr/bin/dpkg returned an error code (1)

Looking at the logfile this produced, I found the lines:
2010-01-11 15:16:25 EST LOG: could not translate host name "localhost", service "5432" to address: Name or service not known
2010-01-11 15:16:25 EST WARNING: could not create listen socket for "localhost"
2010-01-11 15:16:25 EST FATAL: could not create any TCP/IP sockets

So that sucked. Digging around, it turns out this is due to Ubuntu bug 516312, and that a quick solution is to edit the file /etc/postgresql/8.4/main/postgresql.conf and change the line that reads

# - Connection Settings -
# listen_addresses = 'localhost'          # what IP address(es) to listen on;
                                        # comma-separated list of addresses;

into

# - Connection Settings -
listen_addresses = '127.0.0.1'          # what IP address(es) to listen on;
                                        # comma-separated list of addresses;

For some reason, despite an /etc/hosts file which contains the line 127.0.0.1 localhost, postgres can’t properly resolve the name “localhost”. I could care about why and try and find the source of the bug, but it just doesn’t matter much to me right now. As long as you only care about running it locally you can change that one line in the postgresql configuration file and everything should work. Now you know, and hopefully this will save someone some time.

Algorithms questions are hard, but that’s inherent in the subject

Hello Hacker News and Reddit!  Holy crap there are a lot of you…

There are two approaches to teaching a class entitled “Algorithms”. One is to teach a survey course where you are told about a bunch of algorithms and how they work. Tests in such a class could be passed with brute force memorization and little understanding, and would definitely be easier than the tests in the second type of course, “How to Design Algorithms”, where the questions are asking students to design and analyze something new every time. My tests this term were definitely in the second category, which made them much harder than they would have been with the other approach. But I think the second category is what should be taught! Computers are for remembering things, people are for creatively discovering new things. In order to teach people to design algorithms, we do have to show them how some famous algorithms are designed, but it’s a question of course focus. Do we emphasize how the algorithms work, or the design principles that led to their creation? As my father often told me*:

The person who knows “how” will always have a job, but the person who knows “why” will be their boss.

I teach the “why” stuff because the “how” stuff is fundamentally boring. It’s well-solved, and, when confronted with that situation in the field, simply sit down with the algorithms book and translate the pseudocode into real code. Or, more likely, get the real code that someone else wrote and use that! But only people armed with answers to the “why” questions will be able to handle new problems in new scenarios.  This is why lots of CS job interviews ask tricky puzzle questions.  It’s kind of insulting, but it’s also trying to assess whether or not you can solve new problems in new scenarios.  They ask these questions, because being able to come up with new solutions to new problems is the key skill for a programmer who is being asked to do non-lame** work.

Computer Science (and Algorithms in particular) is the art of doing new computational things. Because if we did them before, then we should just use the old solution and move on. If a problem can be just be memorized and regurgitated, then solving it can be programmed (and not just the solution to the given problem, but also the meta algorithm of translating from problem to algorithmic solution). If finding the solution to a problem can be programmed, then training students to do so is largely a waste of their brain space, because the right solution would be “get a computer to solve it”. That’s why questions in a course about “How to design algorithms” will necessarily be tricky. Because we will need to almost always ask questions which need a (currently unprogrammable, but still real) algorithmic intuition that cannot be reduced to a series of steps.

We can’t ask: please write down an algorithm to sort a list of integers in O(n lg n) time using only a constant amount of extra space (hint: Heapsort), because solving those problems is just a cache lookup. Memorize the book and you’ve got your answer!  It makes no sense to require computer science students to pretend that they have to remember everything when the whole point of much of computer science is to use these powerful machines to aid us in thought and memory. I suppose an alternative is allowing students to use the Internet when answering my test questions — that’s a more “real world” environment — but in that case I would be unable to tell the difference between a problem solver and a good searcher of previous solutions. And I want to gauge how people perform when they encounter new unsolved problems, not how well they google. I could ask problems that I can guarantee nobody has ever seen or solved before, but that still doesn’t solve the problem of students quickly getting an answer via nerdsniping a message board. Also, making students do new research to pass an in-class test is just dumb. Worst of all, coming up with algorithms questions hasn’t yet been automated, and I can’t see any reasonable algorithm for doing so. Therefore, for the time being, both the design and answer of questions on algorithms tests will necessarily require an algorithmic intuition as well as just knowledge of the subject. And that means that coming up with good questions is tricky, and that so is the process of solving them.

*  This is true, he actually told me this many times.

** I use non-lame/lame to denote the difference between work where a much dumber person with knowledge of the same facts could/couldn’t do the job.  I would hope that most people want a non-lame job, although in the current economy one can’t be too picky.  And there are always some people who are motivated purely by money and no other concern, although these people often drop out of the CS major pretty quickly.

Code fast

Students simply cannot code fast enough to express themselves. When confronted with a problem, particularly a small algorithms problem, my first instinct is to write a little snippet of code for every solution I dream up. I can do this, because I write code very quickly.

My students can’t. They simply do not have the practice at expressing themselves rapidly using these thinking machines which surround us, which means that any and all of their programs are arduous constructions which must be carefully planned in advance because the cost of doing it wrong is many hours wasted. This means that the subparts of homeworks where I ask them to write a program are arduous painful things, because they do not have the speed required to make the problems quick.

To make this make a little more sense, what I am claiming here is not just that if you code faster, you will get your solution written faster. I am claiming if you code faster, you will get your solution done MUCH faster. The speedup is greater than linear.

To a first approximation, I think this is because when you can write code rapidly and grow a big solution up from a small one, then you can develop your solutions using some single-person approximation of agile techniques. When you code slowly, the penalty for wrong thinking is extreme, which in turn means that you tend towards a single-person version of waterfall techniques, with lots of big design up front. This slows students down, because it is highly likely that their initial solution are wrong, and so the sooner they can implement those solutions and see the wrongness, the better off they will be.

Being able to code quickly allows you to code fearlessly, because wasted code is wasted minutes, rather than wasted hours. And wasting minutes is not just quantitatively different from wasting hours, but is qualitatively different. Wasting minutes is part of the standard accepted practice of problem solving, but wasting hours is a major error.

iPhone not the problem. AT&T is the problem. (probably)

A week or so ago there was an article in the New York times asserting that the reason the iPhone kinda’ sucks when used in large cities (New York and San Francisco especially) was that the iPhone’s antennae were poorly designed. This was kind of a publicity coup for AT&T who are generally viewed as having an underprovisioned network with too few towers in major population centers.

Well, preliminary results from David P Reed indicate that nope, it’s really AT&T’s problem. They are reporting initial results that AT&T also sucks on all the other 3-G interaction devices they used. This would imply that it’s not the iPhone, and is instead a situation more like the one luridly described by Fake Steve Jobs.

Now you know, and knowing is half the battle.

Causation is not (necessarily) Correlation

Correlation has a dictionary meaning (two things are related to one another) and a math meaning (having to do with a mathematical relationship between two variable being well-approximated by a line). It is important to note that two variables may be perfectly correlated in the dictionary sense, but be mathematically completely uncorrelated.

Consider y=x^2. There aren’t even any random variables in this one. The variable y is completely determined by x. And yet, if you calculate the correlation coefficient between x and y on the interval [-1,1], you will find that it is exactly zero. This is a wonderful example of the fact that not only is correlation not causation, causation may not imply correlation! This insight has been brought to you by my former officemate, a statistician who occasionally would let slip wonderful fun facts like this one.

* Of course, x^2 and y are perfectly correlated even though x and y are not, but we’d need to use a different instrument variable s=x^2 to find that correlation.

The 24 Puzzle

The 24 puzzle asks “Given the numbers A, B, C, and D, is there a way to combine the numbers using arithmetical operations +,-,*, and / so that the whole equation evaluates to 24?” I got curious about what inputs to the 24 puzzle had a solution and what ones did not. So I wrote a 24 puzzle generator and a 24 puzzle solver. I also wrote the following shell script (not a joke, I actually wrote this command line):

python3 genall.py | while read; do (echo $REPLY; (echo 4; read; echo $REPLY; read; echo $REPLY; read; echo $REPLY; read; echo $REPLY) | python3 24.py) | xargs echo ; done | tee rawoutput.txt | python3 filer.py

After all that, I had web pages for every solution to the 24 problem. So I have linked all of these web pages below on the off chance that anyone else thinks this is interesting. The raw output is also available if you want to grep it and explore for patterns.

As a brief show of what can be done, here are the hardest small problem instances according a very simple heuristic:

$ grep '.*/.*/.*' rawoutput.txt | grep -v '[0-9][0-9]' | awk '{ print $1 " " $2 " " $3 " " $4 }'
1 3 4 6
1 4 5 6
1 6 6 8
3 3 8 8

Some of the solutions to those four problems seemed so strange that I had to verify them by hand. Another fun fact is that 5,845 out of 14,950 problems have no solution, so if you pick 4 numbers between 1 and 23 (inclusive) at random, you have a greater than 1 in 3 chance of having an unsolvable problem!

Enjoy! Please let me know if you find this useful or interesting, and especially if you do anything fun with it :)

See all the output at http://home.manhattan.edu/~peter.boothe/24solutions

We already have one palmtop per child

A majority of teens have a cell phone today, and with many schools unable to afford a computer for every student, teachers are starting to see them as a helpful learning device instead.

The AP Newswire

I keep throwing this out in idle conversation after I heard Bruce Sterling say something close to it. We already have one palmtop per child. Cell phones are ubiquitous and are, increasingly, powerful enough to get a lot done. Instead of convincing people that they need a new pedagogical device, how can we enable instruction on their current wireless devices?

We need PhoneSugar.