Friday, February 18, 2005

The average abstract

Now that I have finished downloading my share of 2004's hep-th papers, as provided by Joanna Karczmarek, we can start to have some fun. First, we untar the abstract files and then combine them into a single file with

find . | grep txt | xargs -n1 perl -e '$/="\\\\";<>;<>;print <>;' > all_abs

The one-liner sets Perl's record separator $/ to the \\ that delimits the sections of an arXiv abstract file, skips the first two records (the separator line and the metadata) and prints the rest, i.e. the abstract itself. This gives a nice collection of 38766 lines of more or less random high-energy speak.

With the help of a simple Markov model (one that just uses the probability that the next word is B given that the previous one was A), we can easily produce a lot more of this. For example, I got

goldbergerwise mechanism be taken to the presence of topics unification is unlikely that of a quotient the results in order d field strengths the framework which contrary to dimensional minkowski superspace formalism as well below which should be fixed where the gauge hamiltonian for the kinetic term entering this paper we study of continuity restrictions in powers of the computation of both a subset of v the role we develop the bps breaking potential and preserves n extra dimensions using the scalar quantum string theory motivated in relation between free energy density is decaying mode sector of local geometries the

or even

sigmamodel has no chiral ring write down to symmetric massless states with matter and consistency of the effects of mtheory in the help of motion of those of supersymmetric type ii orbifolds including the newly formed in kkltlike vacua we investigate the transition in the acceleration we consider the presence of a tensionful codimensionone brane is finite k coincident branes wrapping cycles on the usual finetuning between the black supertubes of the edges evolve through the presence of gravitation together with a mass with qdeformed harmonic oscillator with scalar field coupled to end of spinorbit interactions for communication here

Locally, it looks quite good, although the grammar is a bit lacking. And all this with a single page of Perl, babbel.pl:

#!/usr/bin/perl

# Learn a first-order Markov model: for every word, count how often
# each other word follows it in the input.
while(<>){
    @words = split /\s+/;
    foreach $word (@words){
        next if $word =~ /[{}\<\>\/\\]/;  # markup debris, not a real word
        $word = lc($word);
        next if $word =~ /html/;
        $word =~ s/[^a-z]//g;             # keep only letters
        next unless $word;
        ++($successor{$lastword}->{$word});
        ++$occurence{$word};
        $lastword = $word;
    }
}

@words = keys %occurence;

#foreach $word (sort @words){
#    print "Learned $word... $occurence{$word} times\n";
#    foreach $follow (keys %{$successor{$word}}){
#        print "\t $follow\n";
#    }
#}

# Start from a random word, then repeatedly pick a successor with
# probability proportional to how often it followed the current word.
$now = $words[rand @words];

for $i (1..100){
    print "$now ";
    %follow = %{$successor{$now}};
    last unless %follow;                  # dead end: word was never followed
    $number = rand($occurence{$now});
    foreach $next (keys %follow){
        if(($number -= $follow{$next}) <= 0){
            $now = $next;
            last;
        }
    }
}
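
Running it on the combined file from above, as in

perl babbel.pl all_abs

prints a hundred words of pseudo-hep-th; since the starting word is chosen at random, every run produces a different text.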

Thursday, February 10, 2005

Things and Numbers

First of all, I would like to apologise for the entry I am about to write, because it will probably be at best a collection of vaguely related ideas and at worst incomprehensible. So let's jump into the stream of consciousness...

Yesterday, I sat in a seminar where a model of how we make decisions was presented. It was given by a psychologist and, to make it short, I was not really convinced that this model even makes sense. In retrospect, I can trace my uneasiness back to the following: to come up with a quantitative model, the speaker was expressing all kinds of concepts in terms of numbers.

For example, given several options a, b, c..., the 'level of preference' P(a), P(b)... played a central role. I am completely fine with that: level of preference is a map from the set of options to some level set L. Obviously, L is partially ordered; I know that for some options I prefer them over others. For example, I prefer getting a cup of coffee to getting a cup of coffee and having a hole drilled into my knee. However, I have a hard time making up my mind whether I would rather have a cup of coffee or have my favourite soccer team win their next game. The two are hard to compare. But if pressed, I would probably choose one over the other. So maybe, if pressed hard enough, L is even a totally ordered set.

Now the problems start, because everybody likes to be quantitative and there are obvious ordered sets at hand: the real (or rational or maybe integer) numbers. (In high school, we even had the set T of numbers that a given pocket calculator can represent.) So we take L to be one of those number sets, and now the level of preference function assigns numbers to options.

This could still be fine if we only ever compared the levels of different options. But with numbers you can do many other things: you can multiply them by other numbers ("my preference for option A is at least twice as strong as my preference for option B") and you can add them ("my preference for A plus my preference for C is less than my preference for B"; note that a priori this is different from "my preference for having both A and C is less than for having B"). That might still be fine if you have a linear model of preference, which the speaker did not have: at another point in the seminar, his answer to my question of what 'level of preference' really is was that it is "like the probability of choosing that option". But remember that a probability is always between 0 and 1, so arbitrary multiples of probabilities are not probabilities.

It really gets crazy once you try to multiply two levels of preference with each other, as you are free to do once you believe that both are numbers...

The upshot is that I think typing (in the computer science sense) is important: you should always know what kinds of entities you are dealing with and which operations are allowed on those entities.

This reminds me of one of Feynman's anecdotes: he was on a board that approved physics and maths textbooks for use in schools. In one of the books that tried to be creative, he found a problem stating that there are stars of different colours and that these colours are related to the stars' temperatures. It gave the temperatures of red, yellow, orange and blue stars. Then the student was asked to add up the temperatures of the above stars.

I am still trying to understand _why_ this problem is so stupid. Obviously, adding temperatures is not a good idea. But what exactly is the structure of, and what are the allowed operations on, the set that temperature takes values in?

Let me mention some examples of such structures. Of course, there are groups, fields and vector spaces (where you can add and subtract and multiply by numbers; if you can multiply by something else, it might be a ring or a module), and there are affine spaces (where you can only take differences, and those are valued in a vector space; connections come from such affine spaces). Furthermore, I have already mentioned several degrees of order.

So again, what kind of entity (type) is temperature? First of all, it is ordered and there is a minimal point. Then, using perfect heat engines operating forward and backward, you can map differences (or pairs) of temperatures to mechanical work, which takes values in a one-dimensional vector space, and back. So you can compare the difference between T1 and T2 to the difference between T3 and T4; you can (by operating two engines) compare twice the difference between T1 and T2 to the difference between T3 and T4; you can even compare the difference between T1 and T2 plus the difference between T5 and T6 to the difference between T3 and T4. And of course, the difference between T1 and T1 is zero. This is all explained in great detail in chapters 44-4 and 44-5 of the Feynman Lectures. Still, it makes no sense operationally to add the temperatures of two stars. Why?
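
To connect this back to typing, here is a minimal sketch in Perl (my own toy code, nothing from the Feynman Lectures; the package names, the overloaded operators and the use of Kelvin are all made up for illustration) of what it could look like to give temperature a type that admits exactly the operations above: comparison, and differences that live in a one-dimensional vector space, but no sums.

#!/usr/bin/perl
use strict;
use warnings;

package Temperature;
use overload
    '<=>' => sub { $_[0]{kelvin} <=> $_[1]{kelvin} },  # temperatures are ordered
    '-'   => sub { my $d = $_[0]{kelvin} - $_[1]{kelvin};
                   TempDifference->new($_[2] ? -$d : $d) },
    '+'   => sub { die "type error: adding two temperatures makes no sense\n" };

sub new {
    my ($class, $kelvin) = @_;
    die "below the minimal point\n" if $kelvin < 0;    # there is a minimal point
    return bless { kelvin => $kelvin }, $class;
}

package TempDifference;
# Differences live in a one-dimensional vector space: they can be
# added to each other and multiplied by numbers.
use overload
    '+' => sub { TempDifference->new($_[0]{value} + $_[1]{value}) },
    '*' => sub { TempDifference->new($_[0]{value} * $_[1]) };

sub new { my ($class, $value) = @_; return bless { value => $value }, $class; }

package main;

my $t1 = Temperature->new(300);
my $t2 = Temperature->new(350);

print "t2 is hotter\n" if $t2 > $t1;     # comparing: allowed
my $d = ($t2 - $t1) * 2;                 # scaled difference: allowed
print "twice the difference: $d->{value}\n";
eval { my $nonsense = $t1 + $t2 };       # the textbook's sum of star temperatures
print "rejected: $@" if $@;              # is caught as a type error

Here it is the type system, rather than the programmer's good will, that rules out the textbook problem's sum of star temperatures.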

Before I end this entry, let me mention one of the reasons why I like Perl, the programming language, so much: there you are not forced to identify finite sets with subsets of the integers. For example, you can directly loop over the elements of a list rather than over the integers between 1 and the number of elements of that list. So you can say

for $date (@girls){
    &take_out_to_dinner($date);
}


rather than the much less natural


for($i = 0; $i < scalar(@girls); $i++){
    &take_out_to_dinner($girls[$i]);
}


You can even have arrays (called hashes) that are indexed by arbitrary names rather than by numbers. Often it is much more natural to say $phone_number{$girl}, where $girl is the name of today's date, than to assign arbitrary numerical labels to the girls.

You can extract the list of all labels of a hash so you can loop over it, but the order of that list might change from run to run. So, in particular, it makes no sense to say $phone_number{10*$girl+5}.
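
For concreteness, a little sketch of such a hash (the names and numbers are, of course, made up):

%phone_number = (
    alice => "555-0123",
    berit => "555-0456",
    carla => "555-0789",
);

# loop directly over the labels; the order of the keys is not guaranteed
foreach $girl (keys %phone_number){
    print "$girl: $phone_number{$girl}\n";
}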

Of course, sometimes (as in numerical programs) it is natural to index an array by numbers so that you can do arithmetic on the labels, so there are ordinary arrays as well.

Oh, this brings me to another example of an affine space: the space of pointers in C. There are of course the integers, which form a ring, and there are pointers. The allowed operations are adding an integer to a pointer (giving a pointer) and subtracting one pointer from another (giving an integer). Furthermore, you can compare pointers by comparing their difference to 0 (in Z). But it makes no sense to add two pointers.

Oh, I still forgot one thing. Back to social scientists aiming to be quantitative by mapping everything to numbers: we all know that for small variations everything looks linear. Referring to this, it is common to define a utility function for the various options of a decision (think of it as "the value in US$ of having that"). And at first order this is linear: two coins are twice as valuable as one coin. However, this is only an approximation: going from 100 apples to 101 is not as big an improvement as going from none to one. So, if you are aware of this, you say your utility function is concave, i.e. sublinear. But even that is not true: having one shoe is not half as good as having a pair of shoes, and having a can without a can opener is not really useful. Still, you get pretty far with your linear (or sublinear) model.

Still, there is one area where the linear model is nearly infinitely bad: anything that has to do with information. If you tell me a fact twice, that is no better than telling me the fact once. Even more: you still know the fact after you have told me. You can still tell it to someone else, and there is no point in me telling it "back" to you.

I think this is part of the reason why people brought up on the linear model of economics have such a hard time understanding the workings of copyright (the attempt to linearise the value of information), copyleft and open source culture.

Update: In the January 2005 issue of Physics Today, I found a picture that fits in here.

Black Holes as a realization of Ockham's razor

I would like to expand a little more on an idea that I have talked about earlier over at the coffee table, concerning my uneasiness with black hole entropy.

As you know, Ockham's razor refers to the idea that if you have two theories that make identical predictions and you have no way to tell the difference between them, you should stick to the simpler of the two. For example, if theory A is that angels do not exist, and theory B is that on every empty chair sits an angel whose presence you cannot see or smell or determine in any other way, you should prefer theory A. A similar example would be what Sean Carroll would describe as Religion Light: there is a god, but she does not influence anything in the physical universe.

A slightly different perspective would be that, at least operationally, A and B are the same theory.

To formalise slightly: in theory B, a state would be the tensor product W x A, where W describes the state of the physical world and A is the state of the angels. Let's assume for simplicity that A can take k different values (or is a vector/density matrix in a k-dimensional angel Hilbert space). The fact that the angels are not observable means that all observables are of the form O x 1.

However, the entropy is different in the two theories: in B, it is always higher by log(k). For ordinary thermodynamics this does not matter, as there one only ever measures differences in entropy, and thus the log(k) cancels.
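
To spell out the step this uses (under the assumption, implicit above, that the unobservable angel factor is in the maximally mixed state): the von Neumann entropy is additive over tensor products, so

S\left(\rho_W \otimes \tfrac{1}{k}\mathbf{1}_k\right)
  = S(\rho_W) + S\left(\tfrac{1}{k}\mathbf{1}_k\right)
  = S(\rho_W) + \log k .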

But let's now turn on semi-classical gravity. Then we go to our favourite black hole and make a precision measurement of the horizon area. Then we drop a chair into the black hole and measure the area again. The difference, divided by 4 (in Planck units), tells us the entropy of what we have thrown in. If it is just the entropy of a chair, there are no angels, not even invisible ones. If we find the entropy of a chair plus log(k), we know that there are invisible angels!

Thus black holes can work as Ockham's razor to get rid of invisible degrees of freedom!

Tuesday, February 01, 2005

Luckily...

...I am blogging from outside the US (although, right now, I am in New Europe, visiting Universidad Autonoma de Madrid for a seminar on the LQG string: great city, nice group and very friendly people, THANKS!), because otherwise the US government could censor my words (hmm, blogspot.com probably has its servers in the US...). At least that is the view of half of the students asked in a poll on the First Amendment. Similar numbers think that newspapers should not publish stories without the government's approval, and every sixth student thinks that unpopular opinions should not be allowed to be expressed. See the BBC website for more details.