Can "Information" be destroyed?

I’m not sure if this is what you’re getting at, but there is interpretational-dependence in this sense:

Suppose f and g are, say, programming languages. We might describe a string’s f-information as the length of the shortest program in f which outputs that string. Similarly, we can define a string’s g-information.
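In symbols, writing K_f(S) for the f-information of a string S (just restating the definition above):

K_f(S) = \min \{\, |p| \;:\; p \text{ is a program in } f \text{ and } f(p) = S \,\}

where |p| denotes the length of the program p.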

Strings S and T might be such that S’s f-information is far lower than T’s f-information, but S’s g-information is far higher than T’s g-information. In that sense, this measure of information is highly language-dependent.

[It is often pointed out that, so long as both f and g can describe interpreters for each other, there is some upper bound on how much f-information and g-information can differ by (specifically, bounded by the lengths of the f-description of a g-interpreter and vice versa), but this isn’t really of any use in establishing, for any particular finite string, a language-independent description of how much information it contains (there will be some contrived language in which that string has a tiny description and other contrived languages in which it does not). The information content of a string, in the sense of information theory, is always relative to some choice of description-language or probability distribution or such things, rather than intrinsic to the string and invariant across all such measures… What it isn’t dependent on, though, is the “meaning” of the string, just the statistics of the string-generating processes one is considering it as coming from]
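As an inequality, the ‘invariance theorem’ alluded to in the brackets says (up to small encoding overhead):

|K_f(S) - K_g(S)| \;\le\; \max\big(\, \text{length of a } g\text{-interpreter written in } f, \;\; \text{length of an } f\text{-interpreter written in } g \,\big) \quad \text{for every string } S,

which bounds the disagreement uniformly, but, as noted, says nothing language-independent about any particular finite S.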

This doesn’t jibe at all with my understanding of Kolmogorov complexity. What gives?

This is where entropy comes into it again. The branching of the wavefunction is a result of the entropy of the system increasing. I.e. the branching is time-asymmetric because it is a direct result of the time asymmetry of the system due to its low-entropy initial conditions.

Btw, I know this has been re-hashed several times, but there are issues with viewing wave function collapse as a result of decoherence. Actual wavefunction collapse, as originally described, is a different kind of time evolution, and it is one that makes it impossible (as opposed to being so difficult as to be effectively impossible) to reconstruct earlier states of the system.

Yes and no. My view on it is this: to compress something you need a set of rules to do it with, and the rules themselves can be regarded as information. Okay, there are going to be compression schemes that can drastically compress some apparently random pieces of data, but the informational size of the rules will be so big as to make them ineffective. The smaller sets of rules rely on simple correlations in the data.
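Just to sketch the accounting I have in mind, here’s a toy version in Python; the substitution codebook is entirely made up (it’s not anyone’s real scheme), and the point is only that the serialized rules get counted as part of the total description:

[code]
import json
import random
import string

def total_description_size(data: str, codebook: dict) -> int:
    """Length of the encoded data PLUS the length of the rules needed to decode it."""
    encoded = data
    for pattern, token in codebook.items():
        encoded = encoded.replace(pattern, token)
    rules_size = len(json.dumps(codebook))   # the rules themselves are information
    return len(encoded) + rules_size

random.seed(0)
redundant = "abcabcabc" * 100                # simple correlations: a tiny rule suffices
noise = "".join(random.choice(string.ascii_lowercase) for _ in range(900))

small_rules = {"abcabcabc": "#"}             # one short rule exploiting the repetition
big_rules = {noise[i:i + 30]: chr(160 + i // 30)   # one ad-hoc rule per 30-char chunk
             for i in range(0, 900, 30)}

print(len(redundant), total_description_size(redundant, small_rules))  # ~900 vs. ~120: big win
print(len(noise), total_description_size(noise, big_rules))            # ~900 vs. ~1350: rules cost more than they save
[/code]

The ‘random’ string can be squeezed down too, but only by a codebook so large that the total comes out worse than just keeping the string as it is.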

Thanks, I haven’t read the paper, but I will do when I feel up for some severe mental taxation!

From reading the paragraphs in Bekenstein’s article that relate to Bousso’s entropy bound, I’m guessing the bound comes (in this instance) essentially from the ‘hairy’ material within the Schwarzschild radius interacting with its surroundings pre-gravitational collapse (??).

Photons don’t just pop out of nowhere; for the most part, they need specific sets of conditions to ‘come about’, and with our perfect knowledge of the system as a whole we should be able to trace them back to these conditions. Also, here we’re only considering deterministic time evolution in QM; we either reject or ignore quantum randomness.

I don’t know; what is your understanding of Kolmogorov complexity and where does it clash with what I noted? I suspect what gives is the difference between the specific Kolmogorov complexity of particular finite strings (which is highly language-dependent) and the asymptotic Kolmogorov complexity of the prefixes of infinite strings (which is language-independent to at least the extent that one considers a function’s asymptotic behavior invariant under addition of an O(1) term, and only considers languages which can all interpret each other (e.g., Turing-complete computably interpretable languages)).

(There’s also the silly sense in which you can pretend Kolmogorov complexity is completely language-independent by saying “I only care about this one fixed language. Kolmogorov-complexity IS Java 5.0-complexity!”. But that’s like saying the star-rating of a movie doesn’t depend on the reviewer, so long as “star-rating” is defined as the number of stars given by Roger Ebert…)

Yes, agreed, it has been pushed from one place to another. But having said that, pretend there is a non-trivial compression scheme such that the sequence of a’s represents more decompressed info than the sequence of random characters.

Where are we deciding who or what the consumer is of some sequence of strings?
How can we just absolutely say that a sequence of the same character has less information than a sequence of different characters - it seems like it is completely relative.

That is what I am getting at. Yet, it has been said that a string of 0’s has less information than a string of mixed 0’s and 1’s - that seems to imply some absolute measure.

A) What you will find is that most of the programming languages people care about will be able to describe a string like “00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000” with a shorter program than a string like “00011100011000011100000111011100101100100100001101101011000100101111101111101011110000001011”. Of course, there are other contrived programming languages that just happen to have a built-in keyword for “00011100011000011100000111011100101100100100001101101011000100101111101111101011110000001011”, but these tend not to be languages anyone actually cares about…

B) Regardless of what programming language you work in, the average length of the minimal programs specifying each string of N bits with, say, between 40 and 60% zeros vs. ones will be much higher, for sufficiently large N, than the average length of the minimal programs specifying each string of N bits with, say, either all zeros or all ones. That’s because there’s a lot more ways to have a string with between 40 and 60% zeros vs. ones, for example, than there are ways to have a string that’s all one or the other; while you could hope to make some of those 40-60%er bitstrings just happen to have short programs in your language (with different ones having short programs in different languages), there’s too many of them for them all to have short programs in any one language.

C) None of this has anything to do with what the strings might be taken to “mean”, as such.
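If you want to see (A) concretely, here is a quick sketch using zlib as a stand-in for ‘a language people actually care about’ (it is not a Kolmogorov-complexity calculator, of course, just suggestive):

[code]
import random
import zlib

random.seed(42)
N = 100_000

uniform = b"0" * N                                       # all zeros
mixed = bytes(random.choice(b"01") for _ in range(N))    # roughly 50/50 zeros and ones

print(len(zlib.compress(uniform)))   # on the order of a hundred bytes
print(len(zlib.compress(mixed)))     # close to N/8 bytes: can't do much better than 1 bit per character
[/code]

This only illustrates (A); the point of (B) is the counting fact that no single language has enough short programs to cover the exponentially many strings in the 40-60% band.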

Well, that point of view is based on the concept that you shouldn’t intentionally try to confuse people. Saying “information cannot be destroyed” without telling people what information means is, IMO, intentionally trying to confuse them, in the same way telling someone that they are not doing work while they carry a load into a building is intentionally trying to confuse them. It’s not meaningful communication unless both people know the definition of the word. (And, at least with something like crudite, people know it’s a new word, and can ask about it.)

And, honestly, a lot of people then insist that the scientific definition is the correct definition, which is just as wrong as saying it is an incorrect definition.

Indistinguishable seems to have this covered, and clearly he’s got a better grasp of information theory than me, but to bring it back into context: performing a certain measurement on a system will generally record its macrostate, and some macrostates will correspond to a smaller range of microstates than other macrostates. Even if we are going to argue for a form of relativism, there’s an in-built bias: some measurements are just physically easier to make than others. And I’m sure we could reconnect this neatly by saying that the measuring apparatus is itself part of a physical system and performing a measurement has consequences for the entropy of this system (in fact this relates closely to what Half Man Half Wit said earlier, as the whole idea of decoherence as a cause for apparent wavefunction collapse is based on this).
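(For concreteness, the macrostate/microstate link I’m gesturing at is just the usual Boltzmann relation, S = k_B \ln \Omega, where \Omega counts the microstates compatible with the recorded macrostate; a measurement outcome compatible with fewer microstates corresponds to lower entropy.)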

And sometimes words have multiple meanings in science and maths (e.g. the word ‘field’ has two completely unconnected meanings in maths). Good practice for anyone is to make sure the meaning is clear from the context, or if not to explain the meaning, depending on who your target audience is.

Much obliged, thank you.

I have to go back and re-read a lot of posts here. I find it odd that information is considered a physical thing that exists and can be destroyed, rather than just a concept like numbers. It seems to me that it’s like asking the question can the number 7 be destroyed.

This is a little bit off the mark, because it suggests the phenomenon I am referring to here is purely an issue of the number of strings. The reason the average complexity of a string with between 40 and 60% zeros vs. ones must grow linearly with N is indeed purely a matter of numbers of strings. But the reason the average complexity of the two strings of all-zeros or all-ones remains low (growing logarithmically with N if one demands the strings be output in a self-delimiting manner, and even constantly bounded if one simply asks for a potentially infinite stream of output whose first N bits are of the desired form) isn’t just because there’s only two of them for each N; after all, at each N, one could pick some two strings of very high average complexity. It is, furthermore, that there is a computable function sending N to the unique appropriate length-N instance for each of those two patterns.
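Putting rough bounds on that (writing 0^N for the string of N zeros and c for a language-dependent constant, as a sketch of the claim rather than a careful statement):

K(0^N) \le \log_2 N + c \quad \text{(self-delimiting output: the program only has to encode N)}

K^{\text{stream}}(0^N) \le c \quad \text{(streaming output: one fixed loop printing zeros covers every N)}

whereas for a typical 40-60% string, the shortest program essentially has to spell the string out.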

Eh, maybe it wasn’t a great phenomenon to mention, more confusing than clarifying, but I thought it might come up anyway, so I discussed it.

Well, the OP did ask about information “as scientists describe it”.

Whether scientists ought to call this quantity “information” is another question. Black holes aren’t exactly black, novas aren’t new, atoms aren’t atomic (indivisible), … what can you do?

Thanks, now my brain hurts.

After the first page, I just sort of skimmed most of the posts; they are too dense for this hour of the morning, but here are my two pesos:

As I understood it, the law of conservation of energy is more correctly worded as the conservation of mass-energy, because at some point the distinction between energy and matter gets fuzzy, with matter transforming to energy in matter-antimatter annihilation and all that.

Does this have to do with anything here?

I think the fact that entropy increases in almost all branches breaks the symmetry here, as there is a far smaller set of possible past (i.e. lower-entropy) states than there are possible future states. But also, there are quite generically different possible pasts – think of things like Last Thursdayism or Boltzmann brains: it could always be the case that everything, including our memories and other records of the past, just fluctuated into existence a moment ago.

Yes, you’re right, I should have addressed this. Basically, you can draw up a code that takes anything to anything else, so you can, for instance, encode all of Shakespeare’s works with just one ‘a’, if that’s how your code is defined. But such compression power comes at a price; there’s no such thing as a free lunch, and every ‘illegal’ gain in code efficiency has to be paid back eventually. So while this code may be extremely efficient in encoding Shakespeare, it might do very poorly for Dickens – i.e. encode it in a string far longer than the entire plain text of the work is.
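The bookkeeping behind that ‘no free lunch’ is just pigeonhole counting (in bits, for simplicity): a lossless code has to be invertible, and

\#\{\text{codewords shorter than } k \text{ bits}\} \;\le\; 2^0 + 2^1 + \cdots + 2^{k-1} \;=\; 2^k - 1,

so however the code is rigged, at most 2^k - 1 texts can be squeezed below k bits; hand the tiny codeword ‘a’ to Shakespeare and you’ve spent one of the very few tiny codewords there are, pushing something else (Dickens, say) further out.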

This is essentially what Indistinguishable said, and I think it’s also been implicit in Francis Vaughan’s post. Consider his coding scheme, where each of the CDs in his collection has a very short encoding. Now, think about encoding another piece of music, using that same code. Provided his music collection covers a sufficient selection, it can be done – you can refer back to a given spot within every piece from that collection, essentially encoding directions like ‘play the nth note of piece k, shifted by pitch y, then the mth note of piece j’, etc. – and your previously highly efficient code becomes highly cumbersome.

Or consider the Library of Babel catalogue: let’s say, for definiteness, that each work contained within it is 100,000 letters long. You index lexicographically: the work containing just 100,000 'a’s is indexed as 1, the one containing 99,999 'a’s followed by one ‘b’ gets index number 2, and so on. This is obviously highly efficient for the first couple of books, but as you go on, it gets bad quickly: the first 26 index numbers are taken up by books varying only in the last letter, 26*26 numbers by books varying in the last two letters, 26[sup]n[/sup] numbers for books varying in n letters – so to index all books, you’d need numbers up to 26[sup]100,000[/sup]. In particular, the book consisting of all 'z’s is coded as a number with roughly 141,500 digits – longer, thus, than the book itself is. So what you gain in efficiency in the beginning, you end up having to pay back.

In fact, on average, this particular coding will end up somewhat less efficient than the original, plain text representation, because there are fewer symbols to work with – if books were only one letter in length, there would be 26 different ones, 9 of which would be coded with a one-digit number and 17 of which would receive a two-digit number, so that the average coded length would be roughly 1.7 digits. For a code with an equal number of symbols, the average coded length would be equal to the length of a book.
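A quick sanity check of those two figures, for anyone who wants to redo the arithmetic:

[code]
import math

# Index of the all-'z' book is 26**100_000; count its decimal digits:
digits_for_last_book = math.floor(100_000 * math.log10(26)) + 1
print(digits_for_last_book)          # 141498, i.e. the 'roughly 141,500 digits' above

# One-letter 'books': indices 1..26; 1-9 take one digit, 10-26 take two.
average_digits = (9 * 1 + 17 * 2) / 26
print(round(average_digits, 2))      # ~1.65 digits per one-letter book, vs. 1 letter of plain text
[/code]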

So when I talk about the information content of strings, I really talk about the efficiency with which they can be encoded on average. A random string, on average, is incompressible, while a highly redundant string, again on average, can be highly compressed. This is not in conflict with the fact that there exist certain codes within which a certain random string has a very short representation, while a certain redundant one gets mapped to something very complex. The reason for that is essentially that you want to be able to talk about information content independently of the coding scheme you use.

Hmm, I’m not sure I understand what you mean by that. Bousso’s bound essentially gives you a method to construct a holographic surface in an arbitrary spacetime – basically, you start with a two-dimensional surface, then trace out surface-orthogonal, non-diverging null geodesics (light rays), which consequently meet at some point (possibly at infinity) (I think that’s about right). The area law for entropy holds for everything bounded by this construction, and the Bekenstein bound can be shown to be a special case of it. So in principle, you can appeal to a holographic description of the whole collapse, before, during, and after an event horizon forms, so that it’s clear that no hair gets shaved off.
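For reference, the form of the bound I have in mind is (as I understand it, in units with k_B = 1 and with l_P the Planck length):

S[L(B)] \;\le\; \frac{A(B)}{4\, l_P^2}, \qquad l_P^2 = \frac{G\hbar}{c^3},

where B is the two-dimensional surface and L(B) the light-sheet traced out from it; the Bekenstein-Hawking entropy of a horizon, S_{BH} = A/(4 l_P^2), is the case in which the bound is saturated.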

I think in general Rindler horizons work just as well, so that you can describe the whole process from the point of view of an accelerated observer such that the collapse is hidden behind his horizon.

Another way to see this, I think, is to appeal to the generalised second law – which says essentially that the sum of horizon and ordinary entropy must never decrease. So that before a black hole starts to form, we have a certain amount of ordinary entropy, which gets ‘converted’ into horizon entropy such that the sum only increases (typically probably by a rather great amount).
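Written out, the generalised second law is (same conventions as above):

\Delta S_{\text{gen}} \;=\; \Delta S_{\text{outside}} + \frac{\Delta A_{\text{horizon}}}{4\, l_P^2} \;\ge\; 0,

so whatever ordinary entropy disappears behind the horizon has to be compensated by growth in the horizon area.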