Which contains more information, a 32 GB flash drive or a mustard seed?

Meaningless question.

A mustard seed is read-only; a flash drive is rewritable.

Assuming that a mustard seed (initially) contains a single celled embryo, as well as food stores and a seed coat…

The information content of the mustard seed is contained in its DNA. Mustard is a Brassica, and that does not make things easy - unlike humans, plants in the brassica family are polyploid - they don’t just have pairs of chromosomes, like we do, they have multiple pairs of chromosomes, effectively of different species. However, these multiple genomes are much smaller than the human genome, for example.

The genome for B. rapa (oil-seed rape) has been completely sequenced, and comes in at 529Mb (I assume b=base pair in this context), which is ~132 Mbytes (2 bits per base pair, 8 bits per byte) to compare with 32 GB of Flash.
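
For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes the figures quoted above (a 529 Mb genome, 2 bits per base pair) and treats the drive's 32 GB as decimal gigabytes; the function and constant names are made up for illustration.

```python
def genome_bytes(base_pairs, bits_per_base=2):
    """Raw storage needed for a DNA sequence, with no compression."""
    return base_pairs * bits_per_base / 8


GENOME_BP = 529_000_000      # B. rapa figure quoted in the post
DRIVE_BYTES = 32 * 10**9     # assumption: 32 GB means decimal gigabytes

seed_bytes = genome_bytes(GENOME_BP)
copies = DRIVE_BYTES / seed_bytes

print(f"genome sequence: {seed_bytes / 1e6:.2f} MB")
print(f"the drive holds roughly {copies:.0f} copies of it")
```

That works out to about 132 MB in decimal units, or roughly 126 MiB if you count in powers of two, so the drive could hold the raw sequence a couple of hundred times over.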

The memory stick holds more information. Plus it is rewritable. However, the mustard seed (in the right conditions) is self replicating, but not accurately so. You can’t do that with a Flash drive.


There’s other information in the seed, in addition to the DNA. Off the top of my head, there are epigenetic marks that include DNA methylation (a type of chemical modification), which theoretically adds another bit per base pair. And there are a few dozen possible modifications on the histone proteins that package DNA, which might total another (WAG) 40 bits per 200 base pairs of DNA. But, a lot of those modifications are mutually exclusive, and aren’t interpreted by the cell on such a fine scale. Rather than reading "position x is methylated, position x + 1 is not, position x+2 … ", it’s more that the average methylation level in the vicinity of a certain gene will influence the likelihood that the gene is transcribed.

So, again to throw out a WAG, I’d guess that these epigenetic marks might total 10% of the information content of the genome – so add another 10 MB or so.
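
To put numbers on that guess, here is a back-of-envelope sketch in Python. It assumes the 529 Mb genome size quoted earlier in the thread and takes the per-base figures above at face value (1 bit per base pair for methylation, 40 bits per 200 base pairs for histone marks). These are guesses from the post, not measured values, and the variable names are purely illustrative.

```python
GENOME_BP = 529_000_000  # genome size quoted earlier in the thread


def to_mb(bits):
    """Convert a bit count to decimal megabytes."""
    return bits / 8 / 1e6


sequence_mb = to_mb(GENOME_BP * 2)            # 2 bits per base pair
methylation_mb = to_mb(GENOME_BP * 1)         # guess: 1 bit per base pair
histone_mb = to_mb(GENOME_BP / 200 * 40)      # guess: 40 bits per 200 bp

ceiling_mb = methylation_mb + histone_mb      # every mark read individually
guess_mb = 0.10 * sequence_mb                 # the 10% estimate

print(f"raw ceiling for epigenetic marks: {ceiling_mb:.1f} MB")
print(f"10% estimate:                     {guess_mb:.1f} MB")
```

The gap between the two figures is the point made above: if the cell reads average levels over a region rather than individual positions, the usable information sits far below the per-position ceiling.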

Then, if you wanted to go really wild and crazy, there’s a lot of other biochemical information about the “state” of the seed. At the most abstract levels, you might want to consider levels of the hormones that control germination (very important for a seed!). And at the most needlessly detailed level, there’s information in the amounts and position of each molecule in the cell.

Information or just data? What is loaded on the flash drive? 32 gigs of ‘All work and no play makes Jack a dull boy’ is going to be different than 32 gigs of anime porn.
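
One crude way to see the difference is to compress the data. The sketch below, in Python, uses zlib's compressed size as a stand-in for information content, which is only an approximation, and a 1 MB sample instead of the full 32 GB. The random bytes are an assumed stand-in for data with no exploitable pattern.

```python
import os
import zlib

SAMPLE_SIZE = 1_000_000  # 1 MB sample standing in for the whole drive

line = b"All work and no play makes Jack a dull boy.\n"
# Repeat the line until the sample is full, then trim to the exact size.
repetitive = (line * (SAMPLE_SIZE // len(line) + 1))[:SAMPLE_SIZE]
# Random bytes: no pattern for the compressor to exploit.
random_like = os.urandom(SAMPLE_SIZE)


def compressed_ratio(data):
    """Compressed size divided by original size (smaller = more redundant)."""
    return len(zlib.compress(data, 9)) / len(data)


print(f"repeated sentence: {compressed_ratio(repetitive):.4f}")
print(f"random bytes:      {compressed_ratio(random_like):.4f}")
```

The repeated sentence shrinks to a tiny fraction of its size, while the random bytes barely shrink at all, even though both occupy the same space on the drive.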

Exactly, and there’s going to be a lot of ‘junk’ in the mustard genome, I expect.

Then again, we could also say the same for the flash drive. I think it’s best to focus just on the easily-recoverable information content, the genetic sequence for the seed and the data-write capacity of the drive.

I am more and more of the opinion that ‘junk’ DNA has more of an impact on expression than we fully understand. It may not be coding proteins, but it seems that at least some of it may do something.

However, due to the polyploid nature of Brassica DNA, there is a lot of replicated information - multiple gene expressions etc on chromosomes. But that repeated information has an impact on the plant development.


Shit, if *nobody* was interested in the hard-to-recover information, I’d have to do something else with my life.

Regarding the whole “junk” idea - and whoever first coined that term should be shot - I think it’s accurate to say that we’re learning more and more that DNA operates via more methods than just sequence. Three-dimensional orientation, epigenetic markers, placement of the DNA within the nucleus, chromatin binding proteins, etc, etc, all play a role in determining how the genome works. DNA is the software, but there’s a whole crapload of hardware that is involved as well, and it’s not just raw DNA sequence that determines how the hardware interacts with the software.

Okay, now I’m wondering - why a mustard seed, specifically? Why not an amoeba, or an acorn?

My sermon-sense is tingling.

There are more molecules, atoms, particles, etc. in the flash drive, so it contains more information. The flash drive can also make more use out of the information. But unlike a mustard seed, it can’t make new flash drives. Obviously the mustard seed can’t make more flash drives either, but it can make more mustard seeds. So adding some water, sunlight, and some trace elements will eventually get you more information than the flash drive. And enough mustard seeds can be sold to get the money to buy a flash drive.

OK, but I think it’s not at all unreasonable to expect that a non-zero portion of any given genome is doing no more than taking up space.

In other words, the question was poorly phrased? It should have been something like “Can a 32 GB flash drive contain more or less information than a mustard seed?” since how much data a flash drive contains is kind of arbitrary.

I would cautiously agree. There are certainly big chunks of many genomes that can be deleted without any noticeable harm to the organism*. However, “taking up space” can still accommodate useful functions. “Space” could be a landmark. It could help ensure that other, important chunks are separated enough. It could allow gene A to be in the nucleolus and gene B to be at the nuclear envelope simultaneously. My own research is looking at what some of this space-filling DNA is doing during meiosis, and it’s not trivial, I can tell you that much. But, yes, if you twist my arm, I will grudgingly admit that there is almost certainly some DNA that is utterly useless. I just won’t be pinned down on how big I consider “some” to be.

*In short-term, laboratory condition experiments. In the wild, under evolutionary pressure, that may not be true.

Agreed, although the information content in that case (as pertains to this topic) would just comprise size and position.

Meant to add:

Whilst I bow to your superior expertise and knowledge in this field, and I have no desire to try to pin you down on how big is the ‘some’, it seems to me that it cannot be a trifling amount, or else one (or maybe both) of the following would be true:

  1. The genome would be terribly delicate with respect to change through mutation, retroviral insertion, even recombination.
  2. Design arguments start to look a bit plausible - if something’s highly intricate, functional in multi-layered modes AND entirely efficient/optimised, it’s maybe not irreducibly complex, but not necessarily what we would expect the product of evolution to be (at least, if we understand it at all).

Claude Debussy once said - “Music is the space between the notes” - is life the junk between the genes?

But I guess I do agree - some of that junk DNA is probably just junk and provides very little (or no) additional information.

Is this an experiment for Craig Venter and his synthetic bacterial genome?

They did encode their names in the junk sections - I guess they could just edit that material out, or set it all to one base pair (changing the 3D nature of the DNA). Or maybe they have already done some of these experiments.

And because the OP twigged my sermon-sense as well

[SERMON]If you drop a 32 GB Flash Drive in the garden, you have lost 32 GB of data. If you drop a mustard seed in the garden, you will have mustard growing and nesting birds for the rest of your life.[/SERMON]


Has anyone worked out what the heck kind of species is being described there, because mustard (and those of its allies with which I am familiar):
  - Ain’t a tree
  - Has not-particularly-small seeds (2mm or more in diameter - bigger than a lot of other common seeds - bigger than many seeds that do grow into big things - for example, willow, myrtle, fig)