series of questions about identifying genes on chromosomes

I’m massively confused about what seem to be the basics of genetics. I’d understand a lot more if I understood these things correctly:

  1. In decoding the genome of an individual, a series of A’s, T’s, C’s, and G’s is noted. Does that list of nucleotides (?) represent just one side of the dna ladder? IOW, how do the letters of a particular genome relate to our image of a dna molecule that is made of pairs of nucleotides attached to the two legs of the double helix? How is the order determined or reported? What’s that relationship?

  2. Is the dna of an individual made of the accumulated dna of the individual chromosomes? IOW, if a genome is 50,000 characters long, would, say, 3,000 come from one chromosome, another 2,880 come from another chromosome, etc?

  3. If this is the case, in humans, which chromosome comes first? second? last? How do they know which is which? Is there an accepted order?

  4. Isn’t it true that a gene is a discrete stretch of these letters that are part of the genome? As I understand it, a huge amount of our dna is random, extra stuff. At least there’s some question about its function, since it’s not directly implicated in the production of proteins. So - how do scientists know where one gene ends and another begins? How do they know which of the A’s, T’s, C’s, and G’s represents the end of a gene, the beginning of a gene or just random A’s, T’s, C’s, and G’s?

I realize there’s a chance that I’m so confused that a simple answer won’t help much. I also know there are references out there, but apparently none of them has answered my questions well enough at this point. I am afraid that there are some assumptions that writers make that ignorami such as myself do not hold, so confusion reigns. Any help, Dopers? xo, C.

Other Dopers will no doubt give much better answers than I can, but I do recall a few basic facts from my biochem courses which might help clear things up.

The four basic nucleotide bases are, as you noted, A, T, C and G (which stand for adenine, thymine, cytosine and guanine respectively). These bases always pair up in a particular manner, so that if you know one side of the ladder, you automatically know the other side. C pairs up with G and A pairs up with T. So, if your strand of DNA is AAGTCTA then you know that the other side of the helix has TTCAGAT.
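To make the pairing rule concrete, here is a minimal Python sketch. The function names are just made up for illustration. It also shows the "reverse complement", which is the other strand written in its own reading direction, since the two strands run in opposite directions.

```python
# Base pairing rule from the post: A pairs with T, C pairs with G.
PAIRS = str.maketrans("ATCG", "TAGC")

def complement(strand: str) -> str:
    """Return the base-by-base partner of a DNA strand."""
    return strand.upper().translate(PAIRS)

def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own reading direction."""
    return complement(strand)[::-1]

print(complement("AAGTCTA"))          # TTCAGAT
print(reverse_complement("AAGTCTA"))  # TAGACTT
```

This is why reporting one side of the ladder is enough: the other side can always be computed from it.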

I don’t know what you mean about “how is the order determined or reported”. Often (but not always) certain base sequences indicate the start or end of a gene, so finding a section of DNA between a known “start” and a known “stop” means that strand might code for a protein that has some use. Or it could be junk. DNA is read in a specific direction, based on the chemical shape of the molecule, but offhand I can’t remember what it is (5’ or 3’)… the proteins that “read” DNA in a cell can only go one way along the track, so that makes it easier for us to find genes, since we don’t have to consider forward and backwards from a possible start location.

I don’t know if this is helpful or not, but at a very general level, DNA is “read” by special proteins, which then make something very similar to DNA called mRNA, which is then read again by a factory-type protein, which matches specific amino acids (AAs) to the DNA pattern. So if the DNA had, for example, GTC, then the copy would be CAG and the factory would go get a glutamine amino acid and chain it up to the previous ones. The chain of AAs makes a protein which then does stuff, like read DNA, break down fats, build the cell, convert food to energy, etc etc etc.
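A rough sketch of that two-step process in Python, under some loud assumptions: the names are hypothetical, the codon table is a tiny excerpt of the real 64-entry table, and strand direction is ignored entirely. The one difference from plain DNA pairing is that mRNA uses U (uracil) where DNA would use T.

```python
# Step 1: the DNA template is copied into mRNA (A->U, T->A, C->G, G->C).
TEMPLATE_TO_MRNA = str.maketrans("ATCG", "UAGC")

# Step 2: each three-letter mRNA codon is matched to an amino acid.
# Tiny excerpt of the standard table, for illustration only.
CODON_TABLE = {"AUG": "Met", "CAG": "Gln", "CAA": "Gln", "UUU": "Phe"}

def transcribe(template: str) -> str:
    """Copy a DNA template into the complementary mRNA."""
    return template.upper().translate(TEMPLATE_TO_MRNA)

def translate(mrna: str) -> list:
    """Read the mRNA three letters at a time; '?' marks codons not in the excerpt."""
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]
    return [CODON_TABLE.get(codon, "?") for codon in codons]

print(transcribe("GTC"))                    # CAG
print(translate(transcribe("TACGTCAAA")))   # ['Met', 'Gln', 'Phe']
```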

I don’t know what you mean by this. The “genome” is the entire sequence of bases, on each and every chromosome in a cell. Typically all cells in an organism have the exact same DNA (mutations, cancers, chimeras and mosaicism aside…!). The genes for “eye colour”, to use a simple example, are always on the same chromosomes and at the same spots in all humans, so a given chromosome is distinct from the others and convention has assigned them names (actually numbers). I think relative lengths came into the naming as well, but I don’t know. So when things like cell division happen, and your chromosomes pair up, chromosome 2 from your mom and chromosome 2 from your dad will pair up; they will never pair up with 3, for example (exceptions/errors in this tend to lead to organism disease or cell death).

As I said, the order is mostly by convention, but each chromosome is distinguished by the genes it has on it, as opposed to those its neighbour has. In the cell itself, there is no order most of the time, and when they must pair up, the only important thing is to pair up with the right partner (which is, IIRC, done by matching spots along the length of the chromosome, but this isn’t something I ever studied in depth).

It’s been a long time since I studied this, but as I said, there are basic “start” sequences and “stop” sequences built into the DNA. These are places where the protein that reads it can bind to and start working, or which force it to release, basically. There is a lot of junk DNA, and clearly some start and stop sequences would likely appear there, but I’m not sure which biochemical method is used to prevent transcribing junk in the cell. I’m sure someone else could come along and explain this eventually!

For scientists, I have the gut feeling that a lot of the initial research was done with proteins. There are a heck of a lot of them, and if you can isolate it, then you can study it. A protein has a specific sequence of amino acids, and that amino acid can only be added to the protein if the mRNA has one of a couple of sequences (for example, the mRNA of glutamine could read either CAA or CAG). Therefore we know what the DNA says, more or less. We can make custom bits of DNA with markers attached to them to try and find the matching sequence in the real DNA, and that helps us find the gene. This is a very, very complex and in-depth field, and I am not very familiar with it, so I don’t really want to write more and accidentally lead you wrong!
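The "more or less" can be made concrete: working backwards from a protein, each amino acid allows one or more codons, so a short stretch of protein gives a small set of candidate mRNA sequences. Here is a hedged Python sketch. The names are hypothetical and the table only covers four amino acids out of twenty.

```python
from itertools import product

# Excerpt of the standard genetic code, listed per amino acid.
CODONS_FOR = {
    "Met": ["AUG"],
    "Gln": ["CAA", "CAG"],
    "Trp": ["UGG"],
    "Lys": ["AAA", "AAG"],
}

def candidate_mrnas(peptide):
    """List every mRNA sequence that could encode the given amino acids."""
    choices = [CODONS_FOR[amino_acid] for amino_acid in peptide]
    return ["".join(combo) for combo in product(*choices)]

for mrna in candidate_mrnas(["Met", "Gln", "Lys"]):
    print(mrna)
# AUGCAAAAA, AUGCAAAAG, AUGCAGAAA, AUGCAGAAG
```

A search for the gene would have to allow for all of these candidates, which is roughly why the protein only tells us the DNA "more or less".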

I hope that helped. Let me know if you want me to clarify anything. I might be able to go more in-depth if you need me to. I think resources like Wikipedia might be more confusing than helpful, but getting your hands on an intro Biochemistry book would definitely answer a lot of your questions. The two that I’ve used in school were Stryer and Lehninger (sp?), but there are dozens of them out there.

Yes, generally only one side of the ladder is reported, because A and T always bind together, and C and G always bind together, so if you have one side, you automatically know the other side. This is also how the body copies DNA. Order, as has been said, is basically arbitrary and depends on the context - what you’re looking at. Reporting is all just accepted convention.

Essentially, yes, but don’t think of it as one single big long word. Think of the genome as a library. Each chromosome is a book, filled with different and generally unrelated genes, or chapters.
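If it helps to see the library picture as data, here is a toy Python sketch. The sequences are invented and absurdly short (real chromosomes run to many millions of letters), and the names are made up for illustration.

```python
# Toy "library": one entry (book) per chromosome. Sequences are invented.
toy_genome = {
    "chr1": "ATGGCGTACCTGA",
    "chr2": "GGCATTAGC",
    "chrX": "TTAGGCA",
}

def genome_length(genome):
    """Total letters in the genome: the sum over all chromosomes."""
    return sum(len(sequence) for sequence in genome.values())

def base_at(genome, chromosome, position):
    """Look up one letter by chromosome name and 1-based position."""
    return genome[chromosome][position - 1]

print(genome_length(toy_genome))       # 29
print(base_at(toy_genome, "chr2", 4))  # A
```

A location is given as a chromosome name plus a position on that chromosome, so nothing depends on stringing the chromosomes together in any particular order.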

Human chromosomes are numbered roughly by size. Chromosome 1 is the biggest, then chromosome 2, etc. The cell doesn’t care which is which. It can open any book to any chapter at any time. Order is irrelevant, biologically.

That’s a non-trivial question. There are some patterns (stretches of letters) that are well-known that seem to mark beginnings and ends, but there are a zillion variations, and it’s not always obvious. There are workarounds, like looking for stretches that translate into long stretches of amino acids without stop signals, but that’s probably more detail than you want to know. The cell recognizes beginnings and ends because there are dozens or more proteins that can bind specific stretches of DNA and set up docking sites for reading machinery.
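For the curious, the "long stretch without stop signals" workaround can be sketched in a few lines of Python. This is a simplified illustration, not how real gene-finding software works: the function name and the `min_codons` cutoff are made up, it only scans one strand, and it ignores complications such as the non-coding interruptions found inside many genes.

```python
# Stop signals as they appear in the DNA sequence (standard genetic code).
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for each ATG...stop run on this strand.

    min_codons is the minimum number of codons before the stop signal.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # three ways to chop the letters into triplets
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # possible beginning of a gene
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # keep scanning for the next candidate
    return orfs

print(find_orfs("CCATGAAACCCGGGTAGTT"))
# [(2, 17, 'ATGAAACCCGGGTAG')]
```

In random letters a stop signal turns up fairly often by chance, so a long run without one is a hint that the stretch may code for a protein.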

Smeg, thanks very much. It’s much more clear now. I also appreciate the implication throughout your response that while I’m talking about the human understanding of the genome, the cell approaches it all quite differently - functionally. One other question - since the 23rd pair is different in males and females, are both X and Y chromosomes included in the genome?

Yes, absolutely. They both carry important genes.

Well, the X certainly does, but the Y is one of the smallest chromosomes, and appears to carry almost nothing aside from the genes for maleness. Really, it couldn’t carry anything too important, since females manage to get by without any Y chromosome at all.

Some more detail on the general topic: The four letters are organized into three-letter “words” called codons. There are 4×4×4, or 64, possible codons, each of which (apart from three “stop” codons) codes for one amino acid. Since there are only 20 amino acids, this means that some amino acids are coded for by more than one codon. The mapping from particular codons to particular amino acids is identical, or very nearly so, for all known living things, which is the strongest indicator that we have that all known living things have a common ancestor.
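The counting is easy to check by brute force. A small Python sketch, using the mRNA alphabet and the three stop codons of the standard genetic code (the variable names are just for illustration):

```python
from itertools import product

BASES = "UCAG"  # mRNA letters; DNA would have T in place of U

# Every possible three-letter word over a four-letter alphabet.
ALL_CODONS = ["".join(triple) for triple in product(BASES, repeat=3)]

# Codons that signal "stop" instead of naming an amino acid.
STOP_CODONS = {"UAA", "UAG", "UGA"}
coding_codons = [codon for codon in ALL_CODONS if codon not in STOP_CODONS]

print(len(ALL_CODONS))     # 64
print(len(coding_codons))  # 61
```

With 61 codons left to share among 20 amino acids, most amino acids necessarily have more than one codon.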

One of the 64 codons, AUG, is what’s called a “start codon” (a few organisms occasionally use others). It indicates the beginning of the code for a protein, and also starts it off with its first amino acid, methionine. This same codon can also appear in the middle of a gene, in which case it doesn’t mean anything special, but just contributes another amino acid. The cell’s protein-making mechanism (a thing called a ribosome) then reads off successive three-letter codons, and attaches the appropriate amino acid to the growing protein, until it reaches a “stop codon”, one of three three-letter sequences which do not code for any amino acid, but instead function like the period at the end of a sentence, and tell the protein-making mechanism that the protein is done and to release it.
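Here is a toy version of that reading process in Python. It is a sketch under stated assumptions: the function name is made up, the codon table is a small excerpt of the standard one, and the real ribosome locates its starting point with more help than a simple text search.

```python
# Small excerpt of the standard codon table, for illustration only.
CODON_TABLE = {"AUG": "Met", "CAG": "Gln", "CAA": "Gln", "UUU": "Phe", "AAA": "Lys"}
STOP_CODONS = {"UAA", "UAG", "UGA"}

def translate_from_start(mrna):
    """Find the first AUG, then read codons until a stop codon appears."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so nothing gets made
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break  # the "period at the end of the sentence"
        protein.append(CODON_TABLE.get(codon, "???"))
    return protein

print(translate_from_start("GGAUGCAGAUGUUUUAAGGC"))
# ['Met', 'Gln', 'Met', 'Phe']
```

Note the second AUG in the example: in the middle of the message it just adds another methionine, and the GGC after the stop codon is never read.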

Well, we could get into the phenotypic differences between individuals carrying XX, XY, XO, and XYY, but I figured that was beyond the scope of the OP. Suffice it to say the Y chromosome has stuff on it, and that stuff is unquestionably part of the genome.

Yes, I’ve gotten my questions answered, and then some. If you want to continue discussing details, I’ll watch with interest. xo, C.