Photosynthesis – storage of solar energy by plants

We have seen how the body takes in energy and how it uses it. But before we can consume food to obtain the energy stored in it, that energy must have been stored there. This is the result of photosynthesis, which leads us to consider the chloroplast.

Chloroplast structure

Chloroplasts occur inside the cells of plants, like the nucleus or the mitochondria. Like mitochondria, they contain their own simple form of DNA, because, like mitochondria, they originated as bacteria which moved into another cell, felt at home and stayed. Within a double membrane, they have a number of closed membranes called thylakoids arranged in stacks, each of which is called a granum. There is fluid inside all these spaces; the fluid which lies inside the double membrane and surrounds the thylakoids is called the stroma.

Chloroplast structure, from Wikimedia Commons

Photosynthesis takes place in two steps: light reactions in the thylakoid membranes and the Calvin cycle in the stroma.

  1. The light reactions use energy from sunlight in two ways: to store energy as ATP; and to transfer electrons to NADPH. Both are passed to the Calvin cycle.
  2. The Calvin cycle uses the electrons and ATP plus CO2 from the air to make glucose.

The light reactions thus furnish the energy and the fuel used by the Calvin cycle.

The two steps of photosynthesis, from Openstax College

Light reactions

The light-reaction phase of photosynthesis is also called the Z-scheme, but since the Z usually is shown lying on its side, it looks much more like an N-scheme. Light reactions take place in three steps.

Photophosphorylation (Z-scheme), by author, after Kratz (2009)

In steps (1) and (3), called Photosystem II (PII) and Photosystem I (PI),[1] energy from light excites an electron in chlorophyll to a higher energy level. Since the most important form of chlorophyll, chlorophyll a, absorbs red and blue light but reflects green, leaves are generally green. Other pigments may absorb light of other frequencies and so give different colors. These other pigments (called the antenna complex) transfer any energy they absorb to the chlorophyll a in what is called the reaction center, which can thus collect energy from light of different wavelengths, extending the sensitivity range of the process. Only in the reaction center are excited electrons passed to the next phase.

Although Photosystem II and Photosystem I are similar in operation, they differ in a number of ways. For one thing, their reaction centers contain different pigments: P680 in PII and P700 in PI. (The P numbers refer to the wavelength in nanometers of maximum light sensitivity of each pigment.)

In Photosystem II, light energy serves two purposes.

  1. It forces a reaction-center electron to be released to the next step, an electron transport chain like those in mitochondria.
  2. It also powers water photolysis, the separation of water molecules into O2, protons and electrons.

All this takes place inside the thylakoid membrane.

Each of the products of step 2 has its own destination. The oxygen is released into the atmosphere where, for instance, we breathe it. The protons serve in the next step. And the electrons replace the electrons lost by chlorophyll in step 1. This process is historically and evolutionarily quite old, having already taken place over 3 Gya in cyanobacteria, where the plentiful source of electrons was water.

Photolysis, the breakup of water to yield electrons, occurs as follows.

2 H2O → 4 H+ + 4 e− + O2

That is, four electrons are released at a time. But P680+ can only receive one electron at a time. A mechanism exists which allows this to take place, carried out by what is called the oxygen-evolving complex, but unfortunately, it is well beyond the scope of this document. Also, alas, it is not completely understood. If it were, it might enable us to extract hydrogen from water in an energy-efficient way, which could put an end to our energy problems.[2]

The electron released by PII then goes through photophosphorylation, an electron transport chain similar to that in mitochondria, but now taking place in the thylakoid membrane of the chloroplast. At each step, some of the electron energy is used to pump protons across the thylakoid membrane. At the end of the chain, the electrochemical gradient of the protons across the membrane serves to turn ATP synthase which converts ADP into ATP by the process of chemiosmosis. So at the end of step 2, we have ATP and a free electron.

PI again uses solar energy to kick an electron up to higher energy where it is released. This time, it can be replaced by the electron leaving the ETC. The electron released by PI has enough energy to go through a process which stores its energy on the electron carrier NADPH, a close relative of our old pal NADH. The solar energy is now stored in the NADPH and the ATP from the ETC and both move to the next step, the Calvin cycle.

In the light reactions, electrons and energy have different fates. Electrons from water wind up in NADPH; solar energy is transferred to ATP. So the overall effect of light reactions is to store solar energy in ATP for use by the plant or in the Calvin cycle, and to energize NADPH for the Calvin cycle. The complete chemical formula for the light reactions is the following.

2 H2O + 2 NADP+ + 3 ADP + 3 Pi + light → 2 NADPH + 2 H+ + 3 ATP + O2

Calvin cycle

The second step of photosynthesis, the Calvin cycle, takes place in the stroma of the chloroplast. It takes in CO2 and uses the chemical energy produced by the light reactions to make sugar molecules, usually glucose.

The Calvin cycle takes place in three stages, which are indicated in the figure.

The Calvin cycle, from Openstax College

In stage 1, carbon fixation, the enzyme whose “much-needed nickname” is RuBisCO[3] catalyzes the reaction of CO2 and 5-carbon RuBP into a 6-carbon compound which immediately splits into two 3-carbon compounds called 3-PGA. Then, in stage 2, reduction, ATP and NADPH from the light reactions reduce 3-PGA to G3P. In stage 3, regeneration, most of the G3P, with the help of more ATP, regenerates RuBP, so the cycle can begin again. Each tour of the cycle fixes one CO2 and yields two G3P, so three tours make six G3P, of which five go to regenerate RuBP and only one leaves the cycle. Two G3P are needed to form one carbohydrate molecule, usually glucose (C6H12O6), so it takes six tours of the Calvin cycle to convert CO2 into glucose. The complete formula is therefore the following.

6 CO2 + 12 (NADPH + H+) → C6H12O6 + 12 NADP+ + 6 H2O

ignoring the energy from ATP going to ADP and Pi.
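The two formulas can be added together to see what photosynthesis does overall. The following is only a bookkeeping sketch in Python: it takes the coefficients exactly as written in the two formulas of the text, runs the light reactions often enough to supply the NADPH for one glucose, and cancels whatever appears on both sides. The helper names `scale` and `net_reaction` are made up for this illustration.

```python
from collections import Counter

# Light reactions, one run of the equation as written in the text.
LIGHT_IN = Counter({"H2O": 2, "NADP+": 2, "ADP": 3, "Pi": 3})
LIGHT_OUT = Counter({"NADPH": 2, "H+": 2, "ATP": 3, "O2": 1})

# Calvin cycle for one glucose, as written in the text (ATP left out).
CALVIN_IN = Counter({"CO2": 6, "NADPH": 12, "H+": 12})
CALVIN_OUT = Counter({"C6H12O6": 1, "NADP+": 12, "H2O": 6})


def scale(side, factor):
    """Multiply every coefficient on one side of a reaction."""
    return Counter({species: n * factor for species, n in side.items()})


def net_reaction(inputs, outputs):
    """Cancel whatever appears on both sides; Counter subtraction drops zeros."""
    return inputs - outputs, outputs - inputs


# Run the light reactions often enough to supply the 12 NADPH of one glucose.
runs = CALVIN_IN["NADPH"] // LIGHT_OUT["NADPH"]
net_in, net_out = net_reaction(
    scale(LIGHT_IN, runs) + CALVIN_IN,
    scale(LIGHT_OUT, runs) + CALVIN_OUT,
)
print(runs, dict(net_in), dict(net_out))
```

Six runs of the light reactions are needed, and the NADPH, NADP+ and H+ cancel completely, leaving 6 CO2 + 6 H2O on the input side and one glucose plus 6 O2 on the output side. The 18 ATP remain uncancelled only because the Calvin formula above ignores ATP; textbooks usually quote 18 ATP per glucose for the Calvin cycle, a figure not taken from this text, which would cancel them exactly.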

It is impossible to overstate the importance of these reactions. They are essential for life on Earth. Not only is our oxygen-rich atmosphere originally due to photosynthesis by cyanobacteria, the builders of stromatolites, but the current maintenance of oxygen levels also depends on it. And the very energy we run on, as we have seen in this chapter, comes from the glucose made in the Calvin cycle.

This is worth repeating.

  • The Calvin cycle takes in CO2 from the air and uses the energy-rich products of the light reactions to form glucose and prepare for the next tour of the cycle. This cycle depends on the enzyme RuBisCO, which therefore is essential to life on planet Earth.
  • We and other animals eat the plants – and other animals. After breakdown of food by digestion, the glucose originating in photosynthesis is used by cellular respiration to provide energy in the form of ATP which powers our muscles, our neurons and other metabolic functions. The waste from this conversion is CO2, which goes back into the Calvin cycle.
  • Light reactions use the energy from sunlight to take in water and break it down into O2, protons and electrons. The electrons are energized by light to go through chemiosmosis and form energy-rich products which are passed to the next step, the Calvin cycle.
  • And so it goes.

Now, on to more physiology subjects, this time about communication.

Notes

1. For historical reasons, Photosystem II comes before Photosystem I.
2. It could also completely shake up the world economic and political situation, but that is way beyond the scope of this document.
3. Kratz (2009), 197.

DNA expression and regulation – protein synthesis

DNA, RNA and ribosomes, in that order, are essential components in the synthesis of proteins. DNA contains the information necessary not only for reproduction, but also for daily cell growth and maintenance. Messenger RNA carries the information to the ribosomes, themselves containing RNA. With the help of yet another kind of RNA, the ribosomes assemble the proteins. All this depends on gene regulation.

The use of DNA to initiate protein synthesis is called DNA expression.

This sequence of events is summed up in the so-called central dogma of molecular biology, often paraphrased as “DNA makes RNA and RNA makes protein.” More precisely, DNA is transcribed inside the nucleus to make mRNA, which is expelled from the nucleus to the cytoplasm, where it is translated to protein by ribosomes.

DNA → transcription (nucleus) → mRNA → translation (ribosome) → protein

The recipe is expressed in “bytes” of three nucleobases; one three-base byte is referred to as a code-word. When transcribed to its complementary form in mRNA, it is called a codon. Since each base can have one of four values (C, G, A or T in DNA; C, G, A or U in RNA), the codon can take on 4 × 4 × 4 = 64 values.
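The count of 64 is easy to check by brute force. This short Python sketch simply lists every ordered triple of the four DNA bases; the variable names are chosen for this illustration only.

```python
from itertools import product

DNA_BASES = "CGAT"

# Every ordered triple of bases is one possible code-word.
code_words = ["".join(triple) for triple in product(DNA_BASES, repeat=3)]

print(len(code_words))   # 4 ** 3
print(code_words[:4])
```

Since there are only 20 amino acids to code for, 64 values leave room for several code-words per amino acid plus the START and STOP signals.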

RNA transcription

The enzyme which does the work of “reading” a gene on the DNA and building a corresponding strand of RNA is called RNA polymerase. (In fact, there are several forms of RNA polymerase, but that complexity is well beyond the scope of this document.) There are at least four types of RNA and transcription makes them all. For protein synthesis, the RNA constructed is called mRNA, or messenger RNA. The DNA recipe begins with a sequence called the promoter. RNA polymerase recognizes and binds to the promoter and launches transcription. As will soon be seen, transcription is started only if it is allowed by gene regulation. RNA polymerase unwinds a part of the DNA chain and reads code-words, starting at the promoter. As it reads the DNA, it constructs a complementary chain, called pre-mRNA, from nucleotides. It is complementary in the sense that if the DNA contains a C (or A or G or T) then the pre-mRNA contains a G (or U or C or A, remembering that RNA replaces T by U).
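The complementarity rule amounts to a four-entry lookup table. Here is a minimal Python sketch of it; the function name `transcribe` and the sample sequence are made up, and the sketch ignores everything else about transcription (promoter, terminator, the direction in which the strands are read).

```python
# Base on the DNA template strand -> base placed in the pre-mRNA.
DNA_TO_RNA = {"C": "G", "A": "U", "G": "C", "T": "A"}


def transcribe(template_dna):
    """Build the complementary RNA, one base at a time."""
    return "".join(DNA_TO_RNA[base] for base in template_dna)


print(transcribe("TACGGT"))
```

With the made-up template TACGGT, the first code-word TAC becomes the codon AUG.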

The raw materials RNA polymerase uses to construct RNA are nucleoside triphosphates (NTPs), of which ATP is one. As in ATP, the bonds holding the outer two of the three phosphate groups store a significant amount of energy. This energy is used to bond the nucleotides together to form RNA.

The RNA polymerase moves along the DNA, unwinding sections as it goes, reads the code words and assembles the appropriate pre-mRNA codons from NTPs. The separated DNA strands recombine in its wake. Eventually, it reaches a transcription-terminator sequence in the DNA and ends transcription. It now has gone through three steps, known as initiation, elongation (of the produced pre-mRNA) and termination. The pre-mRNA then is released into the nucleoplasm.

Splicing mRNA, from Openstax College

Before leaving the nucleus, the pre-mRNA must be cleaned up. This is needed because DNA contains non-coding, or junk, sequences. The sequences which should be kept are called exons (like “expressed”) and those which should be deleted are called introns (like “interrupted”). (I would have preferred for exon to mean “exit”, to be got rid of, and intron to mean “in”, as kept in, but some contrary biologist decided otherwise. He could at least have taken a vote!) Small particles called “snurps” (for snRNPs, or small nuclear ribonucleoproteins), made up of RNA and proteins, bind together to form spliceosomes, which remove introns and splice the exons back together again, resulting in a cleaned-up form of mRNA. (Are you wondering how the snurps can recognize the introns and exons? So am I. All I can say is that it is quite complicated and that the RNA in the snurps pairs with short signal sequences at the ends of the introns.) It is currently not completely understood why there are introns at all, but there are indications that they may be of importance.
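As a string operation, splicing is just cutting out marked spans and joining what is left. The Python sketch below assumes the intron positions are already known, which is precisely the hard part that the spliceosome solves; the function name `splice` and the sequences are invented for the illustration.

```python
def splice(pre_mrna, introns):
    """Return the mRNA left after cutting out each (start, end) intron span.

    Spans are half-open index pairs into pre_mrna and must not overlap.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron itself
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


# Made-up sequences: two exons around one intron.
exon_1, intron, exon_2 = "AUGGCC", "GUAAGUCAG", "UUUUAA"
pre_mrna = exon_1 + intron + exon_2
intron_span = (len(exon_1), len(exon_1) + len(intron))
print(splice(pre_mrna, [intron_span]))
```

The result is the two exons joined end to end, with nothing left of the intron.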

The mRNA is then moved out of the nucleus for the next step.

Protein synthesis – translation

After the mRNA leaves the nucleus, it is used to provide the input data for the synthesis of proteins. This takes place on ribosomes.

There are two sorts of ribosomes in eukaryotic cells, depending on their location.

  • Free ribosomes float in the cytoplasm and make proteins which will function there.
  • Membrane-bound ribosomes are attached to the rough endoplasmic reticulum (RER) and are what makes it look “rough”. Proteins produced there will either form parts of membranes or be released from the cell.

In most cells, most proteins are released into the cytoplasm.

Ribosomes are made of ribosomal RNA, or rRNA (one more kind of RNA), and proteins. They are constructed within the nucleolus as two subunits, which are released through the nuclear pores into the cytoplasm.

In addition to the mRNA and the ribosome subunits, a method is needed for supplying the appropriate amino acids to be linked by peptide bonds to make up the protein or enzyme being constructed. Enter still one more kind of RNA, transfer RNA, or tRNA.

A molecule of tRNA is a molecule of RNA folded into a double strand with loops which give it a precise 3-dimensional shape. It has on one end a binding site (adenylic acid) for an amino acid and, on the other, an anticodon, a three-base site complementary to the codon on mRNA. A tRNA molecule is “charged” with an amino acid molecule by one of 20 types of tRNA-activating enzyme, or aminoacyl-tRNA synthetase, each one specific to a particular amino acid. Each type of aminoacyl-tRNA synthetase has a specific shape, which means it can only join with and service the corresponding tRNA molecule. It uses energy from ATP to covalently bond the tRNA with the appropriate amino acid from molecules in the cytoplasm. Such tRNA molecules, carrying an amino acid, are called aminoacyl tRNA.

The ribosome itself contains three spaces. It reads in the mRNA strand, like computers of my youth read in paper tape, so that each codon crosses the spaces as follows:

  • the codon enters at the A-site, where the matching aminoacyl tRNA arrives;
  • the current peptide element is added to the growing chain at the P-site, where the tRNA holding the chain sits;
  • the codon, along with its now-empty tRNA, exits from the E-site.

Initially, the ribosome subunits are floating independently in the cytoplasm or attached to the RER.

The initiation of translation begins when the small ribosome subunit binds to the ribosomal binding site on an mRNA strand. The tRNA carrying methionine, the amino acid indicated by the mRNA START codon (AUG), then binds its anticodon to that codon. Then the large ribosome subunit binds to the small, so the ribosome is now complete with the first tRNA in the P-site and the second codon of the mRNA in the A-site.

Gene translation in the ribosome, from Openstax College

Translation then proceeds to the elongation stage. The aminoacyl tRNA for the codon in the A-site is carried in, so the first two amino acids are now in the P and A sites. The ribosome then catalyzes the formation of a peptide bond between these two amino acids. The ribosome then moves along the mRNA so the first tRNA, having handed on its amino acid, enters the E-site and leaves, the second enters the P-site, and a new one, the third, enters the A-site. It continues like that until a STOP codon enters the A-site and brings about termination of translation and release of the completed peptide chain.
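Initiation, elongation and termination can be imitated by a loop which reads the mRNA three bases at a time. This Python sketch uses only a handful of entries from the standard genetic code, so the table is deliberately incomplete; the function name `translate` and the sample mRNA are made up, and the tRNAs and ribosome sites are not modeled at all.

```python
# A few entries of the standard genetic code (the full table has 64).
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCC": "Ala", "AAA": "Lys"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna):
    """Read codons from the first AUG until a STOP codon or the end of the strand."""
    start = mrna.find("AUG")            # initiation at the START codon
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:        # termination
            break
        # Elongation; codons missing from this partial table show up as "?".
        peptide.append(CODON_TABLE.get(codon, "?"))
    return peptide


print(translate("GGAUGUUUGGCAAAUAAGC"))
```

In the sample strand, the two bases before AUG and the two after UAA are not translated.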

All these steps of transcription and translation require energy, so protein synthesis is one of the most energetically costly of cell processes. Much of this energy is used to make enzymes essential to the functioning of the cell.

Once part of a strand of mRNA has left one ribosome, it can enter another. One strand may be in 3 to 10 ribosomes at once, in a different step of translation in each one. Such clusters of ribosomes translating the same mRNA strand are called polyribosomes.

Regulation of gene expression

Every cell in an organism has the same complete genome in its nucleus and so has access to all the same protein “recipes”. But heart cells should not produce proteins used only by the liver and no cell should produce proteins in quantities beyond what it can use. Controlling which proteins to express and when is called regulation. Note that this is one more instance of communication in the body, telling genetic machinery when and what to express.

Regulation of prokaryotic cells

Regulation in prokaryotic cells is relatively simple, as there is no nucleus, so transcription and translation take place almost in the same place and at the same time. Regulation in prokaryotic cells, though, almost always concerns transcription.

An example from a prokaryotic cell will show how this works – and introduce some new terminology.

The bacterium E. coli normally uses glucose for energy. But if glucose is absent and lactose is present, it can use the latter sugar. The proteins necessary for the use of lactose are controlled by a sequence of genes called the lac operon. The operon includes not only the necessary genes, but, at the beginning, a promoter which indicates the beginning of the operon and is the site where RNA polymerase binds to begin the transcription. In between the promoter and the set of genes, of which there may be any number, is a sequence called the operator, which is where DNA-binding proteins bind to regulate transcription. (Look out for the terminology: Sean B. Carroll refers to the operator as a genetic switch, a term we will meet with in the discussion of regulation in eukaryotes.)

When no lactose is present, a protein called the lac repressor is bound to the operator and the state of the lac operon is “off”. (Figure.) This is because the repressor blocks access to the rest of the operon.

The gene for the lac repressor is a constitutive gene: it is always expressed because it is the recipe for an essential protein. A regulated gene, on the other hand, is expressed selectively.

Regulation of the lac operon, from Openstax College

The lac repressor has a second, allosteric, binding site. When lactose is present, an isomer of lactose binds to the allosteric site of the repressor, which causes it to change its form and unbind from the operator. The lac operon is now in the “on” state. This form of regulation is called induction: lactose is said to be the inducer of the lac operon and acts through the allosteric site of the lac repressor. Transcription now occurs, but slowly, because some glucose still may be present. But are the proteins mapped by the lactose-digesting genes needed? In other words, is glucose really lacking?

The answer to that question and the second part of this process depend on the presence of glucose and are regulated by a second DNA-binding protein, CAP (catabolite activator protein). CAP is also an allosteric protein with one DNA-binding site and one allosteric site which binds to cyclic AMP (cAMP). CAP is only active when it is bound to cAMP. You guessed it, cAMP levels are high when glucose levels are low. (If we go one step back, we see that glucose binds to an allosteric site on the enzyme adenylate cyclase, which makes cAMP from ATP, and disables it. So lack of glucose stimulates production of cAMP, which binds to CAP, which binds to the promoter to enhance synthesis.) In that case, cAMP-CAP binds to the promoter and enhances transcription of the genes. So lactose can be considered the “on-off” switch for transcription of genes for lactose-digestion and cAMP-CAP, the “volume control”.
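The “on-off” switch and the “volume control” fit in a few lines of logic. This Python sketch reduces each sugar to present or absent and reports three transcription levels; the function name and the labels “off”, “low” and “high” are chosen for this illustration, and real cells respond to concentrations rather than yes-or-no values.

```python
def lac_operon_transcription(lactose_present, glucose_present):
    """Return "off", "low" or "high" for the lac operon genes."""
    repressor_on_operator = not lactose_present   # lactose releases the repressor
    camp_cap_active = not glucose_present         # low glucose -> high cAMP -> active CAP
    if repressor_on_operator:
        return "off"                              # the on-off switch
    return "high" if camp_cap_active else "low"   # the volume control


for lactose in (False, True):
    for glucose in (False, True):
        print(lactose, glucose, lac_operon_transcription(lactose, glucose))
```

Only the combination of lactose present and glucose absent gives the high level, which is the one case where the cell really needs the lactose-digesting proteins.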

Regulation of eukaryotic cells

In eukaryotic cells, transcription occurs inside the nucleus and translation outside, so mRNA is shuttled across the nuclear membrane in between the two processes. Regulation in eukaryotic cells therefore may take place inside or outside the nucleus or at any step in the expression pathway, including control of access to the gene in the DNA, control of transcription, pre-mRNA processing, mRNA lifetime and translation, and modification of the final proteins. Even the activity levels of enzymes which facilitate expression can be controlled.

Pre-transcription regulation

Inside the nucleus, histones, around which DNA is wound to make nucleosomes, can wind or unwind to change spacing of the nucleosomes and thereby allow or deny access to genes. This process is a form of epigenetic regulation. (Look out, the term epigenetic regulation is used for somewhat different notions, too, and they are not all necessarily so.) Since histones are positively charged and DNA negatively, modifying the charge by adding chemical “tags” to either modifies the configuration of the DNA.

Transcription regulation

The most frequent regulation of expression in eukaryotes is during transcription. Control of transcription by the prokaryotic lac operon is a relatively simple process: In order to begin transcription of a gene, RNA polymerase must bind with the gene’s promoter but cannot do so if a lac repressor is bound to the operator region which follows the promoter on the gene.

Eukaryotic gene expression is regulated similarly, but with far more of everything. The single operator is replaced by a number of regulatory sequences, sometimes referred to as switches, which may be almost anywhere on the DNA strand, even far from the gene. The repressor is replaced by a slew of regulatory proteins called transcription factors. Gene expression is controlled when transcription factors bind to regulatory sequences. The existence of multiple switches for each gene allows the gene to be used more than once and in different places; each place may activate a different switch.

There are two types of transcription factors.

  • general transcription factors affect any gene in all cells and are part of the transcription-factor complex;
  • regulatory transcription factors affect genes specific to the type of cell.

The two types of transcription factors work together with three types of regulatory sequences.

  • promoter proximal elements are, of course, near the promoter and turn transcription on;
  • enhancers are far from the regulated genes or in more than one place and also turn the transcription on;
  • silencers are also far away from the regulated genes but turn transcription off.

Activator transcription factors bind to enhancers to promote expression; repressor transcription factors bind to silencers to decrease expression.

The promoter in eukaryotic cells is more complex too. The basal promoter begins with the TATA box, recognized by its beginning, which contains the seven-nucleotide sequence TATAAAA. To it binds a set of transcription factors brought together by coactivator proteins and called the transcription-factor complex. RNA polymerase only binds to the transcription-factor complex.

The whole set of transcription factors is summed combinatorially to determine whether or how much the gene will be expressed. Selective promotion or inhibition at combinations of these sites can therefore bring about tissue-specific gene expression. Each tissue type may have its own specific enhancer or silencer sequence for the same gene. For instance, the neuron-restrictive silencing element (NRSE) is a silencer sequence which, when bound by its repressor, prevents neuronal genes from being expressed in any cells which are not neurons. In addition, environmental changes may bring about different gene expression according to current, perhaps temporary needs.
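To make the idea of a combinatorial sum concrete, here is a toy model in Python. It is an assumption-laden sketch, not a description of real cells: every bound activator counts +1, every bound repressor counts −1, and nothing happens without the general transcription factors. The gene, the factor names and the two cell types are all hypothetical.

```python
def expression_level(gene, factors_present):
    """Toy score: bound activators minus bound repressors, never below zero."""
    if not gene["general_factors"] <= factors_present:
        return 0  # without the general factors, RNA polymerase cannot bind
    activating = len(gene["enhancers"] & factors_present)
    repressing = len(gene["silencers"] & factors_present)
    return max(0, activating - repressing)


# Hypothetical gene and factor names, for illustration only.
gene = {
    "general_factors": {"GTF"},
    "enhancers": {"act_A", "act_B"},   # activators able to bind its enhancers
    "silencers": {"rep_X"},            # repressors able to bind its silencers
}
liver_cell = {"GTF", "act_A", "act_B"}
heart_cell = {"GTF", "act_A", "rep_X"}

print(expression_level(gene, liver_cell))
print(expression_level(gene, heart_cell))
```

The same gene, with the same switches, is expressed in one made-up cell type and silent in the other, simply because the two cells contain different sets of transcription factors.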

Transcription factors in eukaryotic cells, author’s own work

The above figure shows the case of an enhancer bound by activator transcription factors. (The bobby-pin curl is an idealization; DNA shapes are far more complex than that.) The enhancer, on the left, originally is quite far from the promoter, until bending of the DNA allows the enhancer to come in contact with the promoter and the rest of the transcription-initiation complex.

Since transcription factors are proteins, they also are coded by genes and these genes are regulated in turn by other transcription factors. (Yes, it’s transcription factors all the way down, back to that original, unique zygote. But it depends on the environment too. Much research is being done on this subject.) Transcription factors and signaling elements coded by some of these genes make up the genetic toolkit, as we will see in a moment.

Splicing regulation

In between transcription and translation, proteins may interfere with spliceosomes to modify splicing of pre-mRNA. Different intron selections can allow different mRNAs to be produced from the same pre-mRNA, a phenomenon known as alternative splicing.

Pre-translation and translation regulation

Yet another type of RNA, very short-stranded microRNA or miRNA, can bind with complementary mRNA before it is translated and promote repression of its translation. miRNA associates with RISC (RNA-induced silencing complex) to degrade mRNA.

Other proteins, RNA-binding proteins (RBPs), can bind with the 5′ cap or the 3′ tail of the mRNA and either increase or decrease its stability.

Phosphorylation of the protein initiation complex on the mRNA, or attachment of other chemicals to it, can also inhibit translation.

Similar bindings may take place on the protein after translation and modify its stability, lifetime or function. RNA does not hang around forever, but eventually is degraded and is no longer functional. So controlling its lifetime is another way of regulating its activity.

The development genetic toolkit – what evo devo tells us

Development, especially embryonic development, is the means by which a genotype, a set of genes, becomes a phenotype, a particular living organism. Although mutation works on genes, natural selection works on phenotypes. So development and evolution work hand in gene, so to speak, and the branch of biology which studies them together is called “evo-devo”.

The major discovery of evo-devo is the collection of genes known as homeobox genes, the best-known family of which is made up of Hox genes. Many animals have a disposition of body parts along an axis, such as the antennae, wings and legs along the body axis of a fruit fly, or the existence or not of ribs along the vertebrae of a vertebrate animal. It turns out that the choice of body part at each segment along the axis is governed by a single gene. These “master” genes control the developmental differentiation of, for instance, a fruit fly’s serially homologous body parts; in simpler terms, its body pattern. (The front legs of a cat and our arms are considered homologous body parts. Structures along a body axis, similar but different, are called serially homologous with respect to each other.)

Of the 1000 or so base pairs constituting each of these genes, a subset of 180 pairs, coding for 60 amino acids (three base pairs per amino acid), is very similar to such a subset in other “master” genes; otherwise, the genes are different. The 180 base pairs are called the homeobox; the 60-amino-acid stretch of protein they encode, the homeodomain.

Homeobox genes code for transcription factors which control gene expression during development. Since the proteins discussed so far change the cells they modulate into something else, they are called homeotic transcription factors and their genes are homeotic genes. (Homeosis is the transformation of one organ into another.) The protein domain they express is therefore a homeotic domain. The genes of this particular family are called Hox genes. Hox genes are just one family of homeobox genes; other families also exist, a few of which we will see in a moment. (I cannot find a precise statement of how many.)

Homeotic genes occur in clusters, with the genes of a cluster in the same order as that of the body segments they control. Since the genes are ordered in the clusters, it is possible to change the gene at the antenna position on a fruit fly to a leg gene and a leg develops at this position on the fly.

Hox genes specify the development of human arms and legs as well as of fruit-fly legs and antennae. Hox genes in vertebrates are responsible for the identity of vertebrae: which ones make ribs or fuse to form the sacrum. Modifying them in mice can cause ribs to form where they should not or not form where they should. Hox genes are sufficiently similar that introduction of mouse Hox genes into a fly can cause the growth of the indicated organ, but in fly format. They also control the very different serial structure of snakes.

It is remarkable that quite similar homeodomains have been found in almost all animals. Such conservation of homeobox genes across species shows that embryonic development of most animals, fungi and plants is controlled by approximately the same genes. They must have been around since animals diverged from each other over 500 Mya. The original Hox gene was duplicated and then each copy took on slightly different functions. Subsequent duplications and modifications have led to the diversity of animals today. Comparison of the genes can contribute to building at least a partial tree of life.

The different families of homeobox genes conserved across species are considered to be members of what biologists call the genetic toolkit. The toolkit is common to almost all animals, with only little variation from one to another. It contains genes not only for transcription factors, but also for various molecules which are signaling elements. They play important roles in embryonic development, or embryogenesis.

In other animals also, the genes exist in clusters, with the gene order in the cluster corresponding to that of the organism’s parts. Different Hox genes, being similar but slightly different, bind to different regulatory sequences on DNA and therefore regulate different genes. One homeobox protein may regulate many genes and a number of homeobox proteins may work together to refine selection. Because of this possibility of multiple binding, a small change in activation of toolkit genes can bring about a large change in the phenotype. So the genetic toolkit may explain development more simply than if all genes had to be specific to each different part, location and development time of an organism.

Toolkit genes themselves have multiple switches. Switches are the means by which a relatively limited set of toolkit genes may be used differently in different regions, or even different animals, or at different times in embryonic development – which furnishes material for evolution.

A specific bodily environment (liver, heart, blood, …) contains some set of organic molecules specific to that environment. These molecules or a subset thereof will serve as transcription factors to activate a particular subset of the toolkit. In other words, the environment chooses which tools to use. The proteins expressed by toolkit genes will activate or suppress expression of body-part proteins at that place and time. (I see this nicely in an analogy with a computer program. A program to construct an organism will contain a higher-level library of homeobox routines for making, say, eyes or legs. Another library will contain the specific routines for that organism. A mouse organism will pass control at a specific place to a Pax-6 gene which will pass control to the appropriate eye routine.)

Environmental molecules ==> toolkit proteins ==> body parts

Each arrow indicates that the object to the left switches on expression of the object to the right.
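The chain of switches above can be mimicked in a few lines of Python, in the spirit of the computing analogy in note 13. This is only a toy sketch: the region names, the dictionary contents and the function `develop` are invented for illustration and are not a model of real gene regulation.

```python
# Toy analogy only: all names and entries below are illustrative assumptions.

# Shared, higher-level "toolkit" library: which toolkit protein switches on
# which kind of body part.
TOOLKIT = {
    "Pax-6": "eye",
    "Distal-less": "limb",
}

# Species-specific libraries: how each organism actually builds that part.
SPECIES_ROUTINES = {
    "mouse": {"eye": "mouse eye", "limb": "mouse leg"},
    "fruit fly": {"eye": "fruit-fly eye", "limb": "fruit-fly leg"},
}

# Hypothetical environments: the local molecules decide which toolkit
# genes are switched on there.
ENVIRONMENT_SWITCHES = {
    "head region": ["Pax-6"],
    "flank region": ["Distal-less"],
}


def develop(species: str, environment: str) -> list[str]:
    """Environmental molecules ==> toolkit proteins ==> body parts."""
    parts = []
    for toolkit_gene in ENVIRONMENT_SWITCHES[environment]:
        part_kind = TOOLKIT[toolkit_gene]                   # toolkit protein chosen
        parts.append(SPECIES_ROUTINES[species][part_kind])  # species routine run
    return parts


print(develop("mouse", "head region"))
print(develop("fruit fly", "head region"))
```

The same toolkit entry is used for both species; only the species-specific routine differs, which is the point of the analogy.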

Some terminology helps to understand the evo-devo literature.

  • Transcription factors are proteins and so are not on the DNA string, and therefore not on the same molecule as the DNA which is regulated. They are therefore called trans-acting regulatory elements (TRE) (see note 14).
  • Switches are on the same string of DNA as the regulated gene and are called cis-acting regulatory elements (CRE).

So one can say that TREs bind to CREs to regulate gene expression.

Let’s summarize:

  • A homeobox is a DNA sequence about 180 base pairs long, which encodes a protein domain about 60 amino acids long. Proteins containing this domain act as transcription factors for genes. Homeoboxes are found, for instance, in fruit flies, mice, frogs, cows and humans. There are different kinds, or families, of homeoboxes.
  • A homeodomain is the protein domain encoded by a homeobox (see note 15). Homeodomains can be seen as the building blocks of development and evolution.
  • The set of homeobox genes is part of the genetic toolkit.
  • Hox genes are a subgroup of homeobox genes. They occur in similar forms within genes for morphogenesis in animals, fungi and plants. The objects transcribed are regulated by a common homeobox, but correspond in their nature to the specific species (see note 16).

The following table lists just a few of the homeobox families and the organism components they regulate.

Protein name            Phenotype regulated
Hox                     body regions (e.g., head, thorax or abdomen)
Pax-6                   eyes
Distal-less (Dll)       limbs
Sonic hedgehog (Shh)    organogenesis (tissue patterns)
Ultrabithorax (Ubx)     represses insect wing formation

In all animals, there exist similar gene sequences corresponding to protein domains which are transcription factors for that animal’s version of some phenotype. A Pax-6 gene from a mouse makes an eye form in a fruit fly – a fruit-fly eye, not a mouse eye.

The existence of different levels of transcription factors also explains how a small genetic change (in a transcription factor) can bring about a relatively important change in the phenotype of the organism.

Cell differentiation

Stem cells are those which may divide and form any kind of cell (see note 17). But once a specific type of cell is made, it can only do certain things. This is because it no longer has access to the entire recipe book (genome), but only to those recipes which it needs. The cell is then said to be differentiated and the process which makes it so is differential gene expression. Through gene regulation, differentiated cells only have access to the genes they need to fulfill their particular function. Such regulation or differentiation depends on the cell’s environment. We have seen an example where the presence of lactose induces the expression of the lac operon.

Gene regulation can fill a book. And cell differentiation can fill another.

Continue with cell division and the cell cycle.






Notes

1. In fact, there are several forms of RNA polymerase, but that complexity is well beyond the scope of this document.
2. I would have preferred for exon to mean “exit”, to be got rid of, and intron to mean “in”, as kept in, but some contrary biologist decided otherwise. He could at least have taken a vote!
3. Are you wondering how the snurps can recognize the introns and exons? So am I. All I can say is that it is quite complicated and has something to do with methylation of the DNA strands. It is currently not completely understood why there are introns at all, but there are indications that they may be of importance.
4. Look out for the terminology: Sean B. Carroll refers to the operator as a genetic switch, a term we will meet with in the discussion of regulation in eukaryotes.
5. If we go one step back, we see that glucose binds to an allosteric site on the enzyme adenylate cyclase, which makes cAMP from ATP, and disables it. So lack of glucose stimulates production of cAMP, which binds to CAP, which binds to the promoter to enhance synthesis.
6. Look out, epigenetic regulation is used for somewhat different notions, too, and they are not all necessarily so.
7. Author’s own work
8. The bobby-pin curl is an idealization; DNA shapes are far more complex than that.
9. Yes, it’s transcription factors all the way down, back to that original, unique zygote. But it depends on the environment too. Much research is being done on this subject.
10. The front legs of a cat and our arms are considered homologous body parts. Structures along a body axis, similar but different, are called serially homologous with respect to each other.
11. Homeosis is the transformation of one organ into another.
12. I cannot find a precise statement of how many.
13. I see this nicely in an analogy with a computing program. A program to construct an organism will contain a higher-level library of homeobox routines for making, say, eyes or legs. Another library will contain the specific routines for that organism. A mouse organism will pass control at a specific place to a Pax-6 gene which will pass control to the appropriate eye routine.
14. In Latin, “cis” means “this side of” and “trans” means “the other side of”. Think of cis-Alpine (this side of the Alps) and trans-Alpine.
15. A domain is a conserved part of a protein sequence which can exist independently of the rest of the protein chain. 
16. Some authors use the term homeobox only for Hox genes.
17. Usually to make an identical stem cell and another cell, which may or may not be a stem cell.

Some basic biochemistry

Understanding physiology and neuroscience requires knowing a certain amount of biochemistry. Most of the building blocks of our bodies are macromolecules – proteins (long chains of amino acids), polysaccharides (carbohydrates), lipids (fats) and nucleic acids (which make up DNA and RNA).

Amino acids and proteins

Amino acids are the basic building blocks for proteins. In a way, they are quite simple, all being variations on the same basic formula.

Common formula for amino acids, by “GyassineMrabetTalk” via Wikimedia Commons

Each amino acid consists of a central carbon atom bonded to a hydrogen atom, an amino group (NH3+), a carboxyl group (COO-) and a variable group, designated by the letter “R”, for residue. In the figure, the third H in NH3+ has been transferred to one O on the COO- to make COOH and balance out the total charge.

The set of amino acids used to build proteins in our cells comprises only 21 members and is shown in the following figure. It is often stated that there are only 20 amino acids, in which case selenocysteine, which occurs rarely, has been omitted.


The amino acids, by Dan Cojocari via Wikimedia Commons


A protein is a polypeptide, that is, a polymer (a chain of linked subunits) formed by condensation (ejection of water molecules) so as to link the amino acids by peptide bonds. Schematically, it looks like the example in the following figure, which shows the OH on the left combining with an H on the right to make a water molecule and leave the two amino acids connected by a peptide bond. Actually, the process is not that direct, but goes through several enzyme-assisted steps in order to achieve the peptide bond.

Peptide formation by condensation of two amino acids, by “GyassineMrabet” via Wikimedia Commons
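The bookkeeping of condensation can be illustrated with a short Python sketch: each peptide bond releases one water molecule, so a chain of n amino acids loses n − 1 waters. The molar masses below are approximate textbook values that do not appear in the text, and the function name `peptide_mass` is purely illustrative.

```python
# Approximate molar masses in g/mol (textbook values, not from the text).
WATER = 18.015
AMINO_ACID_MASS = {"Gly": 75.07, "Ala": 89.09}


def peptide_mass(residues):
    """Return (molar mass of the peptide, number of waters released)."""
    n_bonds = max(len(residues) - 1, 0)   # one peptide bond per junction
    total = sum(AMINO_ACID_MASS[r] for r in residues)
    return total - n_bonds * WATER, n_bonds


mass, waters = peptide_mass(["Gly", "Ala"])
print(f"Gly-Ala: about {mass:.2f} g/mol, {waters} water molecule released")
```

The real cellular process goes through several enzyme-assisted steps, as stated above; the sketch only tracks what goes in and what comes out.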

The bonding properties of proteins depend largely on their shape. The shape of the protein depends first on the sequence of amino acids, which constitutes the protein’s primary structure. The polypeptide chain may then coil or fold into a secondary structure such as the alpha helix. The R groups may interact among themselves, bringing about a change in the 3-dimensional shape, or conformation, the tertiary structure. Different polypeptide chains then may bind to form a quaternary structure. In this way, proteins can take on very intricate shapes.

Four hierarchical structures of hemoglobin, by OpenStax College via Wikimedia Commons

Symmetry does not seem to be well respected in biology. A helical protein with a right-handed twist cannot generally be substituted for one with a left-handed twist: it just will not work the same way.

3D structure of myoglobin protein. Alpha helices are turquoise. By AzaToth via Wikimedia Commons

Proteins may be enormously long polypeptide chains. Enzymes, neurons and receptors on cells are all composed of proteins.


Enzymes, which are usually proteins (see note 1), serve as organic catalysts, meaning that they help to bring about reactions that otherwise would not happen or would happen far too slowly. They only bring about reactions which are energetically possible but which nevertheless need a “push” to get started. Enzymes provide the push by lowering the activation energy of the reaction. Complete equations for different reactions would include the enzymes on both sides, but they are usually omitted. Every physiological process in the body depends on enzymes. Enzymes themselves only work under rather strict conditions of, for instance, temperature and acidity. If the pH or temperature is not just right, the enzymes will not work, the reactions will not take place and the organism will suffer. The names of enzymes generally end in -ase, for example, lactase.

An enzyme can do its work because of its shape. It folds itself so as to form a pocket called the active site. A molecule which fits into the active site is called a substrate. The enzyme can then usher the substrate through the reaction. This “lock and key” model of enzyme-substrate interaction is refined further in the induced-fit model, wherein dynamic modifications in the enzyme’s structure enable it to exactly fit the substrate.

The body can regulate the rate of such reactions by regulating the efficiency of the enzymes which catalyze them. One way to do this is to have a molecule similar in shape to the substrate block the active site. Or a molecule can bind to what is called an allosteric site on the enzyme, meaning a site which is not the active site. Binding to such a site changes the shape of the enzyme and thereby renders it ineffective for binding with its usual substrate. If the enzyme catalyzes a reaction too much, so that there is an excess of end products, the end products themselves may attach to an allosteric site and block further reactions, resulting in a feedback mechanism which reduces the rate of the reaction.

Reactions catalyzed by enzymes generally take place in a number of small steps rather than all at once. This has a double advantage:

  • At each step, the enzyme can bring the reactants together, reducing the activation energy, the amount of energy needed for the reaction to begin.
  • The energy output from each small step will not be so much as to harm the cell.

The sum of all the small steps is referred to as a metabolic pathway.


Carbohydrates are molecules composed of carbon, hydrogen and oxygen, usually with the latter two elements in the same relative amounts as in water. So a generic “carb” could be represented by the formula

Cm(H2O)n, where m and n are whole numbers.

Carbohydrates are saccharides, or sugars, and referred to as monosaccharides or polysaccharides, depending on the length of the molecule.

The most important monosaccharides in the body are two five-carbon sugars (pentoses), ribose and deoxyribose, and three six-carbon sugars (hexoses), glucose, fructose and galactose (see note 2).

The five common monosaccharides, from Openstax College
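The rule of thumb that carbohydrates contain hydrogen and oxygen in the same 2:1 ratio as water can be checked mechanically. The following Python sketch parses simple molecular formulas and tests for that pattern; the helper names are illustrative, and the formulas are standard values (C5H10O4 for deoxyribose, which lacks one oxygen relative to ribose).

```python
import re


def parse_formula(formula: str) -> dict[str, int]:
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses)."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def fits_hydrate_pattern(formula: str) -> bool:
    """True if the formula contains only C, H, O with H:O = 2:1, as in water."""
    counts = parse_formula(formula)
    return set(counts) == {"C", "H", "O"} and counts["H"] == 2 * counts["O"]


SUGARS = {
    "glucose": "C6H12O6",
    "fructose": "C6H12O6",
    "galactose": "C6H12O6",
    "ribose": "C5H10O5",
    "deoxyribose": "C5H10O4",
}

for name, formula in SUGARS.items():
    print(f"{name:12s} {formula:8s} fits pattern: {fits_hydrate_pattern(formula)}")
```

Deoxyribose is the exception among the five, which is why the rule is stated with “usually”.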

Saccharides formed from two monosaccharides are called disaccharides. Important ones for the human body are sucrose (table sugar), lactose (milk sugar) and maltose (malt sugar).

Polysaccharides may contain thousands of monosaccharides. Common ones are starches (polymers of glucose found in plant foods), glycogen (a polymer of glucose used for storage in the body) and cellulose (“fiber”, found in the cell walls of plants).

We will be considering the importance of carbohydrates in the body’s production of energy from food.


Lipids are mostly hydrocarbons with very little oxygen; their bonds are therefore mostly non-polar C-C or C-H bonds, which makes them hydrophobic. They include triglycerides, phospholipids, cholesterol and small quantities of other substances. Lipids are necessary for the formation of cell membranes and for other functions within cells.


The commonest form of lipid (“fat”) in the body is triglyceride, consisting of a glycerol nucleus covalently bonded to the ends of three fatty-acid chains, long hydrocarbon chains terminated at one end by a carboxyl group (COO-) and at the other by a methyl group (CH3).

Triglyceride structure, with three fatty acids (orange background) attached to glycerol (pink), adapted from Openstax College

Fatty acids may be saturated or unsaturated, meaning saturated in bonds with hydrogen. A saturated fatty acid has only single bonds between carbon atoms, leaving two bonds on each carbon free to connect with hydrogen. An unsaturated fatty acid has at least one double bond between carbons, so that each of those carbons can only bond with one hydrogen. The double bonds may put a kink in the fatty acid and so change its shape.

Saturated and unsaturated fatty acids, from Openstax college

Saturated fatty acids pack more tightly and so exist generally as semi-solid substances called fats. Unsaturated fatty acids pack more loosely (because of the kinks) and are the constituents of more liquid oils.

It is currently thought (see note 3) that saturated fats lead to increased risk of heart disease, relative to unsaturated fats. The worst, though, are thought to be the so-called trans fats. For some reason, food manufacturers sometimes convert unsaturated fats into saturated ones by hydrogenation, the addition of hydrogen atoms. Trans fats are those which have only been partially hydrogenated. On the other hand, there is evidence that omega-3 unsaturated fats are effective in reducing the risk of heart disease and perhaps beneficial in other ways. They are called omega-3 because the word “omega” is used in biochemistry to refer to the methyl end of the fatty acid chain and the first double carbon bond is at the third carbon from that end.
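The omega naming can be turned into a small calculation. Chemists usually number carbons from the carboxyl end, so counting from the methyl (omega) end means subtracting from the chain length. In this Python sketch, the function name `omega_class` is illustrative and the fatty-acid data (chain lengths and double-bond positions) are standard textbook values, not taken from the text.

```python
def omega_class(chain_length: int, double_bond_positions: list[int]) -> int:
    """Position of the first double bond counted from the methyl (omega) end.

    double_bond_positions are carbon numbers counted from the carboxyl end,
    each giving the carbon at which a double bond starts.
    """
    last_from_carboxyl_end = max(double_bond_positions)
    return chain_length - last_from_carboxyl_end


# (chain length, double-bond positions from the carboxyl end)
FATTY_ACIDS = {
    "alpha-linolenic acid": (18, [9, 12, 15]),
    "linoleic acid": (18, [9, 12]),
    "oleic acid": (18, [9]),
}

for name, (length, positions) in FATTY_ACIDS.items():
    print(f"{name}: omega-{omega_class(length, positions)}")
```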


Phospholipids are similar to triglycerides, but the glycerol is attached to only two fatty acids, the third being replaced by a “head group” containing phosphate.

Phospholipid structure, from Openstax College, via Wikimedia Commons

The phosphate “head” is negatively charged and therefore hydrophilic but the fatty acid tails are hydrophobic, so the molecule is amphipathic (as discussed in the chemistry chapter) and forms micelles or membranes in an aqueous environment. Of major importance, phospholipids are the principal component of cell membranes.


Just as proteins are polymers formed from chains of amino acids, nucleic acids – DNA and RNA – are polymers made up of chains of linked nucleotides. A nucleotide is composed of a pentose (five-carbon) sugar molecule like deoxyribose (which gives the “D” in DNA) or ribose (in RNA), a nitrogenous base (or nucleobase) and one phosphate group (see note 4). Different nucleotides contain different bases.

Nucleotides, from Openstax College

There are five possible nucleobases in two groups:

  • pyrimidines – cytosine, thymine and uracil, with a single-ring structure; and
  • purines – adenine and guanine, with a double-ring structure.

Another, very special nucleotide is adenosine monophosphate, or AMP. When a second phosphate group is added to AMP, it makes ADP (adenosine diphosphate); addition of a third phosphate group makes adenosine triphosphate, or ATP, the “energy currency” or energy carrier in cells of all living organisms. Like all nucleotides, AMP consists of a nitrogenous base attached to a pentose sugar attached to a phosphate group; in this case, the nitrogenous base is adenine and the pentose sugar is ribose. It takes energy to add a Pi (phosphate) to AMP to make ADP, or a Pi to ADP to make ATP. This energy is stored in the ATP molecule as chemical potential energy and can be recovered later to do useful biological work, such as to flex muscles (including heart muscles), make blood flow, power peristaltic movement of the intestines or permit action potentials in neurons. We will talk much more of this in the next chapter.

A nucleotide without the phosphate group is called a nucleoside, so ATP may also be referred to as a nucleoside triphosphate. Nucleoside triphosphates are the raw materials for building RNA molecules.

ATP and ADP, from Openstax College

Nucleic acids – DNA and RNA

The nucleic acids, DNA and RNA, are assembled from nucleotides. They differ in three ways:

  • DNA, deoxyribonucleic acid, contains deoxyribose as its sugar; RNA, ribonucleic acid, contains ribose.
  • The “allowed” nucleobases for DNA are adenine (referred to in this context as A), guanine (G), cytosine (C) and thymine (T); in RNA, T is replaced by uracil (U).
  • DNA molecules form a double strand; RNA, a single one.

The IUPAC (International Union of Pure and Applied Chemistry) has a rather hairy set of rules for numbering carbon atoms in organic compounds. In the case of the sugar in a nucleotide, the 1′ carbon (one-prime, prime to denote sugar) is the one attached to the nitrogenous base. The count moves around the ring away from the oxygen apex.

Nucleic acids are formed by dehydration (or condensation, removal of a water molecule) between the pentose sugar of one nucleotide (at its 3′ carbon) and the phosphate (on the 5′ carbon of the pentose) of another. The result is called a phosphodiester bond. The chain is thus held together by a sugar-phosphate backbone, independently of attached nucleobases, which protrude out from the chain.


DNA chains form double strands due to hydrogen bonds between nucleobases on each chain, with A bonding only to T and C only to G. The result forms a double helix. A purine (A or G) is always bonded to a pyrimidine (C, T or U). Note from the preceding figure that there are three hydrogen bonds between guanine and cytosine, but only two between adenine and thymine.

The combination of two DNA strands into a double helix offers the advantage that the nucleobases are not sticking out into the cytoplasm where they may be more easily mutated. Rather, the bases of the two strands are “holding hands” (through hydrogen bonds) to protect each other from mutation. This increased security may explain why DNA, which stores genetic information, forms a double helix, but RNA does not.

Some detail: The nucleic acid strand is polar, i.e., the ends are not the same. One end has a phosphate group attached to the 5′ carbon of the sugar; this is called the 5′ end. The other end has a hydroxyl group (OH) attached to the 3′ carbon of the sugar, so this is called the 3′ end. When combining into a double helix, the ends are reversed, i.e., the 3′ end of one is opposite the 5′ end of the other.
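The pairing rules and the reversed orientation of the two strands can be sketched in a few lines of Python. Given one strand written from its 5′ end to its 3′ end, the partner strand is obtained by pairing each base (A with T, C with G) and reading the result in the opposite direction. The function names and the example sequence are illustrative only.

```python
# Base-pairing partners in DNA and the number of hydrogen bonds per pair.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def partner_strand(strand_5_to_3: str) -> str:
    """Return the partner strand, also written from its 5' end to its 3' end."""
    # The strands run in opposite directions, so the input is read backwards.
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3))


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding this strand to its partner."""
    return sum(H_BONDS[base] for base in strand)


strand = "ATGC"
print("strand :", strand)
print("partner:", partner_strand(strand))
print("hydrogen bonds:", hydrogen_bonds(strand))
```

Applying `partner_strand` twice returns the original sequence, as expected for two strands that are each other’s complement.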

Since the total length of all the DNA strands in a human nucleus would equal 2-3 m, it must be compacted in order to fit into the nucleus. The helical strand is wrapped around histone proteins to form nucleosomes. The string of nucleosomes is then twisted and re-twisted, like a piece of cord, until it forms a compact string called chromatin. The chromatin will be used to form chromosomes only when needed for reproduction.
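The order of magnitude of that length can be checked with simple arithmetic. The sketch below assumes textbook values that are not given in the text: a spacing of about 0.34 nm between successive base pairs, roughly 6.4 billion base pairs in a human cell with two copies of each chromosome, and a nucleus about 6 micrometers across.

```python
# All three numbers are approximate textbook values (assumptions).
RISE_PER_BASE_PAIR_M = 0.34e-9   # distance between successive base pairs
BASE_PAIRS_PER_NUCLEUS = 6.4e9   # diploid human genome
NUCLEUS_DIAMETER_M = 6e-6        # typical nucleus diameter

total_length_m = RISE_PER_BASE_PAIR_M * BASE_PAIRS_PER_NUCLEUS
ratio = total_length_m / NUCLEUS_DIAMETER_M

print(f"Total DNA length: about {total_length_m:.1f} m")
print(f"That is roughly {ratio:,.0f} times the diameter of the nucleus")
```

With these inputs the total comes out a little over 2 m, consistent with the figure quoted above, and several hundred thousand times the width of the nucleus, which is why compaction is needed.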

DNA compaction, from Openstax College

Oxidation-reduction and electron carriers

Oxidation and reduction occur in all domains of chemistry, not just in biochemistry. They occur together in oxidation-reduction, or redox, reactions. An entity which loses electrons is said to be oxidized; if it gains electrons, it is reduced. Think of its charge, which becomes more negative as it gains an electron. Oxygen likes to gain electrons, so when it pinches one from another substance, that substance is oxidized.  A simple example is Na and Cl going together:

Na + Cl → Na+ + Cl-

The Na loses an electron and becomes positive; it is an electron donor and is oxidized. The Cl gains an electron, becoming negative, and is reduced.


A less obvious example is the reaction of hydrogen with fluorine,

H2 + F2 → 2 HF

which is perhaps not easy to recognize as an oxidation of hydrogen. But consider the two half-reactions, the obvious oxidation part

H2 → 2 H+ + 2 e-

and the reduction part

F2 + 2 e- → 2 F-

Put them together to get

H2 + F2 → 2 H+ + 2 F- → 2 HF

In general, a substance which is oxidized, i.e., gives up electrons, is an electron donor or reducing agent or reductant. One which is reduced, i.e., gains electrons, is an electron acceptor or oxidizing agent or oxidant. Schematically, we can write

electron donor (reductant) <-> e- + electron acceptor (oxidant)

where the reductant and oxidant together are said to constitute a conjugate redox pair.

More interesting examples, which are important in cellular respiration, are those of the coenzymes (see note 5) nicotinamide adenine dinucleotide and flavin adenine dinucleotide, better and more simply known as NAD and FAD. These two molecules are electron carriers and they pick up and leave off their electrons through redox reactions. Their oxidized forms are NAD+ and FAD and they are reduced to NADH and FADH2.

NAD is a dinucleotide, meaning it is composed of two nucleotides, which are joined through their phosphate groups. One nucleotide has an adenine base, the other, nicotinamide.

NAD molecule by “NEUROtiker” via Wikimedia Commons

During cellular respiration (explained later), a molecule referred to as the substrate gives up two H atoms, bringing about the reduction of NAD+ in the following way, where R means “residue” and indicates a substrate:

RH2 + NAD+ → NADH + H+ + R

Ignoring R on both sides

NAD+ + 2H → NADH + H+

or, in more detail,

NAD+ + 2H → NAD+ + 2e- + 2H+ → NADH + H+

which shows more clearly that the NAD+ has absorbed both electrons and one proton, thereby gaining electrons and being reduced. In a later step, both the H atoms will be used for energy transfer and the NADH will be re-oxidized to NAD+. In this way, NAD transports electrons from one reaction to another.

The equivalent formula for the reduction of FAD goes in two steps:

FAD + e- + H+ → FADH

FADH + e- + H+ → FADH2

to make

FAD + 2H → FAD + 2e- + 2H+ → FADH2

which shows that FAD is reduced to FADH2.
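A quick way to check redox equations like these is to verify that the total electric charge is the same on both sides. The following Python sketch does this for the reactions discussed in this section; the table of charges and the function names are set up here purely for illustration.

```python
# Net charge of each species appearing in the equations of this section.
CHARGE = {
    "Na": 0, "Cl": 0, "Na+": +1, "Cl-": -1,
    "NAD+": +1, "NADH": 0, "FAD": 0, "FADH2": 0,
    "H+": +1, "e-": -1,
}


def total_charge(side: dict[str, int]) -> int:
    """Sum of charges for one side of an equation, given {species: count}."""
    return sum(count * CHARGE[species] for species, count in side.items())


def is_charge_balanced(left: dict[str, int], right: dict[str, int]) -> bool:
    return total_charge(left) == total_charge(right)


REACTIONS = {
    "Na + Cl -> Na+ + Cl-": ({"Na": 1, "Cl": 1}, {"Na+": 1, "Cl-": 1}),
    "NAD+ + 2e- + 2H+ -> NADH + H+": (
        {"NAD+": 1, "e-": 2, "H+": 2}, {"NADH": 1, "H+": 1}),
    "FAD + 2e- + 2H+ -> FADH2": ({"FAD": 1, "e-": 2, "H+": 2}, {"FADH2": 1}),
}

for label, (left, right) in REACTIONS.items():
    print(f"{label}: balanced = {is_charge_balanced(left, right)}")
```

Charge balance is only a necessary check, not a full proof that an equation is correct; the atoms must balance as well.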

Now we are ready to look at cells, the basic units of life.


Notes

1. RNA can also function as an enzyme and it is not a protein.
2. In fact, glucose, galactose and fructose all have the same formula, C6H12O6, but differ in the arrangement of their atoms (they are isomers). Ribose (C5H10O5) and deoxyribose (C5H10O4) differ by one oxygen atom.
3. Or, at least, recently. It’s hard to keep up with what nutritionists tell us.
4. Common usage employs the term nucleotide for those with more than one phosphate group.
5. A coenzyme is a non-protein compound that is necessary for the functioning of an enzyme. Enzymes are macromolecular catalysts, most of which are proteins.

What biochemistry and cellular biology tell us

We have seen how the universe grew from a tiny point to become the enormous – probably infinite – place we see about us. We have focused on a small part of this huge entity and have seen how our solar system has formed and then our planet; how the Earth evolved to reach its current – but temporary – state of support of life; when life was born and how it evolved from bacteria to plants and marine creatures, then land creatures like dinosaurs, then mammals and primates and – currently – us.

So now what? Well, there’s us. But a complete study of that subject is well beyond the domain of this document, so let’s concentrate on a limited subset of it. As a former physicist and computer scientist, and so naturally interested in energy and communications, I propose to follow those two threads in studying the human body. That, at least, is the goal. This route should lead to the ultimate and most subjective-seeming domain, cognitive science – the study of the brain.

We must start small, though, with cells, as all else follows from them. And in order to understand them, we need to know

Osmosis and buffering

Diffusion and osmosis

Properties of a solution which depend only on the number of dissolved particles and not on what those particles are, such as the lowering of the freezing point or the raising of the boiling point, are called colligative properties. They are therefore governed by the concentration of the solute. A solution tends toward the same concentration of solute everywhere, as this represents the state of highest entropy (randomness). So in case of non-equilibrium, a solute will migrate from any region of higher concentration to regions of lower concentration, just as heat energy flows from hotter to cooler, and for similar reasons. When both regions are mixed and at the same concentration, the result is less ordered and so of higher entropy. This is diffusion.

Another very important colligative property is osmotic pressure. This is only a bit tricky to understand.

Normally, one expects a solute to diffuse from a region of higher concentration to one of lower concentration in order to bring about equal concentrations of the solute. But if the two regions are separated by a membrane which the solute cannot cross but the water can, then the opposite happens. Water flows from the region of lower concentration, i.e., where there is less solute, to the region of higher concentration, which has the effect of diluting the latter and lowering its concentration. At the same time, the solute concentration on the other (source) side goes up. This process is osmosis.

So, in diffusion, the solute migrates; in osmosis, the solute cannot cross the membrane, so water migrates.

The effect depends not on the mass of the solute but only on the number of dissolved particles (molecules or ions). The water is driven by a force per unit area, a pressure, which is called osmotic pressure.
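To get a feeling for the size of osmotic pressure, one can use the standard approximation for dilute solutions (the van ’t Hoff relation, which is not derived in this text): pressure = particle concentration × R × T. The Python sketch below applies it; the function name and the example concentration of 0.3 mol of particles per liter at body temperature are illustrative assumptions.

```python
GAS_CONSTANT = 8.314      # J/(mol K)
PASCALS_PER_ATM = 101325.0


def osmotic_pressure_pa(particles_mol_per_liter: float, temperature_k: float) -> float:
    """Dilute-solution estimate: pressure = concentration * R * T.

    Only the number of dissolved particles counts, not their mass or identity.
    """
    concentration_mol_per_m3 = particles_mol_per_liter * 1000.0
    return concentration_mol_per_m3 * GAS_CONSTANT * temperature_k


# 0.15 mol/L of a salt that splits into two ions gives the same particle
# count as 0.3 mol/L of a sugar that stays in one piece.
salt = osmotic_pressure_pa(2 * 0.15, 310.0)
sugar = osmotic_pressure_pa(0.3, 310.0)
print(f"salt : {salt / PASCALS_PER_ATM:.1f} atm")
print(f"sugar: {sugar / PASCALS_PER_ATM:.1f} atm")
```

Under these assumptions both solutions give the same result, several atmospheres, which shows why cells need the protective mechanisms described next.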

If the membrane is a cell membrane, then water flows into or out of the cell, depending on the solute concentration inside and outside. Cells usually have a higher solute concentration of biomolecules inside than out, which drives water into the cells. This could cause the cell to expand until it exploded, but nature has come up with mechanisms to prevent this catastrophe, including reinforcement of cell walls and pumps to remove water from the cell.

In plants, osmotic pressure stiffens cells with reinforced cell walls, so the plant can stand up. The opposite thing happens when a salad leaf wilts.

Turgor pressure on plant cells by LadyofHats via Wikipedia.

Buffering — acids and bases

Water is naturally somewhat ionized.

Auto-ionization of water, by Cdang via Wikipedia

There are no free protons in water (even though we will write them as such), hence the hydronium ion, H3O+, with an extra proton. An acid is defined as a proton donor (furnishes H+) and a base as a proton acceptor (consumes H+), so it can be seen that water is weakly both: H3O+ is a donor and OH- is an acceptor. The degree of acidity is frequently indicated by the pH value, where

pH = log(1/[H+]) = -log([H+])

where [H+] is the concentration of H+ in moles per liter and the logarithm is to base 10. Water at 25°C has a pH of 7; pH < 7 means more H+ and therefore more acidic; pH > 7 means basic. Like all such chemical transformations, there is an equilibrium point for the above reactions. This is also true for any other weak acid dissolved in water. Consider acetic acid,

CH3COOH

which occurs in an equilibrium state of acetic acid itself (an acid, therefore a proton donor) and CH3COO- (a base, or proton acceptor). These two substances constitute a conjugate acid-base pair. When this weak acid is dissolved in water, two equilibria must be established at the same time, for water and for acetic acid, here represented simply as HAc.

H2O <-> H+ + OH-

HAc <-> H+ + Ac-

Now if we add a small quantity of a base, say NaOH, to this solution, the base will dissociate into Na+ and OH-, the latter a proton acceptor or base. It will combine with some of the protons from the water and acetic acid. But then the acetic acid will be out of equilibrium, so it will release more free protons to re-establish its equilibrium, thereby attenuating the effects of the added NaOH. This ability to resist induced changes in acidity is called buffering. A buffer is an aqueous system which resists changes in acidity from a small amount of added base or acid. It is composed of a weak acid and its conjugate base. It is the mechanism by which the human body regulates the acidity of cells.
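The effect of a buffer can be made concrete with a small calculation. The Python sketch below compares what happens to the pH when the same small amount of NaOH is added to one liter of pure water and to one liter of an acetic acid buffer. It relies on two things not given in the text: the standard Henderson-Hasselbalch approximation, pH = pKa + log10([Ac-]/[HAc]), and a textbook pKa of about 4.76 for acetic acid. The function names and quantities are illustrative.

```python
import math

PKA_ACETIC_ACID = 4.76   # approximate textbook value (assumption)


def ph_from_h(h_concentration: float) -> float:
    """pH = -log10([H+]), with [H+] in mol/L."""
    return -math.log10(h_concentration)


def buffer_ph(acid_mol: float, base_mol: float, pka: float = PKA_ACETIC_ACID) -> float:
    """Henderson-Hasselbalch estimate for a weak acid and its conjugate base."""
    return pka + math.log10(base_mol / acid_mol)


def buffer_ph_after_naoh(acid_mol: float, base_mol: float, naoh_mol: float) -> float:
    """Each added OH- removes a proton from one HAc, turning it into Ac-."""
    return buffer_ph(acid_mol - naoh_mol, base_mol + naoh_mol)


def water_ph_after_naoh(naoh_mol: float, volume_l: float = 1.0) -> float:
    """In pure water at 25 C the added OH- is not absorbed: pH = 14 + log10([OH-])."""
    return 14.0 + math.log10(naoh_mol / volume_l)


print(f"pure water before: pH {ph_from_h(1e-7):.2f}")
print(f"pure water after : pH {water_ph_after_naoh(0.01):.2f}")
print(f"buffer before    : pH {buffer_ph(0.1, 0.1):.2f}")
print(f"buffer after     : pH {buffer_ph_after_naoh(0.1, 0.1, 0.01):.2f}")
```

With these assumed quantities, the pH of pure water jumps by about five units while that of the buffer moves by less than a tenth of a unit.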

If body acidity is not within rather strict limits, enzymes will not function and so neither will we. The body uses a buffer system based on the conjugate pair carbonic acid and bicarbonate:

H2CO3 <-> H+ + HCO3-

If blood acidity starts to become too high, bicarbonate leaps in and absorbs protons. If it becomes too low, carbonic acid supplies them. This is one of many regulative mechanisms the body has for maintaining the proper equilibrium of certain solutions and processes needed by the body in order to stay alive. We will see more.

The global water cycle

Let us briefly leave the microscopic considerations of chemistry and look at water on the scale of the Earth. Water circulates through the ground, streams, oceans and lakes and the atmosphere in what is called the water cycle.

The water cycle, from USGS

This is just one of a number of transformational processes which assure the distribution of an essential component of life on Earth. The diagram is pretty much self-explanatory.

That’s it for the introductory material. Now let’s look at the history of it all. That starts in the past. Way back in the past, about 13.7 billion years ago (Gya).

Cheat sheet

Some generally useful information you may want to look up occasionally.

Geological time scale, eons, eras, periods and epochs

Geological time scale — red lines are mass extinctions, past or to come…

Types of hominins

Timeline and grouping of principal fossil hominid species

Biological species classification

Classification of modern humans and house cats, after Wikipedia

The periodic table of the elements

Periodic table of the elements

Particles of the standard model

Standard Model particle zoo

Hominoid clades

Hominoid families with dates.

Phylogenetic tree

Phylogenetic tree By MPF [Public domain], via Wikimedia Commons



Now we are ready to understand how it is that carbon is such a versatile element. It is at the basis of all organic chemistry and, in particular, biochemistry. In living organisms, the four most abundant elements by number of atoms are hydrogen, oxygen, nitrogen and carbon. Together, they comprise over 99% of the mass of most cells.

We saw that the carbon atom’s electron-shell configuration was

¹²C: 1s² 2s² 2p²

so it has four electrons in its valence shell (n=2). That enables it to form four bonds, sharing each of its four valence electrons with an electron from another atom. The bonds tend to be equally spaced around the carbon atom in the form of a tetrahedron, like those little creamer packets you get in cheap restaurants. A carbon atom can bond with four hydrogens, sharing each of its four valence electrons with one hydrogen, so each hydrogen has two and the carbon has a total of eight and everybody is happy. This is called methane and looks like this.

Methane molecule, CH4 by Patricia.fidi via Wikimedia Commons.

You should see one of the lower-right-hand hydrogens as pointing up out of the page; the other, down into it. The angles between any two adjacent connecting lines (which of course are only imagined by us) are about 109.5°. Carbon’s versatility in binding is illustrated by the examples in this diagram.
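The 109.5° figure follows from the geometry alone. As a small sketch, assume the four bonds point from the center of a cube to four alternating corners (one standard way of drawing a regular tetrahedron; the coordinates are an illustrative choice, not from the text). The angle between any two of those directions then comes out as arccos(−1/3).

```python
import math
from itertools import combinations

# Four alternating corners of a cube centered on the carbon atom (assumed layout).
BOND_DIRECTIONS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

def angle_between(u, v):
    """Angle in degrees between two 3-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))

# All six pairs of bonds should give the same angle.
angles = [angle_between(u, v) for u, v in combinations(BOND_DIRECTIONS, 2)]
tetrahedral_angle = math.degrees(math.acos(-1.0 / 3.0))

print([round(a, 2) for a in angles])
print(round(tetrahedral_angle, 2))
```

Every pair of bonds gives the same angle, which is what "equally spaced" means here.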

Versatility of carbon bonding, after Lehninger.

The dots represent valence electrons and the right-hand column is another way of looking at the product in terms of bonds rather than electrons. Each line between atoms is a shared pair of electrons. Note the double and triple inter-carbon bonds in the last two examples. This large number of ways of bonding is the key to carbon’s versatility. In fact, compared to the huge number of such molecules possible, only a relatively small number of the same biomolecules occur in living organisms. This is the first example we see of nature using the same set of techniques or tools all over the biosphere.

Single bonds between carbons also exist, of course, and have the particular advantage that the carbons and whatever is bonded to them can rotate around the axis linking the two carbons. This is more important than one might think. It turns out that some proteins function differently in their left-handed and right-handed versions. Since rotation can change the shape of the molecule, this enables biomolecules with hundreds of atoms to take on specific shapes with definite mechanical or fluid properties. (We will see some of this in the biochemistry chapter.)

The importance of water is not just because we drink it. Let’s go look at that.


Water is tremendously important to us if only because around 70% of the globe is covered with it. Each of us is 55-75% water (by weight) and life most likely arose in water. Two properties of water are of fundamental importance for biochemistry and, therefore, for life.

  • the attractive force between water molecules and
  • the tendency of water to ionize slightly.

Polarization and hydrogen bonds

As we saw, the electron configuration of oxygen’s eight electrons is:

¹⁶O: 1s² 2s² 2p⁴

So it needs two more electrons in order to fill its valence shell. As everybody knows, it bonds with two atoms of hydrogen to make H2O. Each hydrogen atom shares its electron with the oxygen, making two covalent bonds. The oxygen atom now has the desired eight electrons in its valence shell. The resulting arrangement is triangular.

In addition, oxygen is more electronegative than hydrogen (see note 1), meaning it has a stronger attraction for electrons, so the electrons spend more time in the vicinity of the oxygen, making that end of the molecule slightly more negative. The molecule is said to be polarized.

Polarization of water molecule, via Wikimedia Commons

Since one end of the molecule is more negative than the other, the negative end of one molecule is electrostatically attracted to the positive end of another and this forms a weak bond called a hydrogen bond. In this image, one sees the proposed tetrahedral form of the molecule as well as the hydrogen bonds between molecules.

Model of hydrogen bonds (dashed lines) between five water molecules, by Qwerter via Wikimedia Commons

Hydrogen bonds are strongest when the electrostatic interaction of the participating atoms is maximized, as shown in the above figure. This directionality is responsible for the geometric arrangement of hydrogen-bonded molecules in crystals.

Hydrogen bonds also form between an electronegative atom and a hydrogen atom covalently bonded to another electronegative atom, be it of the same element or a different one.

Hydrogen bonding (dashed lines) between guanine and cytosine, two of the four types of bases in DNA, by Yikrazuul via Wikimedia Commons.

Hydrogen bonds are much weaker than covalent bonds, typically on the order of a twentieth as strong. But when there are many of them, their combined strength can be great indeed. A striking example is DNA, in which the opposing strands are held together by hydrogen bonds between the bases. But more on that later.

If the molecules are rushing about (as in liquid water), they are relatively independent and the substance is a liquid. The hydrogen bonds are constantly formed and broken, forming so-called “flickering clusters”. Heat them more and they separate entirely and the water becomes a gas: water vapor. The hydrogen bonds between molecules hold them together pretty well, though, and this accounts for the rather high boiling temperature of water. Chill them down to a temperature where they do not move much any more and the hydrogen bonds assemble the molecules into a solid lattice or crystal: ice.

Ice configuration by NIMSoffice via Wikimedia Commons. Dashed lines are hydrogen bonds.

Ionization, hydrophobic and hydrophilic molecules

Because of its polarization, water can pull apart polar or ionic compounds, such as table salt, NaCl, where the positive sodium ion Na+ is attracted by the negative end of the water molecule and the negative chloride ion Cl− by the positive end. This is what makes water a good solvent. One can see the advantage of this from another angle. Remember entropy? Nature wants higher entropy, meaning more disorder. But NaCl forms a highly ordered crystal structure. When the ions are pulled apart in water, a more disordered state is achieved and entropy increases. Voilà!

Non-polar molecules are not soluble in water. They are called hydrophobic, because they do not “like” water. NaCl likes it and so is called hydrophilic. This has some amazing and important consequences.

The behavior of solutes in aqueous solutions is a very important subject in biochemistry, and a fairly vast one. Let us look at one interesting and essential type of compound. Amphipathic compounds have some regions that are polar or charged, therefore hydrophilic, and others that are not polar or charged and so are hydrophobic. In the figure below, we consider molecules illustrated as having a green hydrophilic head and long, yellow hydrophobic tails. When they are dissolved in water, the hydrophobic parts flee the water and tend to group together (like people grouped together facing outwards in the midst of a pack of threatening wolves), leaving the hydrophilic parts on the outside turned towards the water. The result is a spherical blob called a micelle.

Micelle schema by SuperManu via Wikimedia Commons.

One can understand this from thermodynamics too. The water molecules are highly ordered around the hydrophobic parts of the amphipathic molecule. Hiding these on the interior of the micelle reduces the ordering and therefore represents a state of higher entropy.

And there is another possibility. Think of the micelle opened up, like an orange, and flattened out and another one put alongside it, so that the hydrophobic ends are against each other and isolated from the water by the hydrophilic ends on the outside, as in part 1 of this diagram.


Lipid bilayer and micelle by Stephen Gilbert via Wikimedia Commons

This amphipathic substance could be a lipid (organic fat), in which case this is a lipid bilayer, which is what forms cell membranes. So we are ready to start looking at cells in the physiology chapter. And all that is due to electrostatics, QM and thermodynamics: it’s all simple physics.

There are a couple more, slightly more complex, attributes of water we should know about. They are the important ideas of osmosis and buffering.

Notes

1. Electronegativity depends on the number of electrons and on the distance of the valence electrons from the nucleus.

Atomic energy levels and chemical bonding

Atomic structure is the basis of chemistry. It is explained by Quantum Mechanics, which is part of physics. We will see that physics explains chemistry, which explains physiology, which at least starts to explain neurobiology. It’s one thing that leads to another.

In QM, the properties of a system, that is, a given object or set of objects, such as an atom, are given by the solution to the Schrödinger equation for the system. For atoms, there is a set of solutions, corresponding to different energy states of the atoms. What follows may smack of numerology.

Consider the hydrogen atom, composed of one negatively-charged electron in orbit around a nucleus containing one positively-charged proton. (This is an experimental result.) Look out: the orbit is not a well-defined path around the nucleus like those animations you see in TV ads, but rather a cloud of probability indicating the likelihood that the electron will be found at any given point in the cloud. This is due to the probabilistic character of QM. The different solutions to the Schrödinger equation express, among other things, the possible energy values of the atom. Each solution is specified by a set of numbers called quantum numbers. In the case of atoms, they are the following:

  1. The principal quantum number, designated by the symbol n, takes on integer values from 1 on up, but in practice only to 7. It indicates the shell, or level of the cloud, in which the electron is found. The values 1-7 are often indicated by the letters K, L, M…Q.

  2. The orbital quantum number, l, indicates a level within the shell which is called the subshell. It can take on values from 0 up to n-1. The values 0-3 are often referred to as s, p, d and f. (It is really of no importance, but s, p, d and f are abbreviated forms of sharp, principal, diffuse and fundamental.)

  3. The orbital magnetic quantum number, m, refers to the magnetic orientation of the electron. It can range from -l up through +l.

  4. The electron spin, ms, can take on only two values, ½ or -½.

So the only allowed values for the quantum numbers are

n = 1, 2, 3, …

l = 0…n-1 (for a given value of n)

m = -l…+l (for a given value of l)

because those are the ones for which the Schrödinger equation has solutions. It is that simple.

The QM exclusion principle forbids two electrons to occupy the same state. So each set of values (n, l, m, ms) can only correspond to one electron. The result is illustrated in the following table.

n (shell)   l (subshell)   m (orbital)                 Max no. electrons
1           0              0                           2
2           0              0                           2
            1              -1, 0, 1                    6
3           0              0                           2
            1              -1, 0, 1                    6
            2              -2, -1, 0, 1, 2             10
4           0              0                           2
            1              -1, 0, 1                    6
            2              -2, -1, 0, 1, 2             10
            3              -3, -2, -1, 0, 1, 2, 3      14
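The counting behind the table can be checked by brute force. The sketch below simply enumerates every allowed combination (n, l, m, ms) under the rules given above and counts them; the function name is made up for illustration.

```python
from fractions import Fraction

SUBSHELL_LETTERS = "spdf"

def states_in_shell(n):
    """All allowed (n, l, m, ms) combinations for principal quantum number n."""
    spins = (Fraction(1, 2), Fraction(-1, 2))
    return [(n, l, m, ms)
            for l in range(n)            # l = 0 .. n-1
            for m in range(-l, l + 1)    # m = -l .. +l
            for ms in spins]             # ms = +1/2 or -1/2

for n in range(1, 5):
    per_subshell = {}
    for (_, l, _, _) in states_in_shell(n):
        letter = SUBSHELL_LETTERS[l]
        per_subshell[letter] = per_subshell.get(letter, 0) + 1
    print(n, per_subshell, "shell total:", len(states_in_shell(n)))
```

Since the exclusion principle allows one electron per state, the counts are the maximum numbers of electrons: 2, 6, 10 and 14 for the s, p, d and f subshells, and 2n² for a whole shell.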

The fact that the quantum numbers do not vary continuously from, say, 0 to 0.001 and then 0.002 and on, but jump from one integer value to another means that the energy of the electron in the electric field of the nucleus also takes on non-continuous values. These are called quantum states and are a feature, or if you prefer, a peculiarity, of QM.

The chemical properties of an atom depend only on the number of electrons. This is equal to the number of protons and is called the atomic number. All atoms except hydrogen have nuclei which also contain neutrons. The table above summarizes the allowed values of quantum numbers for the first four shells.

In specifying which subshells are occupied by the electrons in an atom, one often uses the format

nl#

where n is the shell number, l is specified as s, p, d or f and # (written as a superscript) is the number of electrons in the subshell. In its minimum energy state, called the ground state, the carbon atom (atomic number = 6, mass number = 12, nucleus contains 6 protons and 6 neutrons) has the following electron configuration:

¹²C: 1s² 2s² 2p²

which indicates the maximum number of two electrons in shell 1, again in subshell s of shell 2 and the remaining two in subshell p of shell 2. Similarly, oxygen (atomic number = 8, mass number = 16, 8 each of protons and neutrons) is

¹⁶O: 1s² 2s² 2p⁴

the meaning of which should now be clear.

What is interesting is that, for energetic reasons, each atom would like to have its outer shell filled. If a few electrons are missing, it wants more; if most are missing, it might be willing to give up the rest in order to have an empty outer shell, referred to as the valence shell. (The number of electrons in this outer shell is called the valence; see note 1.) For instance, hydrogen

¹H: 1s¹

wants two electrons or none in its 1s shell, so it could give up its electron or gain one. What happens is, two H atoms share their electrons to make a molecule of H2, so each has two electrons half the time. Better than nothing.

Since oxygen already has shell 2 three-quarters filled (six of a possible eight electrons), it would probably prefer to gain electrons to fill it. And carbon… but carbon is special and will be considered in a moment.

Look at sodium (Na, atomic number 11) and chlorine (Cl, atomic number 17):

Na: 1s² 2s² 2p⁶ 3s¹

Cl: 1s² 2s² 2p⁶ 3s² 3p⁵

Sodium could happily give up that 3s electron and chlorine could use it to fill up its 3p valence shell. And this is what happens in table salt, NaCl. But if you put salt in water, it separates (for reasons which will be discussed shortly) into charged ions, Na+ and Cl−, because chlorine is greedy and keeps that negative 3s electron it took away from sodium. This attraction for electrons is called electronegativity.

Chemistry is the study of chemical systems (atoms, molecules) and chemical bonding between such objects. In the case of NaCl, the sodium and chlorine have opposite electrical charge and the attractive electric force is what holds the molecule together. This is called ionic bonding. Sometimes, when atoms cannot decide which has more right to an electron, the electron is shared between them, as in H2, making both atoms relatively happy. Bonding based on shared electrons is called covalent bonding; it is a sort of consensus situation, if we go on with the anthropomorphism.

Elements with the same number of electrons in their outer shells have similar chemical properties. So they are arranged in columns in that wonderful physical/chemical tool, the periodic table of the elements.

Periodic table from Wikimedia Commons

It is easy to see that the elements in the first column are like hydrogen in having one electron in their valence shell.

H: 1s¹

Li: 1s² 2s¹

Na: 1s² 2s² 2p⁶ 3s¹

K: 1s² 2s² 2p⁶ 3s² 3p⁶ 4s¹

… and so on.
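Configurations like these can be generated mechanically. The sketch below fills subshells in order of increasing n + l (ties going to the lower n), a common rule of thumb often called the Madelung rule. That rule is not stated in the text, so treat it as an assumption; it also ignores the exceptions found among some of the elements in the middle of the table. The function names are hypothetical.

```python
SUBSHELL_LETTERS = "spdf"
SUPERSCRIPTS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

def filling_order(max_n=7):
    """Subshells (n, l) sorted by increasing n + l, then increasing n (assumed rule)."""
    subshells = [(n, l) for n in range(1, max_n + 1) for l in range(min(n, 4))]
    return sorted(subshells, key=lambda nl: (nl[0] + nl[1], nl[0]))

def ground_state(z):
    """Idealized ground-state configuration for atomic number z (exceptions ignored)."""
    parts = []
    remaining = z
    for n, l in filling_order():
        if remaining == 0:
            break
        capacity = 2 * (2 * l + 1)        # two spin states per value of m
        count = min(capacity, remaining)
        parts.append(f"{n}{SUBSHELL_LETTERS[l]}{str(count).translate(SUPERSCRIPTS)}")
        remaining -= count
    return " ".join(parts)

for symbol, z in [("H", 1), ("Li", 3), ("Na", 11), ("K", 19)]:
    print(f"{symbol}: {ground_state(z)}")
```

Each of the four first-column elements ends with a single electron in an s subshell, which is why they share a column.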

The extra elements in the middle are rule-breakers. Instead of filling one subshell before moving on to the next, they start one, add a small number (often only one) of electrons to the next, then go back to finish filling the next-to-last.

Columns in the table are called groups; rows, periods.

The subshell configurations we have been giving are for the lowest energy state of the atom, called the ground state, in which subshells are filled from the “bottom” up (with some exceptions, as just mentioned). But if that hydrogen atom is struck by a photon, enough energy may be transferred from the photon to the atom to push that 1s electron into a higher-energy subshell. The atom is then said to be in an excited state. The electron may then re-descend spontaneously to the lower subshell, emitting a photon of energy equivalent to the difference in energy levels of the subshells. In QM, photons behave like waves whose energy is proportional to their frequency, so the frequency – equivalently, the color – of the light emitted is characteristic of the difference in energy of the two subshells. Any atom’s subshells will therefore correspond to a given set of photon frequencies emitted and these are seen as colors, although not all these colors will be visible to a human eye. The set of frequencies constitutes the spectrum of the atom and may be used to analyze the identity of a light source. In this way, we can identify the chemical components of light-emitting objects like distant stars.
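For hydrogen the emitted colors can be estimated with very little arithmetic. This sketch relies on two things not given in the text, so both are assumptions: the simple hydrogen level formula E = −13.6 eV / n², and the conversion wavelength = hc / E with hc ≈ 1239.84 eV·nm. The constants are rounded.

```python
RYDBERG_EV = 13.606   # assumed: binding energy of the hydrogen ground state, in eV
HC_EV_NM = 1239.84    # assumed: Planck constant times speed of light, in eV*nm

def level_energy(n):
    """Energy of hydrogen level n in eV (negative means bound)."""
    return -RYDBERG_EV / n**2

def emitted_wavelength_nm(n_upper, n_lower):
    """Wavelength of the photon emitted when the electron drops from n_upper to n_lower."""
    photon_energy = level_energy(n_upper) - level_energy(n_lower)
    return HC_EV_NM / photon_energy

for n_upper, n_lower in [(3, 2), (4, 2), (2, 1)]:
    wavelength = emitted_wavelength_nm(n_upper, n_lower)
    print(f"{n_upper} -> {n_lower}: {wavelength:.0f} nm")
```

The drops to level 2 give about 656 nm (red) and 486 nm (blue-green), both visible. The drop from level 2 to level 1 gives about 122 nm, which is in the ultraviolet and an example of a "color" the human eye cannot see.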

There are two other types of bonding. We will consider hydrogen bonds very shortly in the discussion of water. The fourth type is due to the shifting electron density distribution around an atom. At times, this may form a temporary dipole even in a neutral atom. This may in turn induce a dipole in a nearby atom in such a way that the two dipoles attract each other very weakly. This is London, or van der Waals, bonding.

The functioning of all living things depends on water and on the versatility of the carbon atom. So let’s start with carbon.

Notes

1. Officially, the maximum number of univalent atoms (originally hydrogen or chlorine atoms) that may combine with an atom of the element under consideration, or with a fragment, or for which an atom of this element can be substituted.

What atomic physics and chemistry tell us

I am, reluctantly, a self-confessed carbon chauvinist. Carbon is abundant in the Cosmos. It makes marvelously complex molecules, good for life. I am also a water chauvinist. Water makes an ideal solvent system for organic chemistry to work in and stays liquid over a wide range of temperatures. But sometimes I wonder. Could my fondness for materials have something to do with the fact that I am made chiefly of them?

– Carl Sagan, Cosmos

The early stages of the universe and the lives of stars are the matter of physics and astronomy and their offspring, astrophysics and cosmology. By the time the first living things showed up on Earth, processes were occurring which require our knowing about the phenomena described by the science of chemistry. QM is the basis of atomic physics and that is the basis of chemistry, so we are ready for it.

To even begin a comprehensive survey of chemistry is well beyond the scope of this document. We will illustrate its usefulness and some of its fruits by considering two subjects of great importance not only to Carl Sagan but to all of us – carbon and water.

In order to do that, it is necessary to know about several subjects:

Then we move on to consider, first, the past, starting almost 14 Gya.