It has not been long since the discovery of DNA, the heritable
database of all living beings, created a great deal of enthusiasm. This
enthusiasm has only grown in the post-genomic era, where
sequencing a genome is as routine a task as building a bridge. The
expectations of and prospects for post-genomic science are enormous, ranging
from person-specific cures for diseases, to better crops, to
industrial biotechnology in which environment-friendly processes will replace
traditional petrochemical-based synthesis. Apart from several significant
developments towards fulfilling these promises, new scientific challenges are
surfacing from the vast amount of biological data being generated in the
post-genomic era. Traditional (and rather naïve) ideas of linking one gene to
one disease, one gene to one product or, in general, one gene to one phenotype
are being overturned in an increasing number of cases. These findings, together
with an increased understanding of cellular organization (protein-protein
interactions, transcriptional regulatory circuits etc.), have led to the modern
discipline of molecular systems biology, where not only the components (genes,
proteins, metabolites etc.) are deemed important, but also the interactions
among them.
Genes and Links: Interactome
The significance of connections has long been recognized in biology, albeit
at the macroscopic level. Studies in ecology and human social sciences almost
completely rely on the properties of interactions between individuals or
species in a social or ecological network. It is only in the post-genomic era
that these network-centered concepts are being
formalized at the molecular level. The cellular image originating from the
central dogma in molecular biology, where a gene is sequentially linked to a
functional protein, is being rapidly expanded to account for the functions
arising due to various bio-molecules interacting with each other (Figure 1).
Figure 1. Interactome view of the central dogma in molecular biology.
Interactions between different bio-molecules play an important role in the
systems biology approach. The figure shows some of the molecular interactions
that are crucial for cellular functioning, viz., protein-DNA, protein-protein,
protein-complex, metabolic and functional associations between genes. Together,
these interactions form the interactome, analogous to the genome,
transcriptome, proteome and metabolome.
A common example of bio-molecular interaction is the enzyme complex,
where two (or more) protein sub-units must assemble to form a
functional enzyme. For most proteins, thus, it takes two (or more) to tango.
Many enzymes and other proteins also interact with small molecules, usually
nutrient molecules or toxic products, and their properties
change in response to the concentration of these ‘effector molecules’. Protein-DNA interactions are another important class of
interactions where expression of the genes is controlled by the presence (or
absence) of regulatory proteins binding upstream of the coding region.
Together, protein-protein and protein-DNA interactions constitute cellular
regulatory circuits that orchestrate the expression of genes and replication of
the DNA. Metabolic networks represent another important cellular network,
where different substrates are broken down by the synergistic work of hundreds of
different enzymes in order to create energy and basic building blocks for
growth (such as amino acids, nucleotides etc.). Assembly of these building
blocks into proteins, DNA and other bio-polymers would also not be possible
without multi-component ‘assemblers’ such as the ribosome. All in all, the
organization and functioning of the entire cell can be viewed as a network
of molecular interactions: the interactome. This
network view of the cell forms the core of systems biology research.
Bio-molecular interaction networks are often complex, owing to the large number
of components and interactions. The metabolic network is a well-studied
example of a complex bio-molecular network. Figure 2 shows a large part of the
recently compiled human metabolic network(1), where
enzymes and metabolites are represented as nodes and associations between them
as edges.
Figure 2. Graph-theoretical representation of part of the human metabolic
network(1). Metabolites are shown as yellow nodes, while the enzymes are shown
as red/green nodes, depending on up/down-regulation of the corresponding genes
in the skeletal muscles of diabetes patients(11).
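The node-and-edge representation described above can be sketched in a few lines of Python. The three reactions are a textbook fragment of glycolysis chosen purely for illustration; they are not taken from the human network of ref. 1, and the names `REACTIONS`, `build_graph` and `metabolite_degrees` are hypothetical.

```python
from collections import defaultdict

# Toy fragment of glycolysis (illustrative only): enzyme -> metabolites it acts on.
REACTIONS = {
    "hexokinase": ["glucose", "ATP", "glucose-6-phosphate", "ADP"],
    "phosphoglucose isomerase": ["glucose-6-phosphate", "fructose-6-phosphate"],
    "phosphofructokinase": ["fructose-6-phosphate", "ATP",
                            "fructose-1,6-bisphosphate", "ADP"],
}


def build_graph(reactions):
    """Return an undirected adjacency map linking each enzyme to its metabolites."""
    graph = defaultdict(set)
    for enzyme, metabolites in reactions.items():
        for metabolite in metabolites:
            graph[enzyme].add(metabolite)   # edge enzyme -> metabolite
            graph[metabolite].add(enzyme)   # and the reverse direction
    return dict(graph)


def metabolite_degrees(graph, reactions):
    """Count how many enzymes each metabolite node is connected to."""
    return {node: len(neighbours)
            for node, neighbours in graph.items()
            if node not in reactions}       # skip the enzyme nodes


graph = build_graph(REACTIONS)
degrees = metabolite_degrees(graph, REACTIONS)

# List metabolites from most to least connected (ties broken alphabetically).
for metabolite, degree in sorted(degrees.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{metabolite}: connected to {degree} enzyme(s)")
```

Even in this tiny fragment, cofactors such as ATP and ADP touch more enzymes than glucose does, which hints at how highly connected nodes arise in larger networks.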
Emergent properties and robustness
What does the network-view of the cell add to our understanding? Is it
not enough to know the properties of individual proteins in order to understand
the function of a regulatory circuit? Like a football team, the success of a
cellular circuit/network not only depends on the quality of the individual
players, but also on how well these players interact with each other. Network
functionalities that thus “emerge” from the interactions among the nodes are
termed emergent properties. Robustness of cellular networks towards
failure of their components is perhaps the most studied emergent property of
large biological networks.
The metabolic network (see Figure 2) is one such example.
Several of the genes involved in metabolism are dispensable(2, 3).
Thus, removal of the corresponding enzymes from the network does not severely affect
its operation. This property can be roughly visualized in networks as the
integrity of the network in response to removal of its nodes. One of the
striking features of the network depicted in Figure 2 is the presence of a few
“hubs”, or highly connected nodes, while the rest of the nodes have relatively low
connectivity. Interestingly, this feature, often referred to as scale-free
topology, is characteristic of many cellular networks. In a scale-free network,
highly connected nodes keep the network together and thereby create a small
world where no two nodes are very far from each other(4). For example, in
the metabolic network of the yeast S. cerevisiae, two enzymes
are on average less than 3 nodes apart. Thus, signal/disturbance/information
can potentially travel quite rapidly across the whole network.
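To make the small-world idea concrete, the sketch below measures the average shortest-path length of two toy networks: a plain ring, and the same ring with one added hub connected to every node. Both networks and all function names are invented for illustration and have nothing to do with the yeast data cited above.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: number of edges from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_path_length(graph):
    """Mean shortest-path length over all ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for source in graph:
        for target, d in shortest_path_lengths(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs


def ring(n):
    """n nodes, each linked only to its two immediate neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def ring_with_hub(n):
    """The same ring plus a single hub node connected to every ring node."""
    graph = ring(n)
    graph["hub"] = set(range(n))
    for i in range(n):
        graph[i].add("hub")
    return graph


plain = average_path_length(ring(20))
with_hub = average_path_length(ring_with_hub(20))
print(f"ring: {plain:.2f} steps on average, ring with hub: {with_hub:.2f}")
```

A single well-connected node is enough to bring every pair of nodes within two steps of each other, which is the effect hubs have in much larger cellular networks.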
The scale-free property has also been observed in networks of social
interactions and in electrical grids, among others. Since most of the nodes in a
scale-free network are not so well connected, their removal does not affect the
overall connectivity of the network. On the other hand, scale-free networks are
more vulnerable to targeted attacks on the hubs(5).
Natural competition between the host and bacteria is often a game of finding
and defending hubs.
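The contrast between random failure and targeted attack can be explored with a small simulation. The sketch below grows a synthetic network by preferential attachment (a common way to obtain hub-dominated networks, used here as an assumption; it is not a model of any real cell), then compares the largest connected cluster after removing 20% of the nodes at random with that after removing the 20% best-connected nodes. All names and parameter values are hypothetical.

```python
import random


def preferential_attachment(n, m, rng):
    """Grow a network in which each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    graph = {i: set() for i in range(n)}
    for i in range(m + 1):                  # small fully connected starting core
        for j in range(i):
            graph[i].add(j)
            graph[j].add(i)
    # Every node appears in `stubs` once per edge end, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    stubs = [node for node in range(m + 1) for _ in graph[node]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for target in targets:
            graph[new].add(target)
            graph[target].add(new)
            stubs.extend((new, target))
    return graph


def largest_component(graph, removed):
    """Size of the largest connected cluster once the `removed` nodes are deleted."""
    seen, best = set(removed), 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:                        # depth-first flood fill
            node = stack.pop()
            size += 1
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


rng = random.Random(0)                      # fixed seed for reproducibility
network = preferential_attachment(n=500, m=2, rng=rng)
k = 100                                     # remove 20% of the nodes

random_nodes = set(rng.sample(sorted(network), k))
hub_nodes = set(sorted(network, key=lambda v: len(network[v]), reverse=True)[:k])

after_failure = largest_component(network, random_nodes)
after_attack = largest_component(network, hub_nodes)
print(f"largest cluster after random failure: {after_failure} of {len(network) - k}")
print(f"largest cluster after hub attack:     {after_attack} of {len(network) - k}")
```

Random removal mostly hits poorly connected nodes and leaves the network largely in one piece, whereas removing the same number of hubs breaks it into much smaller clusters.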
Biological networks in space and time
Another fascinating aspect of biological networks is their dynamic
nature and spatial segmentation inside the cells. A study(6)
by Søren Brunak’s group at
DTU has shown that many protein-protein interactions in yeast are dynamically
established and broken during the cell cycle. These dynamics of protein
interactions regulate the cellular machinery in harmony with the most
fundamental task of DNA replication. Indeed, synchronization with the cell
cycle has also been observed for the expression of about half of the yeast genome(7). Part of this regulation is aimed at maintaining
low respiratory activity (and hence low reactive oxygen species that may damage
DNA) during DNA replication. The spatial distribution of different reactions
between mitochondria, cytosol etc. also underscores the
need to account for space and time effects when simulating
cellular behavior in a holistic fashion.
Transcriptome, Proteome and Metabolome
Although many general properties of bio-molecular interaction networks
can be studied by simply looking at the connectivity, knowledge about the
levels of individual components and the ‘strength’ of connections between them
is indispensable if we wish to obtain a quantitative description of the network
operation. Several experimental techniques are being developed towards this
end. Measurement of the expression level of all genes in a genome has become
routine in biological research. Several high-quality assays exist in which whole-genome
profiling of mRNA expression can be accomplished in a single-shot
experiment. Such a compendium of genome-scale expression data is referred to as the transcriptome.
In contrast, measurement of the proteome (all proteins in a cell) and the metabolome (all metabolites in a cell) is still a
challenging task for analytical biochemistry. Nevertheless, recent
developments in experimental techniques have enabled the measurement of a few
thousand proteins in the baker’s yeast Saccharomyces cerevisiae(8).
A large number of metabolites have been successfully identified and, to some
extent, quantified in fungi and plants(9). Together, the transcriptome, proteome and metabolome provide a snapshot of cellular operations under given conditions. These omics technologies are thereby helping us to better
understand the mechanisms of the flow of information from the genes down
to the phenotype.
Towards integrative analysis and culture in science
For the yeast S. cerevisiae alone, more than
1500 genome-wide transcriptional datasets (~6000 genes in a single dataset) are
available in the public domain (for example, see GEO datasets at NCBI,
http://www.ncbi.nlm.nih.gov). A major challenge in systems biology is to distill such large amounts of data into biologically
meaningful hypotheses and new inventions. Several mathematical and algorithmic
tools rooted in statistics, machine learning and computer science are being
developed to tackle the omics data analysis problems.
As both a necessity and an outcome of this process, scientists from disciplines as
diverse as mathematics, computer science, physics, micro-biology, genetics,
molecular biology and chemical engineering are being gathered under the roof of
systems biology. It is also expected that these multi-disciplinary groups will
bring several new concepts and tools to the field of biology and human health
research.
Systems biology for industrial biotechnology
The chemical manufacturing industry is actively searching for
cost-effective, environmentally friendly and sustainable raw material feedstocks that will not only enable production of key
chemical building blocks, but can also serve as a platform for future products. Industrial biotechnology is a promising
alternative in which micro-organisms are engineered to produce key products from
biomass (i.e., any organic polymeric material resulting from photosynthetic
fixation, such as corn, wood, or sugar cane). Micro-organisms offer
distinct advantages as cellular factories, not only for high-value products
(antibiotics, therapeutic proteins, nutraceuticals etc.), but also for commodity chemical production.
Figure 3. Competition between growth and product formation in microbial
industrial biotechnology. The substrate consumed is distributed between the
product and biomass based on the environmental conditions and the genetic
make-up of the micro-organism. Due to this inherent competition, which is often
in favor of biomass for natural evolutionary reasons, the cellular network
needs to be retrofitted at the genetic level. Identification of such metabolic
engineering targets needs a systems approach owing to the complexity of the
cellular metabolic networks.
A major challenge in the use of micro-organisms as cell factories is to
improve the productivity and yield of desired products by engineering the
cellular machinery. There is often a competition between the micro-organism’s
objective of high growth and the formation of the desired product. To win this
competition in favor of the product, it is often
necessary to retrofit the microbial DNA (Figure 3). Through genetic engineering
it is possible to introduce targeted genetic changes and thereby engineer
microbial cells to obtain the desired productivity
improvement. However, owing to the complexity of cellular networks, both in terms
of structure and regulation, it is often difficult to predict the effects of
genetic modifications on the resulting phenotype.
Recently, genome-scale metabolic models have been compiled for several
different micro-organisms, in which structural and stoichiometric complexity is inherently accounted for. New algorithms
that use genome-scale metabolic models are being developed to identify gene knockout
or gene addition strategies for improved productivity. Efficient
algorithms and large computing power have enabled us to carry out gene
deletion/addition experiments in silico(10). The production of a desired compound can thus be
optimized in silico by identifying the best candidate
genes for deletion. As a case study at
the Center for Microbial Biotechnology-DTU, we have
successfully realized the computer-predicted strategies in vivo for the production
of succinic acid, an important commodity chemical.
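The logic of an in silico deletion screen can be illustrated with a deliberately tiny, invented model. It assumes a cell that maximizes growth, one shared precursor, and a cofactor (think NADH) that growth produces and that either the desired product or a competing byproduct must consume. Real genome-scale models cover hundreds to thousands of reactions and are normally solved by linear programming; the brute-force enumeration over integer fluxes below is only a stand-in, and every name and number is hypothetical.

```python
from itertools import product

UPTAKE_MAX = 10                          # hypothetical cap on substrate uptake
KNOCKOUT_CANDIDATES = ("product", "byproduct")


def feasible_states(knocked_out):
    """Enumerate integer steady-state fluxes (growth, product, byproduct)."""
    for v_prod, v_by in product(range(UPTAKE_MAX + 1), repeat=2):
        if "product" in knocked_out and v_prod:
            continue                     # deleted gene: no flux through its reaction
        if "byproduct" in knocked_out and v_by:
            continue
        v_growth = v_prod + v_by         # cofactor balance: made by growth, used by both sinks
        uptake = v_growth + v_prod + v_by  # mass balance on the shared precursor
        if uptake <= UPTAKE_MAX:
            yield v_growth, v_prod, v_by


def predict(knocked_out=frozenset()):
    """Return (maximum growth, product flux guaranteed at that growth)."""
    states = list(feasible_states(knocked_out))
    best_growth = max(state[0] for state in states)
    # Worst case for the engineer: the cell grows optimally while making
    # as little product as the stoichiometry allows.
    guaranteed = min(state[1] for state in states if state[0] == best_growth)
    return best_growth, guaranteed


results = {"wild type": predict()}
for gene in KNOCKOUT_CANDIDATES:
    results[f"delete {gene}"] = predict(frozenset([gene]))

for strain, (growth, guaranteed_product) in results.items():
    print(f"{strain}: growth {growth}, guaranteed product {guaranteed_product}")
```

In this toy model the wild type can grow optimally without making any product, whereas deleting the byproduct route leaves product formation as the only way to balance the cofactor, so growth and production become coupled. Screening real models follows the same pattern: apply a candidate deletion, recompute the optimal state, and compare.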
Systems biology in disease research
Most common diseases are heterogeneous, that is, they originate from
disruptions of several parts of the underlying biological processes, which in
turn disrupt the interactions among various parts of the cellular system and
thereby compromise one or more of the system’s functions (i.e., one or more of
the system’s emergent properties). Type 2 Diabetes (DM) is an example of such a
heterogeneous disease. In recent years, more systemic, modular approaches
have been used to unravel the underlying disorder of DM.
One approach is to integrate a bio-molecular interaction network model
with genome-scale transcriptome (or other omics) data. In this approach, the interaction network
serves as a scaffold for data integration. The changes in gene expression can
then be mapped on the nodes of the network model to reveal patterns of
coordinated changes. In contrast to the reductionist approach where each gene
is individually evaluated for the change in expression level, the systems
biology approach evaluates the significance of changes of a gene only in
connection with the changes in the related genes.
Consider an example where gene expression data from the skeletal muscles
of diabetic and non-diabetic patients are compared(11).
If the expression of a particular gene encoding an enzyme that catalyzes an NADH-dependent
reaction is changed, it is difficult to evaluate the source or
biological objective of this change. However, if we observe that all enzymes
involved in NADH-associated reactions show changes in their expression, we can
easily hypothesize the biology underlying the differences between diabetic and
non-diabetic patients, namely respiration and redox metabolism. This simple but powerful approach is called ‘reporter-metabolites’ analysis(12). Most systems biology approaches to
data analysis are based on similar principles. Such an integrative approach is a
major step towards a more holistic understanding of the complex heterogeneous
aspects of diseases and underlying physiological disorders.
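The idea of scoring a metabolite by the coordinated change of its neighbouring enzymes can be sketched as follows. This is a simplified illustration, not the published procedure of ref. 12, whose details may differ: gene-level p-values are turned into z-scores, the z-scores of the enzymes around each metabolite are combined, and the result is compared with randomly drawn enzyme sets of the same size. The p-values, the tiny network and all names are made up.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, pstdev

# Hypothetical p-values for differential expression of enzyme-coding genes.
P_VALUES = {
    "E1": 0.001, "E2": 0.004, "E3": 0.010,   # enzymes around NADH: all clearly changed
    "E4": 0.60, "E5": 0.45, "E6": 0.80,
    "E7": 0.30, "E8": 0.02, "E9": 0.90,
}

# Hypothetical network: metabolite -> enzymes that produce or consume it.
NEIGHBOURS = {
    "NADH": ["E1", "E2", "E3"],
    "citrate": ["E4", "E5", "E6"],
    "pyruvate": ["E7", "E8", "E9"],
}


def z_score(p_value):
    """Small p-values map to large positive z-scores (inverse normal CDF)."""
    return NormalDist().inv_cdf(1.0 - p_value)


def combine(z_values):
    """Aggregate the z-scores of k neighbouring enzymes into one score."""
    return sum(z_values) / sqrt(len(z_values))


def metabolite_scores(p_values, neighbours, n_random=2000, seed=0):
    """Score each metabolite against random enzyme sets of the same size."""
    rng = random.Random(seed)
    z = {gene: z_score(p) for gene, p in p_values.items()}
    all_z = list(z.values())
    scores = {}
    for metabolite, enzymes in neighbours.items():
        raw = combine([z[e] for e in enzymes])
        background = [combine(rng.sample(all_z, len(enzymes)))
                      for _ in range(n_random)]
        scores[metabolite] = (raw - mean(background)) / pstdev(background)
    return scores


scores = metabolite_scores(P_VALUES, NEIGHBOURS)
for metabolite, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{metabolite}: {score:+.2f}")
```

A single strongly changed enzyme (E8 next to pyruvate) is not enough to lift a metabolite's score; only the coordinated change of all enzymes around NADH makes it stand out, which mirrors the argument in the paragraph above.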
An experimental quantitative systems approach is key to the integrative
analysis of diseases. New quantitative approaches are allowing us to map not
only the expression of different bio-molecules, but also differences between the
DNA of individuals that are linked to diseases(13). Linking
such genotypes to disease treatment at the individual level is a major
challenge for the future of disease systems biology. This will require expansion
of the current network models and algorithms to incorporate data and
interactions from the DNA down to physiology, a task that is already well under way.
Conclusions
Figure 4. Depiction of an ancient Indian story in which six blind men try
to understand the structure of an elephant by feeling different parts of
it. The story conveys the idea that the true picture of reality
can only be perceived through the integration of all viewpoints.
The ultimate goal of systems biology is to create complete quantitative
models of cellular networks: virtual cells. To achieve this goal, it is
necessary to start with simpler model systems such as micro-organisms.
Equally necessary is the use of multidisciplinary scientific teams and the
integration of knowledge at different levels. Although current system models
are far from capturing the whole complexity of cellular networks in terms of
size, interactions, spatial organization and dynamics, they certainly do
represent one of the first major steps towards creating virtual cells. The
opportunities and potential for these virtual cells are enormous, not only for
industrial biotechnology, but also for simulating the behavior of human cells and their response to different drugs and environmental
conditions.
Models of microbial metabolic networks are already offering us an
effective way to battle infectious microbes. With the pace of
current developments on the experimental and computational fronts, the dream of
virtual cells and personal cures for many complex diseases is perhaps not so
far away. A key to achieving this goal is recognizing the need for integration of
knowledge, data and vision. Without integration of knowledge we risk misunderstanding
how living cells function, as in the ancient Indian story of six blind men and
an elephant (Figure 4).
Acknowledgements. We are grateful to Manuel Quiros Asensio for
the sketch of ‘elephant with blind men’.
Reference List
1. Duarte, N. C., Becker, S. A., Jamshidi, N., Thiele, I., Mo, M. L., Vo, T. D., Srivas, R. & Palsson, B. O. (2007) PNAS 104, 1777-1782.
2. Papp, B., Pal, C. & Hurst, L. D. (2004) Nature 429, 661-664.
3. Forster, J., Famili, I., Palsson, B. O. & Nielsen, J. (2003) OMICS 7, 195-202.
4. Fell, D. A. & Wagner, A. (2000) Nat. Biotechnol. 18, 1121-1122.
5. Albert, R., Jeong, H. & Barabasi, A. L. (2000) Nature 406, 378-382.
6. de Lichtenberg, U., Jensen, L. J., Brunak, S. & Bork, P. (2005) Science 307, 724-727.
7. Tu, B. P., Kudlicki, A., Rowicka, M. & McKnight, S. L. (2005) Science 310, 1152-1158.
8. Lu, P., Vogel, C., Wang, R., Yao, X. & Marcotte, E. M. (2007) Nat. Biotechnol. 25, 117-124.
9. Nielsen, J. & Oliver, S. (2005) Trends Biotechnol. 23, 544-546.
10. Patil, K. R., Rocha, I., Forster, J. & Nielsen, J. (2005) BMC Bioinformatics 6, 308.
11. Patti, M. E., Butte, A. J., Crunkhorn, S., Cusi, K., Berria, R., Kashyap, S., Miyazaki, Y., Kohane, I., Costello, M., Saccone, R. et al. (2003) PNAS 100, 8466-8471.
12. Patil, K. R. & Nielsen, J. (2005) PNAS 102, 2685-2689.
13. (2007) Nature 447, 661-678.