It has not been long since the discovery of DNA as the heritable database of all living beings created a great deal of enthusiasm. This enthusiasm has only been boosted further in the post-genomic era, where sequencing a genome is as routine a task as building a bridge. The expectations and prospects of post-genomic science are enormous, ranging from person-specific cures for diseases, to better crops, to industrial biotechnology in which environment-friendly processes replace traditional petrochemical-based synthesis. Alongside several significant developments towards fulfilling these promises, new scientific challenges are surfacing from the vast amount of biological data being generated in the post-genomic era. Traditional (and rather naïve) ideas of linking one gene to one disease, one gene to one product or, in general, one gene to one phenotype are being overturned in an increasing number of cases. These findings, together with an increased understanding of cellular organization (protein-protein interactions, transcriptional regulatory circuits etc.), have led to the modern discipline of molecular systems biology, where not only the components (genes, proteins, metabolites etc.) are deemed important, but also the interactions among them.
The significance of connections has long been recognized in biology, albeit at the macroscopic level. Studies in ecology and the human social sciences rely almost completely on the properties of interactions between individuals or species in a social or ecological network. It is only in the post-genomic era that these network-centered concepts are being formalized at the molecular level. The cellular image originating from the central dogma of molecular biology, where a gene is sequentially linked to a functional protein, is being rapidly expanded to account for the functions that arise when various bio-molecules interact with each other (Figure 1).
A common example of bio-molecular interactions is the enzyme complex, where two (or more) protein sub-units must assemble to form a functional enzyme. For most proteins, thus, it takes two (or more) to tango. Many enzymes and other proteins also interact with small molecules, usually nutrient molecules or toxic products, and their properties change in response to the concentration of these ‘effector molecules’. Protein-DNA interactions are another important class of interactions, where the expression of genes is controlled by the presence (or absence) of regulatory proteins binding upstream of the coding region. Together, protein-protein and protein-DNA interactions constitute the cellular regulatory circuits that orchestrate the expression of genes and the replication of DNA. Metabolic networks represent another important class of cellular networks, where different substrates are broken down by the concerted work of hundreds of different enzymes in order to generate energy and the basic building blocks for growth (such as amino acids, nucleotides etc.). Assembly of these building blocks into proteins, DNA and other bio-polymers would also not be possible without multi-component ‘assemblers’ such as the ribosome. All in all, the organization and functioning of the entire cell can be viewed as a network of molecular interactions: the interactome. This network view of the cell forms the core of systems biology research. Bio-molecular interaction networks are often complex, owing to the large number of components and interactions. The metabolic network is a good and well-studied example of a complex bio-molecular network. Figure 2 shows a large part of the recently compiled human metabolic network(1), where enzymes and metabolites are represented as nodes and the associations between them as edges.
What does the network view of the cell add to our understanding? Is it not enough to know the properties of individual proteins in order to understand the function of a regulatory circuit? As with a football team, the success of a cellular circuit or network depends not only on the quality of the individual players, but also on how well these players interact with each other. Network functionalities that thus “emerge” from the interactions among the nodes are termed emergent properties. The robustness of cellular networks towards failure of their components is perhaps the most studied emergent property of large biological networks.
The metabolic network (see, for example, Figure 2) is one such example. Several of the genes involved in metabolism are dispensable(2, 3). Thus, removal of the corresponding enzymes from the network does not severely affect its operation. This property can be roughly visualized as the integrity of the network in response to the removal of its nodes. One of the striking features of the network depicted in Figure 2 is the presence of a few “hubs”, or highly connected nodes, while the rest of the nodes have relatively low connectivity. Interestingly, this feature, often referred to as scale-free topology, is characteristic of many cellular networks. In a scale-free network, highly connected nodes keep the network together and thereby create a small world where no two nodes are very far from each other(4). For example, in the metabolic network of the yeast S. cerevisiae, two enzymes are on average less than 3 nodes apart. Thus, a signal, disturbance or piece of information can potentially travel quite rapidly across the whole network.
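The notions of hubs and of short distances between nodes can be made concrete with a small calculation. The following is a minimal sketch on a made-up toy network, not on the network of Figure 2: the edge list `EDGES` and all function names are assumptions introduced purely for illustration. It counts the connections of each node to find the hub and uses breadth-first search to compute the average shortest-path length.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network (not real data): one hub "H" linked to six
# low-degree nodes, plus one extra edge between two peripheral nodes.
EDGES = [("H", "a"), ("H", "b"), ("H", "c"), ("H", "d"),
         ("H", "e"), ("H", "f"), ("a", "b")]


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set of neighbours}."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: edges needed to reach each node from source."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def average_path_length(adjacency):
    """Mean shortest-path length over all connected pairs of nodes."""
    total, pairs = 0, 0
    for u, v in combinations(sorted(adjacency), 2):
        d = shortest_path_lengths(adjacency, u).get(v)
        if d is not None:  # skip pairs with no connecting path
            total += d
            pairs += 1
    return total / pairs


adjacency = build_adjacency(EDGES)
degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
hub = max(degree, key=degree.get)  # node with the most connections

print("degrees:", degree)
print("hub:", hub)
print("average path length:", round(average_path_length(adjacency), 2))
```

In this toy network every peripheral node reaches every other through the hub in at most two steps, which is the small-world effect in miniature.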
The scale-free property has also been observed in networks of social interactions and in electrical grids, among others. Since most of the nodes in a scale-free network are poorly connected, their removal does not affect the overall connectivity of the network. On the other hand, scale-free networks are more vulnerable to targeted attacks on the hubs(5). The natural competition between a host and bacteria is often a game of finding and defending hubs.
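The contrast between random failure and a targeted attack can be illustrated on the same kind of toy network. This is a sketch under stated assumptions, not an analysis of a real cellular network: the edge list and the function `largest_component_size` are hypothetical names introduced here. Robustness is measured as the size of the largest group of nodes that remain connected after a node is removed.

```python
# Hypothetical toy network (not real data) with a single hub "H".
EDGES = [("H", "a"), ("H", "b"), ("H", "c"), ("H", "d"),
         ("H", "e"), ("H", "f"), ("a", "b")]


def largest_component_size(edges, removed=()):
    """Size of the largest connected component after deleting `removed` nodes."""
    removed = set(removed)
    nodes = {n for edge in edges for n in edge} - removed
    adjacency = {n: set() for n in nodes}  # keeps isolated nodes too
    for u, v in edges:
        if u in nodes and v in nodes:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal of the component containing `start`
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


print("intact network:      ", largest_component_size(EDGES))
print("peripheral node lost:", largest_component_size(EDGES, removed=["c"]))
print("hub lost:            ", largest_component_size(EDGES, removed=["H"]))
```

Losing a peripheral node leaves the remaining nodes connected, whereas losing the hub breaks the toy network into small fragments.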
Another fascinating aspect of biological networks is their dynamic nature and spatial segmentation inside the cell. A study(6) by Søren Brunak’s group at DTU has shown that many protein-protein interactions in yeast are dynamically established and broken during the cell cycle. These dynamics of protein interactions regulate the cellular machinery in harmony with the most fundamental task of DNA replication. Indeed, synchronization with the cell cycle has also been observed for the expression of about half of the yeast genome(7). Part of this regulation is aimed at maintaining low respiratory activity (and hence low levels of reactive oxygen species that may damage DNA) during DNA replication. The spatial distribution of different reactions between mitochondria, cytosol etc. also underlines the need to account for the effects of space and time when simulating cellular behavior in a holistic fashion.
Although many general properties of bio-molecular interaction networks can be studied by simply looking at the connectivity, knowledge of the levels of individual components and of the ‘strength’ of the connections between them is indispensable if we wish to obtain a quantitative description of network operation. Several experimental techniques are being developed towards this end. Measurement of the expression levels of all genes in a genome has become routine in biological research. Several high-quality assays exist with which whole-genome profiling of mRNA expression can be accomplished in a single experiment. Such a compendium of genome-scale expression data is referred to as the transcriptome.
In contrast, measurement of the proteome (all proteins in a cell) and the metabolome (all metabolites in a cell) is still a challenging task for analytical biochemistry. Nevertheless, recent developments in experimental techniques have enabled the measurement of a few thousand proteins in the baker’s yeast Saccharomyces cerevisiae(8). A large number of metabolites have been successfully identified and, to some extent, quantified in fungi and plants(9). Together, the transcriptome, proteome and metabolome provide a snapshot of cellular operations under given conditions. These omics technologies are thereby helping us to better understand the mechanisms by which information flows from the genes down to the phenotype.
For the yeast S. cerevisiae alone, more than 1500 genome-wide transcriptional datasets (~6000 genes in a single dataset) are available in the public domain (see, for example, the GEO datasets at NCBI, http://www.ncbi.nlm.nih.gov). A major challenge in systems biology is to distill such large amounts of data into biologically meaningful hypotheses and new inventions. Several mathematical and algorithmic tools rooted in statistics, machine learning and computer science are being developed to tackle the problems of omics data analysis. As both a necessity and an outcome of this process, scientists from disciplines as diverse as mathematics, computer science, physics, microbiology, genetics, molecular biology and chemical engineering are being gathered under the roof of systems biology. It is also expected that these multi-disciplinary groups will bring several new concepts and tools to the field of biology and human health research.
The chemical manufacturing industry is actively searching for cost-effective, environmentally friendly and sustainable raw material feedstocks that will not only enable the production of key chemical building blocks, but can also serve as a platform for future products. Industrial biotechnology is a promising alternative, in which micro-organisms are engineered to produce key products from biomass (i.e., any organic polymeric material resulting from photosynthetic fixation, such as corn, wood or sugar cane). Micro-organisms offer distinct advantages as cellular factories, not only for high-value products (antibiotics, therapeutic proteins, nutraceuticals etc.), but also for the production of commodity chemicals.
A major challenge in the use of micro-organisms as cell factories is to improve the productivity and yield of desired products by engineering the cellular machinery. There is often a competition between the micro-organism’s objective of rapid growth and the formation of the desired product. To win this competition in favor of the product, it is often necessary to retrofit the microbial DNA (Figure 3). Through genetic engineering it is possible to introduce targeted genetic changes and thereby engineer microbial cells with the objective of improving productivity. However, owing to the complexity of cellular networks, in terms of both structure and regulation, it is often difficult to predict the effects of genetic modifications on the resulting phenotype.
Recently, genome-scale metabolic models that inherently account for structural and stoichiometric complexity have been compiled for several microorganisms. New algorithms built on these models are being developed to identify gene knockout or gene addition strategies that improve productivity. Efficient algorithms and large computing power have enabled us to carry out gene deletion/addition experiments in silico(10). The production of a desired compound can thus be optimized in silico by identifying the best candidate genes for deletion. As a case study at the Center for Microbial Biotechnology-DTU, we have successfully realized computer-predicted strategies in vivo for the production of succinic acid, an important commodity chemical.
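To give a flavor of what an in silico gene deletion involves, one widely used way of simulating a stoichiometric model is flux balance analysis, which poses the steady-state operation of the network as a linear optimization problem. The formulation below is a generic sketch and not necessarily the algorithm used in the work cited above; the symbols are assumptions introduced here: S is the stoichiometric matrix, v the vector of reaction fluxes, c a weight vector selecting the objective (for example growth or product formation), and K the set of reactions whose enzymes are lost when a gene is deleted.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v_j^{\min} \le v_j \le v_j^{\max} \quad \text{for all reactions } j, \\
& v_k = 0 \quad \text{for all } k \in K .
\end{aligned}
```

Under these assumptions, a deletion is simulated simply by forcing the affected fluxes to zero and solving the problem again; comparing the optimal value before and after indicates how the predicted growth or product formation changes.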
Most common diseases are heterogeneous, that is, they originate from disruptions of several parts of the underlying biological processes, which in turn disrupt the interactions among various parts of the cellular system and thereby compromise one or more of the system’s functions (i.e., one or more of the system’s emergent properties). Type 2 diabetes (DM) is an example of such a heterogeneous disease. In recent years, more systemic, modular approaches have been used to unravel the underlying disorder of DM.
One approach is to integrate a bio-molecular interaction network model with genome-scale transcriptome (or other omics) data. In this approach, the interaction network serves as a scaffold for data integration. The changes in gene expression can then be mapped on the nodes of the network model to reveal patterns of coordinated changes. In contrast to the reductionist approach where each gene is individually evaluated for the change in expression level, the systems biology approach evaluates the significance of changes of a gene only in connection with the changes in the related genes.
Consider an example in which gene expression data from the skeletal muscles of diabetic and non-diabetic patients are compared(11). If the expression of a particular gene encoding an enzyme that catalyzes an NADH-dependent reaction is changed, it is difficult to evaluate the source or biological objective of this change. However, if we observe that all enzymes involved in NADH-associated reactions show changes in their expression, we can readily hypothesize the biology underlying the differences between diabetic and non-diabetic patients, namely respiration and redox metabolism. This simple but powerful approach is called ‘reporter-metabolites’ analysis(12). Most systems biology approaches to data analysis are based on similar principles. Such an integrative approach is a major step towards a more holistic understanding of the complex, heterogeneous aspects of diseases and the underlying physiological disorders.
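The idea of scoring a metabolite by the collective change of the enzymes around it can be sketched in a few lines. The example below is a simplified illustration rather than the published procedure: the p-values, gene names, metabolite-to-gene associations and function names are all hypothetical, and the sketch omits the background correction and significance testing that a full analysis would need. Each gene’s p-value for differential expression is converted to a Z-score, and the Z-scores of the k genes associated with a metabolite are combined as their sum divided by the square root of k.

```python
from math import sqrt
from statistics import NormalDist


def p_to_z(p_value):
    """Convert a p-value into a Z-score (small p gives a large positive Z)."""
    return NormalDist().inv_cdf(1.0 - p_value)


def metabolite_score(z_scores):
    """Combine the Z-scores of k neighbouring enzymes as sum(Z) / sqrt(k)."""
    return sum(z_scores) / sqrt(len(z_scores))


# Hypothetical p-values for differential expression (illustrative only).
GENE_P = {"geneA": 0.01, "geneB": 0.02, "geneC": 0.03,
          "geneD": 0.60, "geneE": 0.45}

# Hypothetical associations between metabolites and enzyme-encoding genes.
NEIGHBOURS = {
    "NADH": ["geneA", "geneB", "geneC"],
    "metX": ["geneC", "geneD", "geneE"],
}

scores = {
    metabolite: metabolite_score([p_to_z(GENE_P[g]) for g in genes])
    for metabolite, genes in NEIGHBOURS.items()
}

# Metabolites with the highest scores are the candidate "reporters".
for metabolite, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{metabolite}: {score:.2f}")
```

In this made-up data set the metabolite whose neighbouring genes all change scores well above the one with a single changed neighbour, even though one gene is shared between them.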
An experimental, quantitative systems approach is key to the integrative analysis of diseases. New quantitative approaches are allowing us to map not only the expression of different bio-molecules, but also the differences between the DNA of individuals that are linked to diseases(13). Linking this genotype information to disease treatment at the level of the individual is a major challenge for the future of disease systems biology. This will require expanding the current network models and algorithms to incorporate data and interactions from the DNA down to physiology, a task that is already well under way.
The ultimate goal of systems biology is to create complete quantitative models of cellular networks: virtual cells. To achieve this goal, it is essential to start with simpler model systems such as micro-organisms. Equally essential are multidisciplinary scientific teams and the integration of knowledge at different levels. Although current system models are far from capturing the whole complexity of cellular networks in terms of size, interactions, spatial organization and dynamics, they certainly represent one of the first major steps towards creating virtual cells. The opportunities and potential for these virtual cells are enormous, not only for industrial biotechnology, but also for simulating the behavior of human cells and their response to different drugs and environmental conditions.
Models of microbial metabolic networks are already offering us an effective way to battle infectious microbes. With the pace of current developments on the experimental and computational fronts, the dream of virtual cells and personal cures for many complex diseases is perhaps not so far away. A key to achieving this goal is recognizing the need for integration of knowledge, data and vision. Without integration of knowledge we risk misunderstanding how living cells function, as in the ancient Indian story of six blind men and an elephant (Figure 4).
Acknowledgements. We are grateful to Manuel Quiros Asensio for the sketch of ‘elephant with blind men’.