# 21.5: Structural Properties of Networks - Biology

Much of the early work on networks was done by scientists outside of biology. In this section we look at some of the structural properties shared by different biological networks and by networks that arise in other disciplines.

## Degree distribution

In a network, the degree of a node is the number of neighbors it has, i.e., the number of nodes it is connected to by an edge. The degree distribution of the network gives the number of nodes having degree d for each possible value of d = 1, 2, 3, … For example, figure 21.3 gives the degree distribution of the S. cerevisiae gene regulatory network. The degree distributions of biological networks have been observed to follow a power law, i.e., the number of nodes in the network having degree d is approximately $c\,d^{-\gamma}$, where c is a normalization constant and γ is a positive coefficient. In such networks, most nodes have a small number of connections, except for a few nodes which have very high connectivity.
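The definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the source: the function name `degree_distribution` and the toy edge list (one hub, `TF1`, plus a short chain) are hypothetical.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {degree: number of nodes with that degree} for an undirected edge list."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops are ignored in this sketch
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    # The degree of a node is the size of its neighbor set.
    degrees = {node: len(nbrs) for node, nbrs in neighbors.items()}
    return dict(Counter(degrees.values()))

# Hypothetical toy network: a hub "TF1" linked to four genes, plus one extra edge.
toy_edges = [("TF1", "g1"), ("TF1", "g2"), ("TF1", "g3"), ("TF1", "g4"), ("g4", "g5")]
toy_distribution = degree_distribution(toy_edges)
```

Even in this tiny example most nodes have degree 1 and a single node has a much higher degree, which is the qualitative shape described in the text.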

This power-law degree distribution has been observed in many different networks across different disciplines (e.g., social networks, the World Wide Web, etc.) and indicates that those networks are not “random”: random networks (constructed from the Erdős-Rényi model) have a degree distribution that follows a Poisson distribution, where almost all nodes have approximately the same degree and nodes with much higher or lower degree are very rare [6] (see figure 21.4).
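To make the contrast concrete, the sketch below compares how much probability a Poisson distribution and a truncated power law assign to a high degree. The parameter values (mean degree 4, exponent 2.5, maximum degree 1000) are illustrative assumptions, not values from the source.

```python
import math

def poisson_pmf(d, mean_degree):
    """Probability of degree d under a Poisson distribution with the given mean."""
    return math.exp(-mean_degree) * mean_degree ** d / math.factorial(d)

def power_law_pmf(d, gamma, d_max):
    """Probability of degree d under a power law d^(-gamma), normalized over 1..d_max."""
    norm = sum(k ** -gamma for k in range(1, d_max + 1))
    return d ** -gamma / norm

# Assumed illustrative parameters: how likely is a node of degree 50?
p_hub_random = poisson_pmf(50, mean_degree=4.0)
p_hub_scale_free = power_law_pmf(50, gamma=2.5, d_max=1000)
```

Under these assumptions a degree-50 node is many orders of magnitude more likely in the power-law case, which is why high-degree nodes appear in such networks but are essentially absent from random graphs of the same mean degree.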

Networks that follow a power law degree distribution are known as scale-free networks. The few nodes in a scale-free network that have very large degree are called hubs and have very important interpretations. For example, in gene regulatory networks, hubs represent transcription factors that regulate a very large number of genes. Scale-free networks are highly resilient to failures of “random” nodes; however, they are very vulnerable to coordinated failures (i.e., the network fails if one of the hub nodes fails; see [1] for more information).
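The difference between losing a peripheral node and losing a hub can be illustrated by measuring the largest connected component that survives a removal. This is a minimal sketch on a hypothetical toy graph; the function name and node labels are assumptions made for illustration.

```python
def largest_component_size(adjacency, removed=frozenset()):
    """Size of the largest connected component after deleting the `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best

# Hypothetical toy graph: one hub with five neighbors, one of which has a further neighbor.
toy_graph = {
    "hub": {"a", "b", "c", "d", "e"},
    "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"},
    "e": {"hub", "f"}, "f": {"e"},
}
intact = largest_component_size(toy_graph)
after_leaf_failure = largest_component_size(toy_graph, removed={"a"})
after_hub_failure = largest_component_size(toy_graph, removed={"hub"})
```

Removing the peripheral node `a` leaves six of the seven nodes connected, whereas removing the hub breaks the graph into fragments of at most two nodes.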

(a) Scale-free graph vs. a random graph (figure taken from [10]).

In a regulatory network, one can identify four levels of nodes:

1. Influential, master regulating nodes on top. These are hubs that each indirectly control many targets.
2. Bottleneck regulators. Nodes in the middle are important because they have a maximal number of direct targets.
3. Regulators at the bottom tend to have fewer targets, but they are nonetheless often biologically essential.
4. Targets.
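The distinction between direct and indirect control in the levels above can be sketched by counting, for each regulator, its direct targets and everything reachable downstream of it. The function and the toy network (`M` as a master regulator, `B` as a bottleneck, `R` as a bottom-level regulator) are hypothetical.

```python
def direct_and_indirect_targets(edges, regulator):
    """Return (number of direct targets, number of nodes reachable downstream)."""
    children = {}
    for source, target in edges:
        children.setdefault(source, set()).add(target)
    direct = children.get(regulator, set())
    reachable, stack = set(), list(direct)
    while stack:
        node = stack.pop()
        if node in reachable or node == regulator:
            continue  # already counted, or a loop back to the regulator itself
        reachable.add(node)
        stack.extend(children.get(node, ()))
    return len(direct), len(reachable)

# Hypothetical regulatory edges, written as (regulator, target).
toy_regulation = [
    ("M", "B"),
    ("B", "t1"), ("B", "t2"), ("B", "t3"), ("B", "R"),
    ("R", "t4"),
]
master = direct_and_indirect_targets(toy_regulation, "M")
bottleneck = direct_and_indirect_targets(toy_regulation, "B")
bottom = direct_and_indirect_targets(toy_regulation, "R")
```

In this toy network `M` has a single direct target but reaches six nodes downstream, while `B` has the largest number of direct targets, mirroring the first two levels of the list.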

## Network motifs

Network motifs are subgraphs that occur in the network significantly more often than would be expected in a random network. Some have interesting functional properties and are presumably of biological interest.

Figure 21.5 shows regulatory motifs from the yeast regulatory network. Feedback loops allow control of regulator levels, and feed-forward loops allow acceleration of response times, among other things.

Figure 21.5: Network motifs in regulatory networks: feed-forward loops involved in speeding up the response of a target gene. Regulators are represented by blue circles and gene promoters are represented by red rectangles (figure taken from [4]).
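As an illustration of what searching for a motif involves, the sketch below enumerates feed-forward loops (X regulates Y, Y regulates Z, and X also regulates Z directly) in a directed edge list. It only finds occurrences; deciding whether a motif is over-represented would additionally require comparing the count against randomized networks, which is not shown. The function name and toy edges are hypothetical.

```python
from itertools import permutations

def feed_forward_loops(edges):
    """List all (x, y, z) with edges x->y, y->z and x->z in a directed graph."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    # Brute force over ordered triples; adequate for small illustrative graphs.
    return sorted(
        (x, y, z)
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set
    )

# Hypothetical directed edges, written as (regulator, target).
toy_directed = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
toy_loops = feed_forward_loops(toy_directed)
```

The brute-force enumeration is cubic in the number of nodes, so it is only meant to make the definition concrete.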

There are three major classes of hydrides (covalent, ionic, and metallic), but only covalent hydrides occur in living cells and have any biochemical significance. As you learned in Chapter 7 "The Periodic Table and Periodic Trends", carbon and hydrogen have similar electronegativities, and the C–H bonds in organic molecules are strong and essentially nonpolar. Little acid–base chemistry is involved in the cleavage or formation of these bonds. In contrast, because hydrogen is less electronegative than oxygen and nitrogen (symbolized by Z), the H–Z bond in the hydrides of these elements is polarized ($\mathrm{H}^{\delta+}$–$\mathrm{Z}^{\delta-}$). Consequently, the hydrogen atoms in these H–Z bonds are relatively acidic. Moreover, S–H bonds are relatively weak due to poor s orbital overlap, so they are readily cleaved to give a proton. Hydrides in which H is bonded to O, N, or S atoms are therefore polar, hydrophilic molecules that form hydrogen bonds. They also undergo acid–base reactions by transferring a proton.

### Note the Pattern

Covalent hydrides in which H is bonded to O, N, or S atoms are polar and hydrophilic, form hydrogen bonds, and transfer a proton in their acid-base reactions.

Hydrogen bonds are crucial in biochemistry, in part because they help hold proteins in their biologically active folded structures. Hydrogen bonds also connect the two intertwining strands of DNA (deoxyribonucleic acid), the substance that contains the genetic code for all organisms. (For more information on DNA, see Chapter 24 "Organic Compounds", Section 24.6 "The Molecules of Life".) Because hydrogen bonds are easier to break than the covalent bonds that form the individual DNA strands, the two intertwined strands can be separated to give intact single strands, which is essential for the duplication of genetic information.

In addition to the importance of hydrogen bonds in biochemical molecules, the extensive hydrogen-bonding network in water is one of the keys to the existence of life on our planet. Based on its molecular mass, water should be a gas at room temperature (20°C), but the strong intermolecular interactions in liquid water greatly increase its boiling point. Hydrogen bonding also produces the relatively open molecular arrangement found in ice, which causes ice to be less dense than water. Because ice floats on the surface of water, it creates an insulating layer that allows aquatic organisms to survive during cold winter months.

These same strong intermolecular hydrogen-bonding interactions are also responsible for the high heat capacity of water and its high heat of fusion. A great deal of energy must be removed from water for it to freeze. Consequently, as noted in Chapter 5 "Energy Changes in Chemical Reactions", large bodies of water act as “thermal buffers” that have a stabilizing effect on the climate of adjacent land areas. Perhaps the most striking example of this effect is the fact that humans can live comfortably at very high latitudes. For example, palm trees grow in southern England at the same latitude (51°N) as the southern end of frigid Hudson Bay and northern Newfoundland in North America, areas known more for their moose populations than for their tropical vegetation. Warm water from the Gulf Stream current in the Atlantic Ocean flows clockwise from the tropical climate at the equator past the eastern coast of the United States and then turns toward England, where heat stored in the water is released. The temperate climate of Europe is largely attributable to the thermal properties of water.

### Note the Pattern

Strong intermolecular hydrogen-bonding interactions are responsible for the high heat capacity of water and its high heat of fusion.

## Historical perspective

For many years, the study of networks focused on the properties of random graphs with normal or Poisson distributions of connections between nodes [16]. The focus changed in the late 1990s, in parallel with the rapid growth and development of the internet and the World Wide Web. The World Wide Web was subsequently observed to have (1) an unexpectedly low average shortest path length between any pair of nodes, and (2) a fat-tailed degree distribution, that is, the number of connections for some nodes is many orders of magnitude higher than the average number [7]. Networks with these properties were termed small world networks, and numerous studies have found them throughout nature [7, 18]. They have been identified not only in many different realms of the natural world, but also in man-made systems including communications systems, financial systems, scientific citations, and numerous human social organisations [8]. Small world networks are believed to be so frequently observed because of their stability, where stability means maintaining integrity and minimising the possibility of failure. This stability is believed to arise from optimised communication pathways within these networks, or in other words, from short path lengths between all parts of the network.

Since 2002, small world network concepts have been increasingly incorporated into the fields of chemistry and structural biology [13]. The rationale for using the network approach in protein structure analysis is that it allows contributions from non-local effects to be included in a model.
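The average shortest path length mentioned above can be computed with a breadth-first search from every node. The sketch below is illustrative only: it assumes an undirected, connected graph stored as an adjacency dictionary, and the ring-with-shortcut example is a hypothetical toy case showing how a single long-range link shortens paths.

```python
from collections import deque

def average_shortest_path_length(adjacency):
    """Mean shortest-path distance over all ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives shortest paths in unweighted graphs
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# Hypothetical example: a ring of 8 nodes, then the same ring with one shortcut.
ring = {i: {(i - 1) % 8, (i + 1) % 8} for i in range(8)}
ring_with_shortcut = {node: set(nbrs) for node, nbrs in ring.items()}
ring_with_shortcut[0].add(4)
ring_with_shortcut[4].add(0)

ring_length = average_shortest_path_length(ring)
shortcut_length = average_shortest_path_length(ring_with_shortcut)
```

Adding the single edge between opposite sides of the ring lowers the average path length, which is the basic mechanism behind short paths in small world networks.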

To clarify, there are two completely different fields of research involving proteins and networks. The subject of this paper is the field of protein conformation. The other field, which has been more widely investigated and reported, involves communication pathways between whole proteins and is commonly referred to as protein networks, protein contact networks, or protein-protein interaction networks (see Csermely et al. [13] for an overview). That field lies more within biology, specifically systems biology, than within chemistry.

## Properties of metabolic networks: structure versus function

Biological data from high-throughput technologies describing the network components (genes, proteins, metabolites) and their associated interactions have driven the reconstruction and study of structural (topological) properties of large-scale biological networks. In this article, we address the relation between functional and structural properties by using extensively experimentally validated genome-scale metabolic network models to compute observable functional states of a microorganism and compare the "structure versus function" attributes of metabolic networks. It is observed that, functionally speaking, the essentiality of reactions in a node is not correlated with node connectivity, contrary to what structural analyses of other biological networks have suggested. These findings are illustrated with the analysis of the genome-scale biochemical networks of three species with distinct modes of metabolism. These results also suggest fundamental differences among different biological networks arising out of their representation and functional constraints.

### Figures

(Left panel) Plot of the average lethality fraction (〈fL,i〉) as a function of…

Plot of the average lethality fraction when all reactions corresponding to a metabolite…

## A 'hot-spot' mutation alters the mechanical properties of keratin filament networks

Keratins 5 and 14 polymerize to form the intermediate filament network in the progenitor basal cells of many stratified epithelia including epidermis, where it provides crucial mechanical support. Inherited mutations in K5 or K14 result in epidermolysis bullosa simplex (EBS), a skin-fragility disorder [1]. The impact that such mutations exert on the intrinsic mechanical properties of K5/K14 filaments is unknown. Here we show, by using differential interference contrast microscopy, that a 'hot-spot' mutation in K14 greatly reduces the ability of reconstituted mutant filaments to bundle under crosslinking conditions. Rheological assays measure similar small-deformation mechanical responses for crosslinked solutions of wild-type and mutant keratins. The mutation, however, markedly reduces the resilience of crosslinked networks against large deformations. Single-particle tracking, which probes the local organization of filament networks, shows that the mutant polymer exhibits highly heterogeneous structures compared to those of wild-type filaments. Our results indicate that the fragility of epithelial cells expressing mutant keratin may result from an impaired ability of keratin polymers to be crosslinked into a functional network.

## Discussion

We have presented a corrected and more comprehensive version of the neuronal wiring diagram of hermaphrodite C. elegans using materials from White et al. [7] and new electron micrographs. Despite the significant additions, this wiring diagram is still incomplete due to methodological limitations discussed in the An Updated Wiring Diagram section. Yet, our work represents the most comprehensive mapping of the neuronal wiring diagram to date. The sensitivity of our analysis to methodological limitations (and to network structure variation among individual organisms) is discussed in the Robustness Analysis section.

We proposed a convenient way to visualize the neuronal wiring diagram. The corrected wiring diagram and its visualization should help in planning experiments, such as neuron ablation.

Next, we performed several statistical analyses of the corrected wiring, which should help with inferring function from structure.

By using several different centrality indices, we found central neurons, which may play a special role in information processing. In particular, command interneurons responsible for worm locomotion have high degree centrality in both chemical and gap junction networks. Interestingly, command interneurons are also central according to in-closeness, implying that they are in a good position to integrate signals. However, most command interneurons do not have highest out-closeness, meaning that other out-closeness central neurons, such as DVA, ADEL/R, PVPR, etc., are in a good position to deliver signals to the rest of the network.
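The in-closeness and out-closeness indices mentioned above can be sketched as follows. Several normalization conventions exist; this sketch assumes one simple choice (number of reachable nodes divided by the sum of their distances), which is not necessarily the one used in the study. The function names and toy edges are hypothetical.

```python
from collections import deque

def _bfs_distances(adjacency, source):
    """Shortest directed distances from `source` to every other reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    del dist[source]
    return dist

def closeness(edges, node, direction="out"):
    """Closeness of `node`: reachable count divided by total distance (assumed convention).

    direction="out" follows edges forward (ability to deliver signals);
    direction="in" follows edges backward (ability to integrate signals).
    """
    adjacency = {}
    for u, v in edges:
        if direction == "in":
            u, v = v, u  # reverse every edge to measure incoming paths
        adjacency.setdefault(u, set()).add(v)
    dist = _bfs_distances(adjacency, node)
    return len(dist) / sum(dist.values()) if dist else 0.0

# Hypothetical directed wiring: A -> B -> C and D -> B.
toy_wiring = [("A", "B"), ("B", "C"), ("D", "B")]
out_closeness_a = closeness(toy_wiring, "A", direction="out")
in_closeness_c = closeness(toy_wiring, "C", direction="in")
in_closeness_a = closeness(toy_wiring, "A", direction="in")
```

In the toy wiring, `C` receives input from every other node and so has a high in-closeness, while `A` receives none and scores zero, illustrating why the two indices can single out different neurons.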

Linear systems analysis yielded a principled methodology to hypothesize functional circuits and to predict the outcome of both sensory and artificial stimulation experiments. We have identified several modes that map onto previously identified behaviors.

Networks with similar statistical structural properties may share functional properties thus providing insight into the function of the C. elegans nervous system. To enable comparison of the C. elegans network with other natural and technological networks [76], we computed several structural properties of the neuronal network. In particular, the gap junction network, the chemical synapse network, and the combined neuronal network may all be classified as small world networks because they simultaneously have small average path lengths and large clustering coefficients [14].
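The clustering coefficient used in the small world classification can be sketched as the average, over nodes, of the fraction of neighbor pairs that are themselves connected. This is a minimal illustration for an undirected graph; the convention of assigning zero to nodes with fewer than two neighbors is an assumption, and the toy graph is hypothetical.

```python
from itertools import combinations

def average_clustering(adjacency):
    """Average local clustering coefficient of an undirected graph."""
    total = 0.0
    for node, nbrs in adjacency.items():
        k = len(nbrs)
        if k < 2:
            continue  # assumed convention: such nodes contribute 0
        # Count the edges that exist among the neighbors of `node`.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adjacency[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adjacency)

# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy_cluster_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
toy_clustering = average_clustering(toy_cluster_graph)
```

A network is typically called small world when this value is large relative to a comparable random graph while the average path length stays small.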

The tails of the degree and terminal number distributions for the gap, chemical and combined networks (with the exception of the in-numbers) follow a power law consistent with the network being scale-free in the sense of Barabási and Albert [40]. The tails of some distributions can also be fit by an exponential decay, consistent with a previous report [15]. However, we found that exponential fits for the tails have (sometimes insignificantly) lower log-likelihoods than power laws, making the exponential decay a less likely alternative. For whole distributions, neither distribution passes the p-value test; if one is forced to choose, the exponential decay may be a less poor alternative.
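The log-likelihood comparison described above can be sketched with maximum-likelihood fits of a power law and a shifted exponential to the tail of a sample. This sketch uses the continuous approximation for both models and a synthetic, deterministic sample; it is an assumption-laden illustration and not the procedure or data of the study, and it omits the goodness-of-fit (p-value) test.

```python
import math

def tail_log_likelihoods(values, x_min):
    """Fit a power law and an exponential to values >= x_min by maximum likelihood.

    Returns (alpha, power-law log-likelihood, exponential log-likelihood).
    Continuous densities are assumed:
      power law:   p(x) = ((alpha - 1) / x_min) * (x / x_min) ** (-alpha)
      exponential: p(x) = rate * exp(-rate * (x - x_min))
    """
    tail = [x for x in values if x >= x_min]
    n = len(tail)
    log_ratio_sum = sum(math.log(x / x_min) for x in tail)
    excess_sum = sum(x - x_min for x in tail)
    alpha = 1.0 + n / log_ratio_sum   # maximum-likelihood exponent
    rate = n / excess_sum             # maximum-likelihood decay rate
    ll_power = n * math.log((alpha - 1.0) / x_min) - alpha * log_ratio_sum
    ll_exp = n * math.log(rate) - rate * excess_sum
    return alpha, ll_power, ll_exp

# Synthetic sample: evenly spaced quantiles of a power law with exponent 2.5, x_min = 1.
n_points = 200
synthetic = [(1.0 - (i + 0.5) / n_points) ** (-1.0 / 1.5) for i in range(n_points)]
alpha_hat, ll_power, ll_exp = tail_log_likelihoods(synthetic, x_min=1.0)
```

For this synthetic power-law sample the power-law fit attains the higher log-likelihood, which is the kind of comparison the paragraph refers to.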

Several statistical properties of the C. elegans network are similar to those of the mammalian cortex. In particular, the whole distribution of C. elegans chemical synapse multiplicity is well-fit by a stretched exponential (or Weibull) distribution (Figure 6(d)). Taking multiplicity as a proxy of synaptic connection strength, this is reminiscent of the synaptic strength distribution in mammalian cortex, which was measured electrophysiologically [30], [77]. The definition of stretched exponential distribution is slightly different [30], but has the same tail behavior. The stretch factor is close to that in the cortical network.
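For orientation, one common parameterization of the stretched exponential (Weibull) tail is given below. The symbols λ (scale) and β (stretch factor) are assumptions introduced here; the source does not state its exact form and notes that its definition differs slightly from that of [30].

```latex
P(X > x) = \exp\!\left[-\left(\frac{x}{\lambda}\right)^{\beta}\right],
\qquad 0 < \beta < 1
```

With β = 1 this reduces to an ordinary exponential decay; values of β below 1 give a tail that decays more slowly than an exponential but faster than a power law.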

In addition, we found that motif frequencies in the chemical synapse network are similar to those in the mammalian cortex [77]. Both reciprocally connected neuron pairs and triplets with a connection between every pair of neurons (regardless of direction) are over-represented. The similarity of the connection strength and the motif distributions may reflect similar constraints in the two networks. Since proximity is unlikely to be the limiting factor, we suggest that these constraints may reflect functionality. We found that the chemical synapse and the gap junction networks are correlated, which may provide insight into their relative roles.
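Over-representation of reciprocally connected pairs can be illustrated by comparing the observed count with the number expected if each directed connection were placed independently with the same overall probability. This null model is an assumption made for illustration and may differ from the one used in the study; the function name and toy edges are hypothetical.

```python
def reciprocal_pairs_vs_expected(edges, n_nodes):
    """Return (observed reciprocal pairs, expected count under independent edges)."""
    edge_set = {(u, v) for u, v in edges if u != v}
    # Each reciprocal pair is seen twice, once from each direction.
    observed = sum(1 for u, v in edge_set if (v, u) in edge_set) // 2
    # Probability that any given ordered pair is connected.
    p = len(edge_set) / (n_nodes * (n_nodes - 1))
    # Each of the n(n-1)/2 unordered pairs is reciprocal with probability p^2.
    expected = p * p * n_nodes * (n_nodes - 1) / 2
    return observed, expected

# Hypothetical directed connections among four neurons.
toy_synapses = [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c"), ("a", "c")]
observed_pairs, expected_pairs = reciprocal_pairs_vs_expected(toy_synapses, n_nodes=4)
```

Here two reciprocal pairs are observed against roughly one expected, so reciprocity is over-represented in the toy data; a real analysis would also need a significance estimate.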

To conclude the paper, let us note that our scientific development was not hypothesis-driven, but rather exploratory. Yet we hope that the reported statistics will help in formulating a theory that explains how function arises from structure.

## Complexity and emergent properties

Many of the most critical aspects of how a cell works result from the collective behaviour of many molecular parts, all acting together. Those collective properties, often called “emergent properties,” are critical attributes of biological systems, as understanding the individual parts alone is insufficient to understand or predict system behaviour. Thus, emergent properties necessarily come from the interactions of the parts of the larger system. As an example, a memory that is stored in the human brain is an emergent property because it cannot be understood as a property of a single neuron or even many neurons considered one at a time. Rather, it is a collective property of a large number of neurons acting together.

One of the most important aspects of the individual molecular parts and the complex things they constitute is the information that the parts contain and transmit. In biology, information in molecular structures (the chemical properties of molecules that enable them to recognize and bind to one another) is central to the function of all processes. Such information provides a framework for understanding biological systems, the significance of which was captured insightfully by American theoretical physical chemist Linus Pauling and French biologist Emil Zuckerkandl, who stated in a joint paper, “Life is a relationship among molecules and not a property of any one molecule.” In other words, life is defined in terms of interactions, relationships, and collective properties of many molecular systems and their parts.

The central argument concerning information in biology can be seen by considering the heredity of information, or the passing on of information from one generation to the next. For a given species, the information in its genome must persist through reproduction in order to guarantee the species’ survival. DNA is passed on faithfully, enabling a species’ genetic information to endure and, over time, to be acted on by evolutionary forces. The information that exists in living things today has accumulated and has been shaped over the course of more than 3.4 billion years. As a result, focusing on the molecular information in biological systems provides a useful vantage point for understanding how living systems work.

That the emergent properties derived from the collective function of many parts are the key properties of biological systems has been known since at least the first half of the 20th century. They have been considered extensively in cell biology, physiology, developmental biology, and ecology. In ecology, for example, debate regarding the importance of complexity in ecological systems and the relationship between complexity and ecological stability began in the 1950s. Since then, scientists have realized that complexity is a general property of biology, and technologies and methods to understand parts and their interactive behaviours at the molecular level have been developed. Quantitative change in biology, based on biological data and experimental methods, has precipitated profound qualitative change in how biological systems are viewed, analyzed, and understood. The repercussions of that change have been immense, resulting in shifts in how research is carried out and in how biology is understood.

A comparison with systems engineering can provide useful insight into the nature of systems biology. When engineers design systems, they explore known components that can be put together in such a way as to create a system that behaves in a prescribed fashion, according to the design specifications. When biologists look at a system, on the other hand, their initial tasks are to identify the components and to understand the properties of individual components. They then attempt to identify how interactions between the components ultimately create the system’s observable biological behaviours. The process is more closely aligned with the notion of “systems reverse engineering” than it is with systems design engineering.

The Human Genome Project contributed broadly to that revolution in biology in at least three different ways: (1) by acquiring the genetics “parts list” of all genes in the human genome; (2) by catalyzing the development of high-throughput technology platforms for generating large data sets for DNA, RNA, and proteins; and (3) by inspiring and contributing to the development of the computational and mathematical tools needed for analyzing and understanding large data sets. The project, it could be argued, was the final catalyst that brought about the shift to the systems point of view in biology.

## Author summary

Gene regulatory networks are essential for cell fate specification and function. But the recursive links that comprise these networks often make determining their properties and behaviour complicated. Computational models of these networks can also be difficult to decipher. To reduce the complexity of such models we employ a Zwanzig-Mori projection approach. This allows a system of ordinary differential equations, representing a network, to be reduced to an arbitrary subnetwork consisting of part of the initial network, with the rest of the network (bulk) captured by memory functions. These memory functions account for the bulk by describing signals that return to the subnetwork after some time, having passed through the bulk. We show how this approach can be used to simplify analysis and to probe the behaviour of a gene regulatory network. Applying the method to a transcriptional network in the vertebrate neural tube reveals previously unappreciated properties of the network. By taking advantage of the structure of the memory functions we identify interactions within the network that are unnecessary for sustaining correct patterning. Upon further investigation we find that these interactions are important for conferring robustness to variation in initial conditions. Taken together we demonstrate the validity and applicability of the Zwanzig-Mori projection approach to gene regulatory networks.
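The role of the memory functions can be seen in the simplest setting of a linear system, sketched below. This is an illustrative special case under stated assumptions, not the authors' general (nonlinear) formulation: the state is split into subnetwork variables $x_s$ and bulk variables $x_b$, and the block matrices $A_{ss}$, $A_{sb}$, $A_{bs}$, $A_{bb}$ are notation introduced here.

```latex
\begin{aligned}
\dot{x}_s &= A_{ss}\,x_s + A_{sb}\,x_b, \qquad
\dot{x}_b = A_{bs}\,x_s + A_{bb}\,x_b, \\
x_b(t) &= e^{A_{bb} t}\,x_b(0) + \int_0^t e^{A_{bb}(t-t')}\,A_{bs}\,x_s(t')\,dt', \\
\dot{x}_s(t) &= A_{ss}\,x_s(t) + \int_0^t M(t-t')\,x_s(t')\,dt' + r(t), \\
M(\tau) &= A_{sb}\,e^{A_{bb}\tau}\,A_{bs}, \qquad
r(t) = A_{sb}\,e^{A_{bb} t}\,x_b(0).
\end{aligned}
```

Solving the bulk equation and substituting it into the subnetwork equation leaves a closed equation for $x_s$ alone: the memory function $M$ describes signals that leave the subnetwork, pass through the bulk, and return after a delay, while $r(t)$ carries the effect of the bulk's initial condition.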

Citation: Herrera-Delgado E, Perez-Carrasco R, Briscoe J, Sollich P (2018) Memory functions reveal structural properties of gene regulatory networks. PLoS Comput Biol 14(2): e1006003. https://doi.org/10.1371/journal.pcbi.1006003

Received: August 30, 2017 Accepted: January 24, 2018 Published: February 22, 2018

Copyright: © 2018 Herrera-Delgado et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: EHD and PS acknowledge the stimulating research environment provided by the EPSRC Centre for Doctoral Training in Cross-Disciplinary Approaches to Non-Equilibrium Systems (CANES, EP/L015854/1) (https://www.epsrc.ac.uk/skills/students/centres/profiles/crossdisciplinaryapproachestononequilibriumsystems/). RPC was supported by the Wellcome Trust (WT098325MA). EHD and JB are supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001051) (https://www.cancerresearchuk.org/), UK Medical Research Council (FC001051) (https://www.mrc.ac.uk/), and Wellcome Trust (FC001051 and WT098326MA) (https://wellcome.ac.uk/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

DR designed and developed the PathwayOracle application, participated in evaluating features for inclusion, and drafted the manuscript. LN participated in application design and feature selection. PTR contributed biological case studies and data for PathwayOracle feature design. All authors read and approved the final manuscript.

DR and LN are supported in part by a Seed Grant awarded to LN from the Gulf Coast Center for Computational Cancer Research, funded by John and Ann Doerr Fund for Computational Biomedicine. PTR is supported in part by a Department of Defense grant BC044268.