
Gene networks for different tissues?


I currently work on gene networks; specifically, I analyze the brain gene regulatory network (GRN) of an insect to understand its sexual behaviour. During a recent presentation, I was asked a question that I couldn't answer well: why do I study the brain gene network to understand sexual activity when I should be studying the gene network of a sexual organ, e.g. the ovary? I answered that since the brain decides behaviour, it makes more sense to study the brain GRN. However, I got confused and would like to know whether different tissues or organs in the same organism have different gene networks. Any ideas regarding the theory are welcome.


Same Signal, Different Tissues: Morphogen Interpretation

Proper tissue function relies on the collaboration and interplay between distinct cell types, each with their specific functions, organized in specific spatial arrangements. How developmental cues specify these distinct cell types in embryos and coordinate their patterning into functional tissues is one of the fundamental questions of developmental biology. In most instances, the process is controlled by secreted molecules, often termed morphogens, that spread through the developing tissue to form gradients and induce target gene expression at characteristic positions along the gradient. Surprisingly, morphogens are not specific for particular tissues. Instead a relatively small set of signals are repeatedly used in multiple different developmental contexts. This raises the question of how a graded signal controls differential target gene responses within a tissue and how the same signal can be interpreted by different tissues to produce different cell types.

The tissue specificity of morphogen signalling was first recognized in classical embryological experiments in which the response to transplants of morphogen producing tissue was shown to depend on the receiving cells. This property has been dubbed competence, and depends on the epigenetic state of the receiving cells, which determines the accessibility of regulatory elements of target genes, and the transcription factors present in the receiving cells that act as cofactors or contributory transcriptional effectors. Since both transcription factor expression and the regulatory element accessibility are controlled by prior developmental events, this mechanism links the morphogen response to the history of a cell.

Tissue patterning during embryonic development relies on the differential induction of target genes by morphogen gradients. Induction of target genes depends not only on the level of the morphogen, but also the specific competence of receiving cells, the ability of cells to decode dynamics of morphogen signaling, and the regulatory logic of downstream transcriptional networks.

Within a tissue, morphogen concentration has been considered the main determinant of differential target gene responses. In many cases, however, this is an oversimplification and the dynamics of signalling – the duration and temporal behaviour – are also crucial. Both positive and negative feedback within the signalling pathway, produced by the signal-induced expression of activators or inhibitors of the pathway, have been observed for several morphogens. Such feedback can allow cells to measure dynamic properties of the morphogen signal, such as duration or rate of change. Furthermore, the target genes controlled by many morphogens include transcription factors. These can form transcriptional networks comprising positive and negative regulatory interactions between the factors. The logic of these networks provides a means for cells to integrate the level and dynamics of signalling for differential target gene induction. These networks can convert a continuous gradient into discrete switches in gene expression, increase the precision with which target genes are induced by a noisy signal, and relay the information provided by a morphogen gradient at early stages of development, when the tissue is small, to later stages, when the tissue is too large to be reliably patterned by a gradient.

In an advanced review article entitled “Morphogen interpretation: concentration, time, competence, and signaling dynamics” recently published in WIREs Developmental Biology, James Briscoe discusses the molecular mechanisms that underlie the ability of cells to diversify their response to morphogens by interpreting them in a context-dependent and dynamical manner.


Introduction

The emerging paradigm of "network medicine" has been proposed to utilize different network-based approaches to predict essential proteins [1–4], identify protein complexes [5–8] and detect candidate genes related to different diseases [9]. As methodologies progress, network medicine has the potential to capture the molecular complexity of human disease while offering computational methods to discern how such complexity controls disease manifestations, prognosis, and therapy. Up to now, different types of biological data have been used to study disease-related genes and complexes [10–12]. For example, Goh K., et al. [13] constructed a network that consisted of genes associated with the same disease, while Tian W., et al. [14] combined protein and genetic interactions with gene expression correlation. Ulitsky I and Shamir R [15] also combined interactions from published networks and yeast two-hybrid experiments to identify the associations. Recent methods, including CIPHER [16], GeneWalker [17], PRINCE [18] and RWRH [19], extended the associations considered from those derived directly from protein interactions to more distant connections in various ways. Even though genes causing similar diseases lie close to one another in the network, these algorithms did not take into account the fact that the majority of genetic disorders tend to manifest only in a single or a few tissues [13, 20]. Tissue specificity is an important aspect of many genetic diseases, reflecting the potentially different roles of proteins and pathways in diverse cell lineages. In the context of genetic disorders, even though the underlying harmful mutation can exist in all the cells in the human body, it most often wreaks havoc only in a few tissues. This tissue selectivity arises from differences in the functionality of the mutated protein within these tissues, its tissue-specific interacting proteins, its abundance and the abundance of its interactors. Hence, the purpose of this study is to investigate whether a tissue-specific network is a better representation of the actual disease-related tissue, yielding more accurate prioritization of disease-gene associations.

Some research has been carried out by constructing tissue-specific networks to detect diseases through Bayesian structure learning algorithms [21]. However, Bayesian structure learning algorithms have three major shortcomings: high computational cost, inefficiency in exploiting qualitative knowledge, and the inability to reconstruct phenotype-specific gene networks. Others [22] analyzed human PPIs in a tissue-specific context, showing that many housekeeping proteins interact with highly tissue-specific proteins, which in turn implies that housekeeping proteins may have tissue-specific roles. This analysis was taken a step further by Emig and Albrecht [23], who identified the functional differences between tissues, showing that tissue-specific protein interactions are often involved in transmembrane transport and receptor activation.

This study therefore seeks to construct tissue-specific gene-gene networks for a particular query disease and to match these networks with similar phenotype details to predict new disease-gene associations. The novel tissue-specific gene-gene network construction method, called the tissue-specified genes (TSG) method, is used first to identify the tissues mainly affected by the query disease; second, the gene expression details of those tissues are used to construct tissue-specific gene-gene networks. The resulting tissue-specific networks are then combined with the nearest phenotype details of the query disease to predict gene-disease associations. The original Katz method has been modified and used as the primary method of prioritizing disease genes with the tissue-specific gene-gene networks. The novel tissue-specific gene-gene network construction method is described in detail in the methodology section.
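For orientation, the textbook (unmodified) Katz measure scores a pair of nodes by summing the walks of every length between them, damping each extra step by a factor beta. The sketch below is only an illustration of that general idea on a toy graph: the gene labels, the value of beta, and the truncation at max_len are assumptions, and it does not reproduce the modified method used in the study.

```python
# Truncated Katz score: sum over l = 1..max_len of beta^l * A^l.
# Toy example only; gene labels, beta, and max_len are illustrative assumptions.

def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def katz_scores(adj, beta=0.1, max_len=4):
    """Return the matrix of truncated Katz scores for adjacency matrix adj."""
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]          # A^1
    for length in range(1, max_len + 1):
        for i in range(n):
            for j in range(n):
                scores[i][j] += (beta ** length) * power[i][j]
        power = mat_mul(power, adj)          # advance to A^(length + 1)
    return scores

# Hypothetical path graph g0 - g1 - g2 - g3.
genes = ["g0", "g1", "g2", "g3"]
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = katz_scores(adj)

# Rank the other genes by their Katz proximity to g0.
ranking = sorted(range(1, len(genes)), key=lambda j: scores[0][j], reverse=True)
print([genes[j] for j in ranking])
```

Genes that are fewer steps from the seed gene receive higher scores, which is the property that makes this kind of measure usable for prioritization.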


Introduction

Tissue-specificity, in which cells perform different functions despite possessing identical DNA, is achieved partially through tissue-dependent mechanisms of gene regulation, including epigenetic modification and transcriptional and post-transcriptional regulation [1–3]. These complex programs of control produce different gene expression programs across tissues, with most genes showing statistically significant differential expression [4, 5]. These differences can have significant consequences: tissue-specific genes are especially likely to be drug targets [6] and tissue-specific transcription factors are especially likely to be implicated in complex diseases [2, 7, 8]. Understanding these differences is also essential for understanding pleiotropic genes, and for interpreting studies in which genomics data can only be collected for an accessible or a proxy tissue (such as use of blood in studying psychiatric disorders [9–11]).

Tissue-specific mechanisms of control may be captured by co-expression networks, in which two genes are connected if their expression levels are correlated across a set of individuals. In such a setting, genetic or environmental differences across individuals serve as small perturbations to the underlying regulatory network, resulting in correlation between genes’ expression levels that are consistent with regulatory relationships. Co-expression networks provide insight into cellular activity as genes that are co-expressed often share common functions [12], and such networks have been widely used to study disease [13–15].
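As a minimal sketch of the co-expression idea described above, the following connects two genes when the Pearson correlation of their expression across individuals exceeds a cutoff. The gene names, the expression values, and the 0.8 threshold are invented for illustration; real analyses use many more samples and more careful statistics.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def coexpression_edges(expr, threshold=0.8):
    """Return {(gene1, gene2): r} for pairs with |r| >= threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges

# Hypothetical expression levels of three genes in five individuals.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = coexpression_edges(expr)
print(edges)
```

Running the same procedure separately on samples from each tissue would give one network per tissue, which is the naive baseline that the hierarchical approach aims to improve on.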

The Genotype-Tissue Expression (GTEx) consortium dataset [16] provides an opportunity to study such co-expression networks for an unprecedented number of human tissues simultaneously. However, many of the profiled tissues have fewer than a dozen samples, too few to accurately infer the tens of millions of parameters that would define a co-expression or regulatory network. One solution would be to combine all available samples and learn a single consensus network for all tissues, but this would offer no insight into tissue-specificity. On the other hand, inferring each network independently ignores tissue commonalities: tissue networks share far more links than would be expected by chance, and learning links across multiple tissues is less noisy than learning links using a single tissue, even when using the same number of total samples [12].

Here, we use a novel algorithm, GNAT (Gene Network Analysis Tool), to simultaneously construct co-expression networks for 35 distinct human tissues. Using a hierarchy which encodes tissue similarity, our approach learns a network for each tissue, encouraging tissues that are nearby in the hierarchy to have similar networks. Hierarchical transfer learning has been shown to improve power and accuracy in previous work [5, 6, 17, 18]. We propose a novel hierarchical model along with a parameter optimization method designed for large-scale data, and apply it to the GTEx data. We show that our method infers networks with higher cross-validated likelihood than networks learned on each tissue independently or a single network learned on all tissues. Our method is applicable to any dataset in which sample relationships can be described by a hierarchy—for example, multiple cancer cell lines or species in a phylogenetic tree. The complete code for our method is available as S1 Data.

We analyze the resulting networks to make several novel observations regarding principles of tissue-specificity. We propose multiple metrics for identifying genes that are important in defining tissue identity, and demonstrate that such genes are disproportionately essential genes. We show that tissue-specific transcription factors, which are central hubs in our networks, link to genes with tissue-specific functions, which in turn display higher expression levels. We identify 1,789 gene modules that are enriched for Gene Ontology functions, and show that enriched modules that are upregulated within a tissue are often instrumental to tissue function. We also show that modules which occur across tissues are especially likely to be enriched for Gene Ontology functions, and that these functions tend to be those which are essential to all tissues. The results presented here, including all the networks and gene modules, can be interactively queried through our web tool [19]; the genes and modules identified provide a basis for future investigation.


Results

Identification of a Common Set of Circadian Genes in Mouse

We searched for circadian oscillating genes in 21 circadian time series microarray data covering 14 tissues in mouse (Table S1) by fitting them to cosine functions with different phases, and extracted circadian phase information for circadian oscillating genes. We identified 9,995 known genes showing circadian oscillations in at least one tissue (Table S2). The number of genes showing circadian oscillation in multiple tissues decreases rapidly as the number of tissues increases, whereas the consistency of their circadian phases across tissues as measured in p-values of circular range tests improves rapidly (Figure 1). We identified 41 common circadian genes, defined as the genes showing circadian oscillation in at least 8 out of 14 tissues in mouse (Table 1). 13 out of 19 previously known key circadian genes were among the common circadian genes that we identified in this study. Other known key circadian genes: Rorb, Cry2, Rora, Npas2, and Hlf were found to be circadian oscillating in one, three, three, four, and five tissues, respectively. Bhlhb3 was not found to be circadian oscillating in any tissue. 39 of these common circadian genes showed significant consistency (p<1/3 in circular range test) of their circadian phases across all tissues.
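The idea of fitting a time series to cosine functions with different phases can be sketched as follows: correlate the expression profile with a 24-hour cosine template at each candidate phase and keep the phase with the highest correlation. This is a simplified illustration, not the study's actual procedure; the sampling times, the synthetic expression values, and the 1-hour phase grid are assumptions.

```python
from math import cos, pi, sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def best_phase(times, values, period=24.0, step=1.0):
    """Return (phase, r) of the cosine template that best matches values."""
    best_phi, best_r = None, -2.0
    for k in range(int(period / step)):
        phi = k * step
        template = [cos(2 * pi * (t - phi) / period) for t in times]
        r = pearson(values, template)
        if r > best_r:
            best_phi, best_r = phi, r
    return best_phi, best_r

# Synthetic gene sampled every 4 hours for 48 hours, peaking at CT6.
times = list(range(0, 48, 4))
values = [10 + 3 * cos(2 * pi * (t - 6) / 24) for t in times]
phase, r = best_phase(times, values)
print(phase, round(r, 3))
```

A significance threshold on the fit (for example, a p-value for the correlation) would then decide whether the gene counts as circadian oscillating; that step is omitted here.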

Figure 1. (A) Distribution of the number of circadian oscillating genes identified in different numbers of mouse tissues. (B) Distribution of p-values in circular range tests for circadian phases of circadian oscillating genes identified in different numbers of mouse tissues.

Comparison between Tissues

We surveyed tissue-specific gene expression profiles in a mouse tissue gene expression atlas [7] for the circadian oscillating genes in different tissues. To cross-validate the circadian phase data with the tissue gene expression data, we created a binary matrix of 1 or 0 to denote the presence or absence of circadian oscillations in 14 tissues in circadian phase data and compared it to the gene expression matrix in 61 tissues from the tissue gene expression atlas. For each pair of tissues from the two matrices, we calculated a correlation coefficient. The circadian data in liver, kidney, skeletal muscle, adrenal gland, and white adipose tissue correctly correlated best with their corresponding tissues in the tissue gene expression atlas, whereas SCN correlated equally well with preoptic and hypothalamus, and brown adipose tissue correlated equally well with adipose tissue and brown fat. These results reflected the fact that sufficiently high gene expression levels are the prerequisite to be detected as circadian oscillating in our collection of microarray datasets.

To investigate if the differences in the circadian phases of circadian oscillating genes across tissues are caused by the differences in their gene expression levels, we calculated the variances of circadian phases and the variances of gene expression for circadian oscillating genes across the seven tissues common to our circadian datasets and the tissue gene expression atlas. There is no significant correlation (r = 0.01, p = 0.71) between these two variances. For example, the gene expression level of Per2 is 27 times higher in adrenal gland than in skeletal muscle, but this has no effect on the consistency of circadian phases of Per2 between the two tissues. In fact, the common circadian genes have significantly higher variances of gene expression across the 61 tissues than those from the same number of randomly selected genes. We observed that the correlation coefficients r_ij between the tissue gene expression data of the common circadian gene pairs (i, j) negatively correlated with their circadian phase differences (r = −0.22, p<10^−8). The gene pairs positively correlated in their tissue gene expression patterns had a significantly lower circadian phase difference than expected by random, whereas the gene pairs negatively correlated in their tissue gene expression patterns had a significantly larger circadian phase difference than expected by random (Figure S1). Therefore, the common circadian genes with similar gene expression patterns across tissues also tend to have similar circadian phases. The circadian gene regulation may share a similar mechanism that gives rise to tissue-specific gene expression.

We clustered the 21 circadian phase datasets using hierarchical clustering. The datasets from the same tissue or biologically closely related tissues were clustered together, suggesting that the differences in circadian phases between tissues resulted from their biological differences (Figure 2). To ensure that these differences between tissues were also reproducible between experiments, we used circular ANOVA to identify the circadian oscillating genes shared between two tissues but associated with significantly different circadian phases between these tissues. There were 12 circadian oscillating genes shared between two SCN datasets and at least two liver datasets. Among them, Per1, Per2, Nr1d2, and Avpr1a showed a significant (p<0.01) advance of about 6 hours in their circadian phases in SCN datasets compared to liver datasets, whereas Dnajb1, Hmgb3, Hsp110, and Pdcd4 showed no significant differences in their circadian phases between SCN and liver (Figure 3). To test if such differences also exist between SCN and whole brain tissues, we also compared SCN with 3 whole brain datasets. There were 12 circadian oscillating genes shared between two SCN datasets and at least two whole brain datasets. Per2, Nr1d2, and Tuba8 again showed a significant advance of about 6 hours in their circadian phases in SCN datasets compared to whole brain datasets, whereas Hmgb3, Hsp110, Sgk, and Fabp7 showed no significant differences in their circadian phases between SCN and whole brain. Further examination validated that the known key circadian genes including Per1, Per2, Cry1, Arntl, Nr1d1, and Nr1d2 all showed around 6 hour advances in circadian phases between SCN and non-SCN tissues in general, whereas heat shock proteins showed consistent circadian phases across all tissues. There were 15 circadian oscillating genes shared between 3 heart datasets including whole heart, atria, and ventricle and at least 3 liver datasets. 
Comparing the heart datasets with the liver datasets, Bhlhb2 (p<0.001) and Tspan4 (p = 0.006) had circadian phase 5–6 hours earlier in heart than liver whereas Dscr1 (p = 0.002) had circadian phase 8 hours later in heart than liver. Other known key circadian genes such as Per1/Per2, Arntl, and Nr1d1/Nr1d2 showed consistent circadian phases between heart and liver. Comparing the whole brain datasets with the liver datasets, Tfrc, St3gal5, and Tspan4 had circadian phases more than 4 hours earlier in whole brain than liver, whereas Hist1h1c, Tsc22d1, Myo1b, Litaf, and BC004004 had circadian phases more than 4 hours later in whole brain than liver.

Figure 2. Datasets are denoted by first author names and tissue types.

Figure 3. p-values from the circular ANOVA test are indicated in parentheses. The solid line represents y = x. The dashed lines represent y = x ± 6.

Comparison between Mammalian Species

Among the 1,269 rat genes identified as circadian oscillating genes in rat liver, 1,137 of them had homologues in mouse. 232 of them overlapped with 944 mouse liver circadian oscillating genes in at least 2 mouse liver datasets. We used the circular ANOVA test to identify the circadian oscillating genes shared in both mouse and rat livers but with significantly different circadian phases. 10 genes had significantly (p<0.01) different circadian phases between mouse and rat livers. The circadian phases of BC006779, Cdkn1a, Svil, Uox, Ak2, Nr1d1, Mtss1, Nudt16l1, and Gss were 4–6 hours later in rat liver than mouse liver, whereas Hsd17b2 was in anti-phase between mouse and rat livers (Figure S2).

Among 803 rat skeletal muscle (SKM) circadian oscillating genes, 703 had homologues in mouse and 64 overlapped with 440 mouse SKM circadian oscillating genes. Among the overlapping genes, 34 did not show circadian phase differences larger than 4 hours between mouse and rat SKM, whereas 22 had circadian phases more than 4 hours later in rat SKM than in mouse SKM. Cpt1a, Pdk4, and Ucp3, which are involved in lipid metabolism, showed a 5–8 hour delay in their circadian phases in rat SKM compared to mouse SKM. Eight genes had circadian phases more than 4 hours earlier in rat SKM than in mouse SKM. Among them, Fkbp5 and Sgk, which are controlled by the glucocorticoid response element (GRE), had an advance of about 6 hours in their circadian phases in rat SKM compared to mouse SKM. There were 11 circadian oscillating genes common to mouse liver and SKM, and rat liver and SKM. The 4–5 hour delay in circadian phases in rat compared to mouse was observed in both liver and SKM for all 11 circadian genes except Dynll1.

Among 603 rhesus macaque adrenal gland circadian oscillating genes, 560 had homologues in mouse and 170 overlapped with 4,162 mouse adrenal gland circadian oscillating genes. We found significant differences in circadian phases between these two species as well. Among the overlapping genes, 47 did not show circadian phase differences larger than 4 hours between mouse and macaque, whereas 66 had circadian phases more than 4 hours later in the macaque adrenal than in the mouse adrenal. The known key circadian genes Arntl, Dbp, Nr1d1, and Bhlhb2 showed a delay of about 8 hours in their circadian phases in the macaque adrenal compared to the mouse adrenal. Although Per2 did not satisfy our criteria (p<0.01) to be a circadian oscillating gene in macaque adrenal, this gene has a circadian phase at CT21 (p = 0.03), which is also about 8 hours later than that in mouse. Similarly, the heat shock proteins Hsp110, Hspa8, Dnaja1, and Dnajb6 had circadian phases around CT16 in the mouse adrenal but around CT0 in the macaque adrenal. Cold inducible protein (Cirbp) had a circadian phase around CT7 in the mouse adrenal but around CT16 in the macaque adrenal, in anti-phase with heat shock proteins in both mouse and macaque. On the other hand, there were also 57 genes showing circadian phases more than 4 hours earlier in the macaque adrenal than in the mouse adrenal.

In the human circadian SKM microarray study, there were only two circadian time point measurements: CT1 and CT13. Hence we can only roughly estimate the circadian phases to be either CT1 or CT13 in human SKM. Among the common circadian genes, Per1, Per2, Nr1d2, and Dbp had circadian phases around CT1, whereas Arntl and Cry1 had circadian phases around CT13 in human SKM. Our estimates of circadian phases for Per1 and Per2 in human SKM were in good agreement with the study in human peripheral blood mononuclear cells where a 2 hour sampling time was used throughout 72 hours [8]. The heat shock proteins, Dnaja1, Dnajb4, and Hspa4, had circadian phases around CT13, consistent with the peak of common body temperature at CT10 in human [8].

Next, we made a three-species comparison of circadian phases in the SKMs of mouse, rat, and human. We found 12 circadian oscillating genes common to SKM in all three species (Table 2). After we rounded the circadian phases in mouse and rat to their closest time points, CT1 or CT13, we observed that Per2, Arntl, Dbp, Ppp1r3c, and Ablim1 had conserved circadian phases between mouse and rat, but were 12 hours away from those of human. Epm2aip1, G0S2, and Maf had conserved circadian phases between mouse and human but 12 hours away from those of rat. Finally, D19Wsu162e, Myod1, Pfn2, and Ucp3 had conserved circadian phases among all three species.
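Rounding a phase to the closest of CT1 or CT13 has to respect the 24-hour clock, so that CT22, for instance, counts as 3 hours from CT1 rather than 21. A minimal sketch of that rounding is given below; the function names and the example phases are illustrative assumptions, and how the study resolved exact ties is not stated in the text.

```python
def circular_diff(a, b, period=24.0):
    """Smallest distance between two times on a clock of the given period."""
    d = abs(a - b) % period
    return min(d, period - d)

def nearest_timepoint(phase, timepoints=(1.0, 13.0)):
    """Round a circadian phase to the closest sampled time point.

    Exact ties (e.g. CT7 or CT19) go to the first entry of timepoints;
    this tie-breaking rule is an arbitrary choice made for the sketch.
    """
    return min(timepoints, key=lambda tp: circular_diff(phase, tp))

# Hypothetical phases (in circadian time) for a few genes.
example_phases = {"geneX": 22.0, "geneY": 8.0, "geneZ": 15.5}
rounded = {gene: nearest_timepoint(p) for gene, p in example_phases.items()}
print(rounded)
```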

Biological Functions of the Circadian Rhythm

We searched for the Gene Ontology (GO) categories significantly over-represented in circadian oscillating genes in each mouse tissue using the GOminer program [9]. We further tested the associations of GO categories with any specific circadian phase intervals using Fisher's test with a rotating window method. The list of significant biological processes associated with circadian phases in different tissues is shown in Table S3. The most common of these biological processes were steroid biosynthesis, heat shock response, and protein folding. Steroid biosynthesis was associated with CT22 in liver, kidney, adrenal, brown adipose tissue (BAT), and white adipose tissue (WAT). Heat shock response or protein folding was associated with CT16 in SCN, liver, kidney, adrenal, aorta, BAT, WAT, calvarial bone, and whole brain, due to a large number of heat shock proteins consistently showing circadian phases near CT16 in most tissues. In liver, carbohydrate and amino acid metabolism were associated with CT17 and CT15 respectively, consistent with the rise in activity after lights off in mouse. In BAT, WAT, and adrenal, lipid metabolism was associated with CT22. Negative regulation of protein kinase activities was associated with CT17 in prefrontal cortex and CT21 in whole brain. There were also notable differences in the circadian phases of some biological processes between tissues. For example, protein translation was associated with CT20 in SCN but CT9 in WAT. Organ development was associated with CT22 in heart and BAT but CT10 in adrenal.

Promoter Analysis

To test the association of transcription factor (TF) regulation with the circadian oscillation of gene expression, we predicted the TF binding sites on the mouse promoters of circadian oscillating genes in each tissue using positional weight matrix (PWM) based methods. We first tested whether there was a significant over-representation of TF PWM binding sites on the promoters of circadian oscillating genes using Fisher's exact test. Among the significant TF PWMs, we again tested their associations with any specific phase intervals using Fisher's test with a rotating window method. To remove the redundancy in TF PWMs, we grouped the TF PWMs into TF families and averaged the associated circadian phases of significant TF PWMs within the same TF families. The results are shown in Table S4. EBOX, AP-2, CRE, SP1, and EGR were the top 5 TF families associated with the circadian phase in most tissues. However, unlike the consistent circadian phases of the common circadian genes across tissues, the associated circadian phases of the significant TF families varied considerably among different tissues. EBOX was associated with CT12 in the majority of tissues including SCN, liver, aorta, adrenal, WAT, brain, atria, ventricle, and prefrontal cortex, but it was associated with CT0 in skeletal muscle, BAT, and calvarial bone. CRE was consistently associated with CT11 in SCN, liver, aorta, heart, adrenal, calvarial bone, prefrontal cortex, and ventricle, but with CT20 in atria. Two other known TF families related to circadian rhythm, RRE and DBOX, were detected to be associated with circadian phase only in two tissues. RRE was associated with CT0 in liver and WAT. DBOX was associated with CT16 in aorta and adrenal.
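A one-sided Fisher's exact test for over-representation is equivalent to a hypergeometric tail probability, which can be computed directly. The sketch below shows that calculation for a single gene set; the counts are invented, the function name is hypothetical, and the rotating window over phase intervals is not implemented here.

```python
from math import comb

def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    k: genes in the set of interest that carry the binding site
    n: size of the set of interest (e.g. circadian genes in one tissue)
    K: genes in the background that carry the binding site
    N: size of the background (all genes on the array)
    Returns P(X >= k) under random sampling without replacement.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total

# Hypothetical counts: 20 genes overall, 5 with the site;
# 5 circadian genes, 4 of which carry the site.
p = enrichment_p_value(k=4, n=5, K=5, N=20)
print(round(p, 6))
```

To test association with a phase interval, the same function could be applied with the genes whose phases fall inside a sliding window as the set of interest, repeated as the window rotates around the clock.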

Identification of Gene Regulatory Interactions

We obtained microarray data from TF knockouts or mutants for Clock, Arntl, Npas2, Nr1d1, Rora/Rorc, Egr1/Egr3, Dbp/Hlf/Tef, and Ppara in various mouse tissues, together with Cebpa/Cebpb/Cebpd/Cebpe transfection microarray data in NIH3T3 cells. To study the systematic effects of glucocorticoids, cAMP, and temperature on the circadian rhythm, we included microarray data from Nr3c1 (glucocorticoid receptor), Pka, and Hsf1 knockouts or mutants in response to DEX (glucocorticoid agonist), cAMP, and heat stimulation, respectively, compared with wild type mouse. We also included microarray data from a light response mouse model in order to identify light sensitive genes in mouse SCN [10]. The complete list of knockout or mutant microarray experiments used in this study is shown in Table S5. We assumed that the target genes of TFs will be significantly down-regulated in the knockout or mutant compared with the wild type mouse in the case of activators, and up-regulated in the case of repressors, such as Nr1d1. To identify the direct targets of TFs in knockout or mutant experiments, we required that the significantly affected genes in the knockout or mutant must have at least one putative binding site of their corresponding TFs in the promoter regions. Under these criteria, we identified 320 EBOX, 295 RRE, 43 DBOX, 492 EGRE, 455 CRE, 326 GRE, 122 HSE, 607 CEBP, and 516 PPRE controlled genes respectively (Table S6). For these genes, we extracted their mean circadian phases if they had consistent circadian phases across multiple tissues (p<1/3, circular range test). We observed that EBOX was significantly associated with CT12 (p<10^−6, Fisher's exact test), RRE with CT1 (p<10^−6), DBOX with CT15 (p<10^−5), and HSE with CT17 (p<10^−6) (Figure S3).
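The direct-target criterion described above combines two conditions: the gene changes in the expected direction in the knockout, and its promoter carries a putative binding site for the TF. A minimal sketch of that filter follows. It uses a log2 fold-change cutoff as a stand-in for the significance test, which is an assumption, and all gene names and values are made up.

```python
def direct_targets(log2_fold_changes, has_site, tf_is_activator=True, cutoff=1.0):
    """Select putative direct targets of a TF from knockout data.

    log2_fold_changes: {gene: log2(knockout / wild type)}
    has_site: {gene: True if the promoter has a putative binding site}
    For an activator, targets should go down in the knockout; for a
    repressor, they should go up. The cutoff is an illustrative stand-in
    for a proper significance test.
    """
    targets = []
    for gene, lfc in log2_fold_changes.items():
        if tf_is_activator:
            affected = lfc <= -cutoff
        else:
            affected = lfc >= cutoff
        if affected and has_site.get(gene, False):
            targets.append(gene)
    return sorted(targets)

# Hypothetical knockout-versus-wild-type data for four genes.
lfc = {"gene1": -2.0, "gene2": -1.5, "gene3": 0.2, "gene4": 1.8}
site = {"gene1": True, "gene2": False, "gene3": True, "gene4": True}

activator_targets = direct_targets(lfc, site, tf_is_activator=True)
repressor_targets = direct_targets(lfc, site, tf_is_activator=False)
print(activator_targets, repressor_targets)
```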

Circadian Gene Regulatory Network

Based on these regulatory interactions, we constructed the gene regulatory network for the circadian oscillating genes in mouse. In Figure 4, we show a network consisting of the circadian oscillating genes identified in at least 7 mouse tissues. Among the 81 circadian oscillating genes identified in at least 7 tissues, 53 of them can be included through 88 regulatory interactions with 9 cis-regulatory elements in our network. Their circadian phases were represented by different colors in the color wheel. We were able to identify almost all known transcription regulatory interactions for common circadian genes in the literature, except EBOX → Per1, EBOX → Nr1d1, EBOX → Ppara, RRE → Nr1d1, and RRE → Cry1. To further complete our network, we supplemented these missing gene regulatory interactions with known protein interaction information (Per/Cry ⊣ Arntl/Clock and Fkbp:Hsp90 ⊣ Nr3c1) and protein phosphorylation information (Csnk1d → Per/Cry and Gsk3b → Nr1d1) from the literature. These relationships are shown in red in Figure 4.

Figure 4. (A) Gene regulatory network consisting of the circadian oscillating genes identified in at least 7 mouse tissues. (B) The subset of the network highlighting the role of NR3C1 and FKBP/HSP90 in integrating the regulatory inputs from diverse environmental signals into circadian genes. Blue arrows represent the gene regulatory interactions obtained in this study. Red arrows represent the known gene regulatory or protein interactions extracted from the literature. P stands for phosphorylation. White boxes represent cis-regulatory elements. Colored circles represent the genes with circadian phase information, where circadian phases are represented by the different colors in the color wheel. White circles represent protein complexes or genes without circadian phase information.

Two well-known negative feedback loops can be reconstructed from this analysis: Arntl/Clock → EBOX → Per1/Per2 ⊣ Arntl/Clock and Nr1d1/Nr1d2 ⊣ RRE → Arntl/Clock → EBOX → Nr1d1/Nr1d2. Two feedforward loops are attached to the negative feedback loops: Arntl/Clock → EBOX → Dbp → DBOX → Per1/Per2 acts as an alternative route to Arntl/Clock → EBOX → Per1/Per2, and Nr1d1/Nr1d2 ⊣ RRE → Nfil3 ⊣ DBOX → Per1/Per2 ⊣ Arntl/Clock acts as an alternative route to Nr1d1/Nr1d2 ⊣ RRE → Arntl/Clock. Bhlhb2, which inhibits EBOX, is itself regulated by EBOX, and Nr1d1, which inhibits RRE, is itself regulated by RRE, thereby forming two auto-regulatory loops.
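One way to check that a loop is a negative feedback loop is to multiply the signs of its edges: an odd number of inhibitory steps gives a negative product. The sketch below is only an illustration. The edge signs (+1 for activation, -1 for inhibition) are assumptions read off the description above, and the names `LOOPS`, `loop_sign` and `classify` are hypothetical helpers, not part of the original analysis.

```python
# Signed edges as (source, target, sign); +1 = activation, -1 = inhibition.
# The signs are assumptions taken from the loop descriptions in the text.
LOOPS = {
    "Per loop": [
        ("Arntl/Clock", "EBOX", +1),
        ("EBOX", "Per1/Per2", +1),
        ("Per1/Per2", "Arntl/Clock", -1),
    ],
    "Nr1d loop": [
        ("Nr1d1/Nr1d2", "RRE", -1),
        ("RRE", "Arntl/Clock", +1),
        ("Arntl/Clock", "EBOX", +1),
        ("EBOX", "Nr1d1/Nr1d2", +1),
    ],
}


def loop_sign(edges):
    """Return the product of edge signs after checking the path is closed."""
    # Each edge's target must be the next edge's source (wrapping around).
    for (_, tgt, _), (src, _, _) in zip(edges, edges[1:] + edges[:1]):
        if tgt != src:
            raise ValueError("edges do not form a closed loop")
    sign = 1
    for _, _, s in edges:
        sign *= s
    return sign


def classify(edges):
    """Label a closed loop by the sign of its edge product."""
    return "negative feedback" if loop_sign(edges) < 0 else "positive feedback"


for name, edges in LOOPS.items():
    print(name, "->", classify(edges))
```

Both loops contain exactly one inhibitory step, so both products are negative.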

The effects of food and light act on common circadian genes directly through GRE and CRE respectively. GRE controls Per1 and Per2, while CRE controls Per1, Rora, Nr1d2, and Nfil3. As shown in Figure 4B, the effect of temperature acts on common circadian genes more indirectly, through the route HSE → Hsp90aa1 → Fkbp/Hsp90 ⊣ Nr3c1 → GRE → Per1/Per2. Nr3c1 and the Fkbp/Hsp90 complex are also components of another negative feedback loop, Nr3c1 → GRE → Fkbp5 → Fkbp/Hsp90 ⊣ Nr3c1, which may play an important role in glucocorticoid stimulation. Nr3c1 is also under the control of CRE and therefore may be responsive to light stimulation. Nr3c1 and the Fkbp/Hsp90 complex feed into EBOX by regulating Per1/Per2 through GRE. In turn, EBOX controls both components of the Fkbp/Hsp90 complex, i.e., Fkbp5 directly and Hsp90aa1 indirectly through EBOX → Ppara → PPRE → Hsp90aa1. Therefore, Nr3c1 and Fkbp/Hsp90 play a central role in integrating the regulatory inputs from diverse environmental signals into circadian genes in our network (Figure 4B).


Acknowledgements

We thank Colleen Russell, Ph.D. for her careful reading of this manuscript and suggestions.

The authors are supported by NIH (R24DK087669, P30DK46200, P30DK072476, DK082574 and 1RC2ES01871), the Society for Women’s Health Research ISIS Network on Metabolism, United States Department of Agriculture's Agricultural Research Service (58-1950-7-707), and the Evans Center for Interdisciplinary Biomedical Research, Department of Medicine, Boston University School of Medicine.


GRNs in Metazoan Development and Evolution

Here, we have focused on examples of the ways in which GRN subcircuits mandate developmental logic. The design features we have considered are devices used to drive the development of all animal embryos, and as the parallelism illustrated in Fig. 1 shows, disparate organisms, in different tissues, using different genes, nonetheless execute similar developmental decisions with the same circuit designs. We believe that in the near future a repertoire of such GRN subcircuits will be revealed, a repertoire that has been assembled in countless combinations throughout the evolution of diverse body plans among the Metazoa.

This PNAS Special Feature contains 10 articles covering a variety of contemporary topics relevant to the role of gene regulatory networks in animal development and evolution. The first 2 articles, by Hobert (19) and Hong et al. (20), respectively, provide Perspectives on two long-standing problems in metazoan development. Hobert discusses recent advances in our understanding of the gene regulatory networks responsible for the specification of individual neuronal cell types in C. elegans (19). Hong et al. (20) summarize the use of postgenome technologies in determining how different concentrations of the Dorsal transcription factor produce a variety of gene expression patterns in the early Drosophila embryo.

The next 4 articles are original research papers that present new insights into our understanding of how gene regulatory networks control different aspects of embryonic and postembryonic development, as well as changes in body patterning during animal evolution. The articles from Tumpel et al. (21), Nikitina et al. (22), Ochoa-Espinosa et al. (23), and Smith and Davidson ( , 18) describe advances in basic embryonic patterning processes, including the specification of the posterior hindbrain in vertebrates, the specification of neural crest progenitors in lampreys, the combinatorial control of A/P patterning of the Drosophila embryo, and the specification of the endomesoderm territory in the sea urchin embryo.

Two more articles are devoted to one of the major challenges in developmental biology, namely, unraveling the complex regulatory networks underlying the formation of postembryonic tissues and organs. The article by Ririe et al. (24) examines vulva development in C. elegans, with an emphasis on how gene networks coordinate individual cells to produce a complex organ. Georgescu et al. (25) examine the fascinating problem of T cell specification and diversification in the mammalian immune system. Evidence is presented for dynamic networks that are generally more plastic and reversible than those seen in hard-wired developmental processes such as endomesoderm specification in the sea urchin.

The final 2 research articles address problems in the evolutionary diversity of animal morphology. Gross et al. (26) explore the genome organization of the Mexican cavefish, Astyanax mexicanus, in an effort to understand the basis for its peculiar mode of adaptation, including the loss of eyes. Finally, Usui et al. (27) investigate the large sensory bristles (macrochaetae) of the adult fruitfly as a paradigm for understanding the evolution of morphological diversity.


Abstract

The co-occurrence of diseases can inform the underlying network biology of shared and multifunctional genes and pathways. In addition, comorbidities help to elucidate the effects of external exposures, such as diet, lifestyle and patient care. With worldwide health transaction data now often being collected electronically, disease co-occurrences are starting to be quantitatively characterized. Linking network dynamics to the real-life, non-ideal patient in whom diseases co-occur and interact provides a valuable basis for generating hypotheses on molecular disease mechanisms, and provides knowledge that can facilitate drug repurposing and the development of targeted therapeutic strategies.


Contents

At one level, biological cells can be thought of as "partially mixed bags" of biological chemicals; in the discussion of gene regulatory networks, these chemicals are mostly the messenger RNAs (mRNAs) and proteins that arise from gene expression. These mRNAs and proteins interact with each other with various degrees of specificity. Some diffuse around the cell. Others are bound to cell membranes, interacting with molecules in the environment. Still others pass through cell membranes and mediate long-range signals to other cells in a multicellular organism. These molecules and their interactions comprise a gene regulatory network. A typical gene regulatory network is drawn as a diagram of nodes connected by edges.

The nodes of this network can represent genes, proteins, mRNAs, protein/protein complexes or cellular processes. Nodes that are depicted as lying along vertical lines are associated with the cell/environment interfaces, while the others are free-floating and can diffuse. Edges between nodes represent interactions between the nodes, that can correspond to individual molecular reactions between DNA, mRNA, miRNA, proteins or molecular processes through which the products of one gene affect those of another, though the lack of experimentally obtained information often implies that some reactions are not modeled at such a fine level of detail. These interactions can be inductive (usually represented by arrowheads or the + sign), with an increase in the concentration of one leading to an increase in the other, inhibitory (represented with filled circles, blunt arrows or the minus sign), with an increase in one leading to a decrease in the other, or dual, when depending on the circumstances the regulator can activate or inhibit the target node. The nodes can regulate themselves directly or indirectly, creating feedback loops, which form cyclic chains of dependencies in the topological network. The network structure is an abstraction of the system's molecular or chemical dynamics, describing the manifold ways in which one substance affects all the others to which it is connected. In practice, such GRNs are inferred from the biological literature on a given system and represent a distillation of the collective knowledge about a set of related biochemical reactions. To speed up the manual curation of GRNs, some recent efforts try to use text mining, curated databases, network inference from massive data, model checking and other information extraction technologies for this purpose. [4]

Genes can be viewed as nodes in the network, with input being proteins such as transcription factors, and outputs being the level of gene expression. The value of the node depends on a function which depends on the value of its regulators in previous time steps (in the Boolean network described below these are Boolean functions, typically AND, OR, and NOT). These functions have been interpreted as performing a kind of information processing within the cell, which determines cellular behavior. The basic drivers within cells are concentrations of some proteins, which determine both spatial (location within the cell or tissue) and temporal (cell cycle or developmental stage) coordinates of the cell, as a kind of "cellular memory". The gene networks are only beginning to be understood, and it is a next step for biology to attempt to deduce the functions for each gene "node", to help understand the behavior of the system in increasing levels of complexity, from gene to signaling pathway, cell or tissue level. [5]

Mathematical models of GRNs have been developed to capture the behavior of the system being modeled, and in some cases generate predictions corresponding with experimental observations. In other cases, models have made accurate novel predictions that can be tested experimentally, suggesting new approaches to explore that might not otherwise be considered when designing a laboratory protocol. Modeling techniques include differential equations (ODEs), Boolean networks, Petri nets, Bayesian networks, graphical Gaussian network models, stochastic models, and process calculi. [6] Conversely, techniques have been proposed for generating models of GRNs that best explain a set of time series observations. Recently it has been shown that ChIP-seq signals of histone modifications are more strongly correlated with transcription factor motifs at promoters than RNA levels are. [7] Hence it is proposed that time-series histone modification ChIP-seq could provide more reliable inference of gene regulatory networks than methods based on expression levels.

Global feature

Gene regulatory networks are generally thought to be made up of a few highly connected nodes (hubs) and many poorly connected nodes nested within a hierarchical regulatory regime. Thus gene regulatory networks approximate a hierarchical scale free network topology. [8] This is consistent with the view that most genes have limited pleiotropy and operate within regulatory modules. [9] This structure is thought to evolve due to the preferential attachment of duplicated genes to more highly connected genes. [8] Recent work has also shown that natural selection tends to favor networks with sparse connectivity. [10]

There are primarily two ways that networks can evolve, both of which can occur simultaneously. The first is that network topology can be changed by the addition or subtraction of nodes (genes), or by parts of the network (modules) being expressed in different contexts. The Drosophila Hippo signaling pathway provides a good example. The Hippo signaling pathway controls both mitotic growth and post-mitotic cellular differentiation. [11] Recently it was found that the network in which the Hippo signaling pathway operates differs between these two functions, which in turn changes the behavior of the pathway. This suggests that the Hippo signaling pathway operates as a conserved regulatory module that can be used for multiple functions depending on context. [11] Thus, changing network topology can allow a conserved module to serve multiple functions and alter the final output of the network. The second way networks can evolve is by changing the strength of interactions between nodes, such as how strongly a transcription factor binds to a cis-regulatory element. Such variation in the strength of network edges has been shown to underlie between-species variation in vulva cell fate patterning of Caenorhabditis worms. [12]

Local feature

Another widely cited characteristic of gene regulatory networks is their abundance of certain repetitive sub-networks known as network motifs. Network motifs can be regarded as repetitive topological patterns that appear when a big network is divided into small blocks. Previous analyses found several types of motifs that appeared more often in gene regulatory networks than in randomly generated networks. [13] [14] [15] One such motif is the feed-forward loop, which consists of three nodes. This motif is the most abundant among all possible motifs made up of three nodes, as shown in the gene regulatory networks of fly, nematode, and human. [15]
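As a rough illustration of what counting this motif involves, the sketch below counts feed-forward loops in a small directed graph by brute force. The function name `count_feed_forward_loops` and the toy edge list are hypothetical; the comparison against randomly generated networks that establishes enrichment is not shown.

```python
from itertools import permutations


def count_feed_forward_loops(edges):
    """Count ordered triples (a, b, c) with edges a->b, b->c and a->c."""
    edge_set = set(edges)
    nodes = {n for e in edges for n in e}
    return sum(
        1
        for a, b, c in permutations(nodes, 3)
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
    )


# Toy network: X regulates Z directly and also indirectly through Y.
toy_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
print(count_feed_forward_loops(toy_edges))  # prints 1
```

The brute-force scan over all node triples is cubic in the number of nodes, so it only suits small examples.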

The enriched motifs have been proposed to follow convergent evolution, suggesting they are "optimal designs" for certain regulatory purposes. [16] For example, modeling shows that feed-forward loops are able to coordinate the change in node A (in terms of concentration and activity) and the expression dynamics of node C, creating different input-output behaviors. [17] [18] The galactose utilization system of E. coli contains a feed-forward loop which accelerates the activation of the galactose utilization operon galETK, potentially facilitating the metabolic transition to galactose when glucose is depleted. [19] The feed-forward loop in the arabinose utilization system of E. coli delays the activation of the arabinose catabolism operon and transporters, potentially avoiding unnecessary metabolic transitions due to temporary fluctuations in upstream signaling pathways. [20] Similarly, in the Wnt signaling pathway of Xenopus, the feed-forward loop acts as a fold-change detector that responds to the fold change, rather than the absolute change, in the level of β-catenin, potentially increasing the resistance to fluctuations in β-catenin levels. [21] Following the convergent evolution hypothesis, the enrichment of feed-forward loops would be an adaptation for fast response and noise resistance. A recent study found that yeast grown in an environment of constant glucose developed mutations in glucose signaling pathways and growth regulation pathways, suggesting that regulatory components responding to environmental changes are dispensable under a constant environment. [22]

On the other hand, some researchers hypothesize that the enrichment of network motifs is non-adaptive. [23] In other words, gene regulatory networks can evolve to a similar structure without specific selection on the proposed input-output behavior. Support for this hypothesis often comes from computational simulations. For example, fluctuations in the abundance of feed-forward loops in a model that simulates the evolution of gene regulatory networks by randomly rewiring nodes may suggest that the enrichment of feed-forward loops is a side-effect of evolution. [24] In another model of gene regulatory network evolution, the ratio of the frequencies of gene duplication and gene deletion shows great influence on network topology: certain ratios lead to the enrichment of feed-forward loops and create networks that show features of hierarchical scale free networks. De novo evolution of coherent type 1 feed-forward loops has been demonstrated computationally in response to selection for their hypothesized function of filtering out a short spurious signal, supporting adaptive evolution, but for non-idealized noise, a dynamics-based system of feed-forward regulation with a different topology was instead favored. [25]

Regulatory networks allow bacteria to adapt to almost every environmental niche on earth. [26] [27] A network of interactions among diverse types of molecules including DNA, RNA, proteins and metabolites, is utilised by the bacteria to achieve regulation of gene expression. In bacteria, the principal function of regulatory networks is to control the response to environmental changes, for example nutritional status and environmental stress. [28] A complex organization of networks permits the microorganism to coordinate and integrate multiple environmental signals. [26]

Coupled ordinary differential equations

A regulatory network of $N$ substances can be modeled by a set of coupled ordinary differential equations for their concentrations $S_1(t), S_2(t), \dots, S_N(t)$:

$$\frac{dS_j}{dt} = f_j(S_1, S_2, \dots, S_N)$$

where the functions $f_j$ express the dependence of $S_j$ on the concentrations of other substances present in the cell. The functions $f_j$ are ultimately derived from basic principles of chemical kinetics or simple expressions derived from these, e.g. Michaelis–Menten enzymatic kinetics. Hence, the functional forms of the $f_j$ are usually chosen as low-order polynomials or Hill functions that serve as an ansatz for the real molecular dynamics. Such models are then studied using the mathematics of nonlinear dynamics. System-specific information, like reaction rate constants and sensitivities, is encoded as constant parameters. [29]

By solving for the fixed point of the system:

$$\frac{dS_j}{dt} = 0$$

for all $j$, one obtains (possibly several) concentration profiles of proteins and mRNAs that are theoretically sustainable (though not necessarily stable). Steady states of kinetic equations thus correspond to potential cell types, and oscillatory solutions to the above equation to naturally cyclic cell types. Mathematical stability of these attractors can usually be characterized by the sign of higher derivatives at critical points, and then correspond to biochemical stability of the concentration profile. Critical points and bifurcations in the equations correspond to critical cell states in which small state or parameter perturbations could switch the system between one of several stable differentiation fates. Trajectories correspond to the unfolding of biological pathways and transients of the equations to short-term biological events. For a more mathematical discussion, see the articles on nonlinearity, dynamical systems, bifurcation theory, and chaos theory.
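To make the link between steady states and alternative cell fates concrete, here is a minimal sketch of a two-gene system in which each gene represses the other through a Hill function. The model, the parameter values (`a = 4`, `n = 2`), the unit degradation rate and the forward-Euler integrator are all illustrative assumptions, not taken from any of the cited work.

```python
def hill_repression(x, a=4.0, n=2):
    """Production rate of a gene repressed by a regulator at concentration x."""
    return a / (1.0 + x ** n)


def rhs(s1, s2):
    """Right-hand sides dS1/dt and dS2/dt: repressed production minus decay."""
    return hill_repression(s2) - s1, hill_repression(s1) - s2


def integrate(s1, s2, dt=0.01, steps=5000):
    """Forward-Euler integration from (s1, s2) over steps * dt time units."""
    for _ in range(steps):
        d1, d2 = rhs(s1, s2)
        s1, s2 = s1 + dt * d1, s2 + dt * d2
    return s1, s2


# Two different initial conditions settle into two different steady states.
state_a = integrate(2.0, 0.5)
state_b = integrate(0.5, 2.0)
print(state_a, state_b)
```

With these assumed parameters the two stable steady states are mirror images (one gene high, the other low), and the initial condition decides which one the system reaches. This is the sense in which fixed points can stand for alternative differentiation fates.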

Boolean network

The following example illustrates how a Boolean network can model a GRN together with its gene products (the outputs) and the substances from the environment that affect it (the inputs). Stuart Kauffman was amongst the first biologists to use the metaphor of Boolean networks to model genetic regulatory networks. [30] [31]

  1. Each gene, each input, and each output is represented by a node in a directed graph in which there is an arrow from one node to another if and only if there is a causal link between the two nodes.
  2. Each node in the graph can be in one of two states: on or off.
  3. For a gene, "on" corresponds to the gene being expressed; for inputs and outputs, "on" corresponds to the substance being present.
  4. Time is viewed as proceeding in discrete steps. At each step, the new state of a node is a Boolean function of the prior states of the nodes with arrows pointing towards it.
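The four rules above can be sketched in a few lines. The network here is a hypothetical toy, not a model of any real system: one environmental input (`signal`) and two genes, where gene A is on when the signal is present and B was off, and gene B copies the previous state of A. All nodes update synchronously.

```python
def step(state, signal):
    """One synchronous update of the toy network.

    state is (a, b); the rules are A' = signal AND NOT b, B' = a.
    """
    a, b = state
    return (signal and not b, a)


def trajectory(state, signal, n_steps):
    """Return the list of states visited, starting from `state`."""
    states = [state]
    for _ in range(n_steps):
        state = step(state, signal)
        states.append(state)
    return states


# With the signal present, the network cycles through four states.
for s in trajectory((False, False), True, 4):
    print(s)
```

With the signal on, the delayed negative feedback from B to A produces a cycle of length four; with the signal off, every starting state decays to (False, False).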

The validity of the model can be tested by comparing simulation results with time series observations. A partial validation of a Boolean network model can also come from testing the predicted existence of a yet unknown regulatory connection between two particular transcription factors that each are nodes of the model. [32]

Continuous networks

Continuous network models of GRNs are an extension of the Boolean networks described above. Nodes still represent genes, and connections between them represent regulatory influences on gene expression. Genes in biological systems display a continuous range of activity levels, and it has been argued that using a continuous representation captures several properties of gene regulatory networks not present in the Boolean model. [33] Formally, most of these approaches are similar to an artificial neural network, as inputs to a node are summed up and the result serves as input to a sigmoid function, e.g., [34] but proteins do often control gene expression in a synergistic, i.e. non-linear, way. [35] However, there is now a continuous network model [36] that allows grouping of inputs to a node, thus realizing another level of regulation. This model is formally closer to a higher order recurrent neural network. The same model has also been used to mimic the evolution of cellular differentiation [37] and even multicellular morphogenesis. [38]
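The "sum the inputs, then apply a sigmoid" update can be sketched as follows. The weight matrix, biases and the discrete-time form are assumptions chosen for illustration, and the sketch does not include the input grouping of the model cited as [36].

```python
import math


def sigmoid(z):
    """Logistic function mapping any real input to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def update(levels, weights, biases):
    """Next activity level of each gene from the weighted sum of its inputs.

    weights[i][j] is the assumed influence of gene j on gene i
    (positive = activation, negative = repression).
    """
    return [
        sigmoid(sum(w * x for w, x in zip(row, levels)) + b)
        for row, b in zip(weights, biases)
    ]


# Gene 0 has a positive baseline but is strongly repressed by gene 1.
weights = [[0.0, -10.0], [0.0, 0.0]]
biases = [5.0, 0.0]
print(update([0.5, 1.0], weights, biases))
```

Unlike the Boolean case, each gene here takes a graded value between 0 and 1, so partial repression or activation can be expressed.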

Stochastic gene networks

Recent experimental results [39] [40] have demonstrated that gene expression is a stochastic process. Thus, many authors are now using the stochastic formalism, after the work by Arkin et al. [41] Works on single gene expression [42] and small synthetic genetic networks, [43] [44] such as the genetic toggle switch of Tim Gardner and Jim Collins, provided additional experimental data on the phenotypic variability and the stochastic nature of gene expression. The first versions of stochastic models of gene expression involved only instantaneous reactions and were driven by the Gillespie algorithm. [45]
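For orientation, a bare-bones version of the Gillespie algorithm for a single gene might look like the sketch below. It assumes only two instantaneous reactions, constant-rate mRNA production and first-order degradation, with made-up rate constants; it does not include the time delays discussed next, and the function names are hypothetical.

```python
import random


def gillespie_birth_death(k_tx, k_deg, t_end, seed=0):
    """Simulate mRNA copy number with production rate k_tx and decay k_deg * n."""
    rng = random.Random(seed)
    t, n = 0.0, 0
    times, counts = [t], [n]
    while True:
        rate_tx = k_tx
        rate_deg = k_deg * n
        total = rate_tx + rate_deg
        # Waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(total)
        if t >= t_end:
            break
        # Pick which reaction fires, in proportion to its rate.
        if rng.random() * total < rate_tx:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


def time_average(times, counts, t_end):
    """Time-weighted mean copy number over [0, t_end]."""
    total = 0.0
    for i, n in enumerate(counts):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += n * (t_next - times[i])
    return total / t_end


times, counts = gillespie_birth_death(k_tx=10.0, k_deg=1.0, t_end=200.0, seed=1)
print(time_average(times, counts, 200.0))
```

For this birth-death process the long-run mean copy number is `k_tx / k_deg`, so the printed time average should fluctuate around 10 while individual trajectories remain noisy.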

Since some processes, such as gene transcription, involve many reactions and could not be correctly modeled as an instantaneous reaction in a single step, it was proposed to model these reactions as single step multiple delayed reactions in order to account for the time it takes for the entire process to be complete. [46]

From here, a set of reactions was proposed [47] that allows GRNs to be generated. These are then simulated using a modified version of the Gillespie algorithm that can simulate multiple time-delayed reactions (chemical reactions where each of the products is given a time delay that determines when it will be released into the system as a "finished product").

For example, basic transcription of a gene can be represented by the following single-step reaction (RNAP is the RNA polymerase, RBS is the RNA ribosome binding site, and Pro_i is the promoter region of gene i):

Furthermore, there seems to be a trade-off between the noise in gene expression, the speed with which genes can switch, and the metabolic cost associated with their functioning. More specifically, for any given level of metabolic cost, there is an optimal trade-off between noise and processing speed, and increasing the metabolic cost leads to better speed-noise trade-offs. [48] [49] [50]

A simulator has been proposed (SGNSim, Stochastic Gene Networks Simulator) [51] that can model GRNs in which transcription and translation are modeled as multiple time-delayed events, with dynamics driven by a stochastic simulation algorithm (SSA) able to deal with multiple time-delayed events. The time delays can be drawn from several distributions and the reaction rates from complex functions or from physical parameters. SGNSim can generate ensembles of GRNs within a set of user-defined parameters, such as topology. It can also be used to model specific GRNs and systems of chemical reactions. Genetic perturbations such as gene deletions, gene over-expression, insertions, and frame shift mutations can also be modeled.

The GRN is created from a graph with the desired topology, imposing in-degree and out-degree distributions. Gene promoter activities are affected by other genes' expression products that act as inputs, in the form of monomers or combined into multimers, and set as direct or indirect. Next, each direct input is assigned to an operator site and different transcription factors can be allowed, or not, to compete for the same operator site, while indirect inputs are given a target. Finally, a function is assigned to each gene, defining the gene's response to a combination of transcription factors (promoter state). The transfer functions (that is, how genes respond to a combination of inputs) can be assigned to each combination of promoter states as desired.

In other recent work, multiscale models of gene regulatory networks have been developed that focus on synthetic biology applications. Simulations have been used that model all biomolecular interactions in transcription, translation, regulation, and induction of gene regulatory networks, guiding the design of synthetic systems. [52]

Other work has focused on predicting the gene expression levels in a gene regulatory network. The approaches used to model gene regulatory networks have been constrained to be interpretable and, as a result, are generally simplified versions of the network. For example, Boolean networks have been used due to their simplicity and ability to handle noisy data but lose data information by having a binary representation of the genes. Also, artificial neural networks omit using a hidden layer so that they can be interpreted, losing the ability to model higher order correlations in the data. Using a model that is not constrained to be interpretable, a more accurate model can be produced. Being able to predict gene expressions more accurately provides a way to explore how drugs affect a system of genes as well as for finding which genes are interrelated in a process. This has been encouraged by the DREAM competition [53] which promotes a competition for the best prediction algorithms. [54] Some other recent work has used artificial neural networks with a hidden layer. [55]

Multiple sclerosis

There are three classes of multiple sclerosis: relapsing-remitting (RRMS), primary progressive (PPMS) and secondary progressive (SPMS). Gene regulatory networks (GRNs) play a vital role in understanding the disease mechanism across these three multiple sclerosis classes. [56]


Methods

Microarray data used in this study were obtained from the Gene Expression Omnibus (GEO) database at NCBI as of November 2, 2009. GEO series with accession numbers GSE2361 [4], GSE1133 [6] (2004 version of the Gene Atlas) and GSE7307 [31] (the "human body index") were used to find molecular features in normal tissues and to derive the 56-gene template profiles (Additional file 1: Table S1). Datasets GSE14334, GSE3204, GSE5364 and GSE6932 were used as testing data to further explore the biological implications of GETs. Datasets GSE1133, GSE2361, GSE5364 and GSE6932 were hybridized on the Affymetrix GeneChip HG-U133A and GSE7307 on the HG-U133plus2.0. The Affymetrix GeneChip HG-U133plus2.0 contains 54,675 probe sets (representing around 38,572 unique UniGene clusters), which cover all the 22,283 probe sets (representing 14,593 unique UniGene clusters) synthesized on the HG-U133A. The additional 62 datasets used for large-scale tissue prediction had all been hybridized on either HG-U133A or HG-U133plus2.0. The accession identifiers and associated information are summarized in Additional file 1: Tables S1 and S3.

Molecular annotation for selected genes

The gene sets were annotated by searching the databases at the DAVID server (http://david.abcc.ncifcrf.gov/home.jsp) with Entrez Gene [32] identifiers as input. Cellular location and biological processes were searched against Gene Ontology (GO) [33]. The molecular functions were searched against PANTHER [34], since PANTHER gave a more complete set of biologically relevant results for our gene set than GO. Pathways were searched against KEGG [35].

Microarray Analysis

For those datasets whose CEL files were available at GEO, the data were first subjected to quality assessment by AffyQualityReport to remove poor-quality arrays and then to RMA [36] processing for data normalization.

For identification of the 56 signature genes, this preprocessing procedure resulted in 143, 35 and 473 arrays for GSE1133, GSE2361, and GSE7307, respectively. Gene filtration was carried out by first selecting, from each of the three training datasets, the genes whose coefficients of variation ranked in the top 2.5% of the entire transcriptome across different tissue types. The resulting highly variable genes were then intersected to generate a set of candidate tissue-classifier genes, which were subsequently subjected to redundancy elimination through hierarchical clustering against the 24 tissues commonly present in the three sets of training data. Following the hierarchical cluster analysis, one representative gene for each cluster was selected and additional genes with highly similar expression profiles were removed. This procedure resulted in 56 genes.
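The first two filtering steps (top-ranked coefficient of variation per dataset, then intersection) can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the data layout (a dict from gene to a list of per-tissue values), the use of the population standard deviation, and the helper names `top_cv_genes` and `candidate_genes` are all hypothetical, and the later clustering step is not shown.

```python
from statistics import mean, pstdev


def top_cv_genes(expr, fraction=0.025):
    """Return the genes whose coefficient of variation is in the top fraction.

    expr maps gene -> list of expression values across tissues.
    """
    cv = {g: pstdev(v) / mean(v) for g, v in expr.items() if mean(v) > 0}
    n_keep = max(1, int(len(cv) * fraction))
    ranked = sorted(cv, key=cv.get, reverse=True)
    return set(ranked[:n_keep])


def candidate_genes(datasets, fraction=0.025):
    """Intersect the highly variable gene sets from every dataset."""
    selected = [top_cv_genes(d, fraction) for d in datasets]
    return set.intersection(*selected)


# Toy data: only gene "A" is highly variable in both datasets.
d1 = {"A": [1, 10, 1, 10], "B": [5, 5, 5, 5], "C": [4, 6, 4, 6], "D": [1, 20, 1, 1]}
d2 = {"A": [2, 9, 2, 9], "B": [1, 30, 1, 1], "C": [5, 5, 5, 6], "D": [3, 3, 3, 4]}
print(candidate_genes([d1, d2], fraction=0.5))  # prints {'A'}
```

The toy call uses `fraction=0.5` only because the example has four genes; the text above uses 2.5% of the whole transcriptome.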

For tissue classification, the probe set intensities of the 56 genes or an equivalent number of random probe sets of the 24 selected tissues were extracted from each of the three GEO datasets using the programs Microsoft Access and Excel. The extracted probe intensities from the three datasets were then combined into a 56 × 72 matrix which was then subjected to hierarchical clustering with the GenePattern package [37] using Pearson correlation for similarity computing and average for clustering. Ten sets of 56 random probe sets were produced by a random number generation program written in C. Each set was used for a separate hierarchical clustering analysis.

Both AffyQualityReport and RMA were obtained from the Bioconductor project [38] for the R environment (http://www.r-project.org/). Descriptive statistical analyses were computed using Excel, while hierarchical clustering was performed with the GenePattern package.

Tissue prediction using the 56 genes

Tissue prediction was performed with the k-nearest neighbor (KNN) method with k = 1. The method computes the correlation coefficient between the 56-gene profile of a test tissue and each of our 24 tissue-specific GET profiles, one for each tissue type. The tissue type with the highest correlation was nominated as our prediction. A program written in R was implemented to accomplish this task.
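With k = 1, the prediction reduces to picking the template profile that correlates best with the test profile. The original program was written in R and is not reproduced here; the Python sketch below assumes Pearson correlation, and the template values, tissue names and helper names are made up for illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def predict_tissue(profile, templates):
    """Return the tissue whose template correlates best with the profile."""
    return max(templates, key=lambda tissue: pearson(profile, templates[tissue]))


# Hypothetical 4-gene templates standing in for the 56-gene GET profiles.
templates = {
    "liver": [9, 1, 1, 2],
    "brain": [1, 8, 2, 1],
    "heart": [1, 2, 9, 1],
}
print(predict_tissue([7, 2, 1, 1], templates))  # prints liver
```

Because correlation ignores overall scale and offset, a test profile can match its template even when the two arrays differ in absolute intensity.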

Dataset retrieval from GEO for large-scale tissue-prediction

The entire GEO database (2009-11-2 freeze) was searched with the following criteria: platform as GPL96 (Affymetrix HG-U133A) or GPL570 (HG-U133plus2.0), sample source containing one of the 24 distinguishable human organs/tissues, and keyword in the sample-related fields containing "normal". Two bioinformatics strategies were used to carry out the search. One was to apply SQL commands to a local MySQL database housing the data from the SOFT files of GPL96 and GPL570, which were imported from the GEO website. The other was to query the GEO database directly with Entrez keywords through the NCBI web interface. The union of both search results was taken, followed by manual filtration to exclude irrelevant datasets that, for example, came from cell lines or specific cell types. Datasets that had been contributed by the same research group as the three source datasets, GSE3526 for instance, were also removed from our test set. Expression profiles of the 56 genes were then extracted from the 61 resulting datasets.

Datasets of 56 gene expression values were organized into RMA-like or MAS-like according to the data preprocessing methods. For those datasets that had been normalized with MAS5 or equivalent method, logarithmic transformation was carried out prior to tissue-prediction analysis. For three datasets (GSE13355, GSE14951, GSE17539) it was hard to judge whether logarithm transformation was necessary and their CEL files were therefore preprocessed with AffyQualityReport followed by RMA normalization before tissue-prediction analysis.

Gene network construction

Gene networks were constructed with the MetaCore package using the algorithms "network analysis" and "receptor targets modeling". The algorithms are variants of the shortest paths algorithm where the main parameters are: 1) relative enrichment with the uploaded data (the 56 genes in this study), and 2) relative saturation of networks with canonical pathways. As a control for this network analysis, a set of 56 genes randomly selected from the Affymetrix microarray HG-U133A was entered as a query and no network was produced by either of the algorithms. The control experiments were repeated twice.


