Help


Datasets
The Bacteriome resource contains three datasets detailing interactions between sets of E. coli proteins: a predicted network of functional interactions, and two experimentally derived networks generated in-house through a comprehensive series of high-quality affinity-tag pull down experiments.

Functional interaction dataset - 1927 genes - 3989 interactions
The functional interaction dataset is a theoretical network of functional linkages obtained through integrating a variety of proteomic and genomic datasets:
Experimental protein-protein interaction (PPI) data. Sources include the Database of Interacting Proteins (DIP) [1] and a recent large scale pulldown analysis of E. coli proteins [2].
Genome context data. Sources include conserved coexpression data [3] and a variety of datasets from the Prolinks database [4]: Phylogenetic Profiles (based on the presence and absence of proteins across multiple genomes); Gene Clusters (based on genome proximity); Rosetta Stone (based on gene fusion events); and Gene Neighbors (based on both gene proximity and phylogenetic distribution). Each dataset was treated independently, and the confidence scores that Prolinks associates with each functional linkage were not incorporated in our study.
Interolog data. We used the Interologs approach [5] to derive additional predictions of E. coli interactions from the interactions obtained for Helicobacter pylori.
Literature-mining data. Literature-mined protein interactions (co-citations) for E. coli were obtained by querying the iHOP (Information Hyperlinked Over Proteins) database [6].
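
The interolog step in the list above can be illustrated with a minimal sketch. It assumes a simple one-to-one ortholog table between H. pylori and E. coli; the published Interologs approach [5] relies on sequence-similarity criteria that are not reproduced here, and the function name and all protein identifiers below are hypothetical.

```python
def transfer_interologs(source_interactions, ortholog_map):
    """Predict target-organism interactions from source-organism interactions.

    source_interactions: iterable of (protein_a, protein_b) pairs observed in
        the source organism (here, H. pylori).
    ortholog_map: dict mapping a source protein to its target ortholog
        (here, E. coli). A one-to-one mapping is an assumption of this sketch.
    """
    predicted = set()
    for a, b in source_interactions:
        # Both partners need an ortholog for the interaction to be transferred.
        if a in ortholog_map and b in ortholog_map:
            ea, eb = ortholog_map[a], ortholog_map[b]
            if ea != eb:  # skip pairs that collapse onto a single protein
                predicted.add(tuple(sorted((ea, eb))))
    return predicted


# Hypothetical identifiers, for illustration only.
hp_interactions = [("hpX", "hpY"), ("hpY", "hpZ")]
orthologs = {"hpX": "ecA", "hpY": "ecB"}  # hpZ has no ortholog in this toy table
predictions = transfer_interologs(hp_interactions, orthologs)
```

Only the first pair is transferred in this toy example, because the second pair involves a protein without a mapped ortholog.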

In brief, we integrated the information from the functional genomics datasets listed above using a naive Bayesian approach, with weights reflecting the relative confidence associated with each input data set. Each experimental and computational data set was evaluated for its ability to reconstruct known pathways by measuring a log likelihood score (LLS), which represents the likelihood that a pair of genes is functionally linked. An LLS greater than zero indicates that the input data set tends to link genes participating in the same pathway, with higher scores indicating more confident linkages. The LLS thus reflects the accuracy of each experiment (input data set) and its ability to inform about biological pathways. The final likelihood that a pair of genes or gene products is functionally linked is therefore an estimate of the combined strength of the linkages across all data sets.
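
As a rough illustration of the scoring described above, the sketch below uses one common formulation of a log likelihood score: the log of the odds that linked pairs share a pathway in a data set, divided by the same odds for all annotated pairs. This formulation, the unweighted sum used for integration, and every name and number in the code are assumptions made for illustration; the exact weighting used for Bacteriome is not given on this page.

```python
import math


def log_likelihood_score(linked_same, linked_diff, prior_same, prior_diff):
    """LLS for one data set under the assumed formulation.

    linked_same / linked_diff: counts of pairs linked by the data set that
        fall in the same / different reference pathways.
    prior_same / prior_diff: the same counts over all annotated pairs.
    """
    dataset_odds = linked_same / linked_diff
    prior_odds = prior_same / prior_diff
    return math.log(dataset_odds / prior_odds)


def integrate(pair_evidence, dataset_lls):
    """Naive Bayes style integration: sum the LLS of every supporting data set.

    pair_evidence: dict mapping a gene pair to the data sets that link it.
    dataset_lls: dict mapping a data set name to its LLS.
    """
    return {
        pair: sum(dataset_lls[name] for name in names)
        for pair, names in pair_evidence.items()
    }


# Hypothetical counts and data set names.
lls = {
    "pulldown": log_likelihood_score(80, 20, 100, 900),
    "cocitation": log_likelihood_score(30, 70, 100, 900),
}
combined = integrate({("geneA", "geneB"): ["pulldown", "cocitation"]}, lls)
```

Under this formulation a data set whose linked pairs share a pathway no more often than chance scores zero, which matches the interpretation of positive LLS values given above.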


Figure 1. Accuracy and extent of E. coli genomics data sets and the integrated network. The x-axis indicates the % of linked-proteins analysed across all datasets. The values on the y-axis indicate the relative accuracy of each dataset measured on the basis of EcoCyc functional associations. The red star indicates the accuracy score derived from our gold standard of small scale pull down experiments obtained from DIP. Other datasets are indicated as individual points on the graph. The integrated network of functional interactions derived as above is shown as a purple line (interactions with LLS higher than small scale pull down experiments - the "high quality" network presented by Bacteriome) and a black line (interactions with LLS lower than the small scale pull down experiments).

Experimental datasets
Core - 1036 genes - 4092 interactions; Extended - 2291 genes - 7220 interactions
The experimental datasets were obtained from a series of in-house TAP-TAG pull down experiments following our original publication [7]. In this latest set of experiments, 'purification enrichment' (PE) scores were calculated, as employed by Collins and co-workers [8], to provide a measure of confidence for each interaction. The core set was obtained using a PE score cutoff of 0.9, while the extended set of interactions was obtained using a PE score cutoff of 0.8. A manuscript providing further details of these datasets is currently being prepared.
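
The relationship between the core and extended sets can be sketched as a simple threshold filter. The example assumes that interactions scoring at or above the cutoff are retained (whether the comparison is inclusive is not stated here); the function name, protein names, and scores are hypothetical.

```python
def filter_by_pe(interactions, cutoff):
    """Keep interactions whose PE score is at or above the cutoff.

    interactions: list of (protein_a, protein_b, pe_score) tuples.
    The inclusive comparison is an assumption of this sketch.
    """
    return [(a, b, pe) for (a, b, pe) in interactions if pe >= cutoff]


# Hypothetical scored interactions.
scored = [
    ("protA", "protB", 0.95),
    ("protA", "protC", 0.85),
    ("protB", "protD", 0.40),
]
core = filter_by_pe(scored, 0.9)      # stricter cutoff, smaller set
extended = filter_by_pe(scored, 0.8)  # looser cutoff, includes the core set
```

Because the extended cutoff is lower, every core interaction also appears in the extended set.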

[1] Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D. (2000) DIP: the database of interacting proteins. Nucleic Acids Res. 28:289-91.
[2] Arifuzzaman M, Maeda M, Itoh A, Nishikata K, Takita C, Saito R, Ara T, Nakahigashi K, Huang HC, Hirai A, Tsuzuki K, Nakamura S, Altaf-Ul-Amin M, Oshima T, Baba T, Yamamoto N, Kawamura T, Ioka-Nakamichi T, Kitagawa M, Tomita M, Kanaya S, Wada C, Mori H. (2006) Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res. 16:686-91.
[3] Bergmann S, Ihmels J, Barkai N. (2004) Similarities and differences in genome-wide expression data of six organisms. PLoS Biol. 2:E9.
[4] Bowers PM, Pellegrini M, Thompson MJ, Fierro J, Yeates TO, Eisenberg D. (2004) Prolinks: a database of protein functional linkages derived from coevolution. Genome Biol. 5(5):R35.
[5] Yu H, Luscombe NM, Lu HX, Zhu X, Xia Y, Han JD, Bertin N, Chung S, Vidal M, Gerstein M. (2004) Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs. Genome Res. 14:1107-18.
[6] Hoffmann R, Valencia A. (2005) Implementing the iHOP concept for navigation of biomedical literature. Bioinformatics. 21 Suppl 2:ii252-ii258.
[7] Butland G, Peregrin-Alvarez JM, Li J, Yang W, Yang X, Canadien V, Starostine A, Richards D, Beattie B, Krogan N, Davey M, Parkinson J, Greenblatt J, Emili A. (2005) Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature. 433:531-7.
[8] Collins SR, Kemmeren P, Zhao XC, Greenblatt JF, Spencer F, Holstege FC, Weissman JS, Krogan NJ. (2007) Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol Cell Proteomics. 6:439-50.
