cisPath


An R/Bioconductor package for cloud users for visualization and management of PPI networks

With the rapid development of cloud technology and services, a growing number of users prefer to run their applications in the cloud. Software and its associated data are hosted on cloud servers, so users can access them through a web browser from any computer. This website presents cisPath, an R/Bioconductor package deployed on cloud servers that lets client users visualize, manage, and share functional protein interaction networks.

Authors Likun Wang, Yuxin Yin
Maintainer Likun Wang <wanglk at hsc.pku.edu.cn>

Download

How to use the cisPath package  cisPath.pdf
Easy editor for networks  easyEditor, easyEditor.tar.gz
Download the cisPath package  cisPath@Bioconductor
R scripts used to format data  scripts.tar.gz
Supplemental materials (new)  supplement.pdf

Note: The R source package includes only a small portion of the available protein-protein interaction (PPI) data as an example. Files containing the complete PPI information can be downloaded from the links below.

We have established an RStudio server using the free usage tier of the Amazon Elastic Compute Cloud (EC2) where several examples are presented for testing this package. Users can run the examples by logging on via a web browser. If the address changes for unforeseeable reasons and cannot be accessed, please refer to the following link and download the latest version of this document: supplement.pdf.

Organisms:

Formatted PPI data:  Homo_sapiens_PPI.txt
Result Statistics:  Homo sapiens

Data source:

Database  URL
UniProtKB/Swiss-Prot  uniprot_sprot_human.dat.gz
UniProtKB/TrEMBL  uniprot_trembl_human.dat.gz
STRING  9606.protein.links.v9.1.txt.gz
PINA  Homo sapiens
iRefIndex  9606.mitab.08122013.txt.zip

Formatted PPI data:  Saccharomyces_cerevisiae_PPI.txt
Result Statistics:  Saccharomyces cerevisiae

Data source:

Database  URL
UniProtKB/Swiss-Prot  uniprot_sprot_fungi.dat.gz
UniProtKB/TrEMBL  uniprot_trembl_fungi.dat.gz
STRING  4932.protein.links.v9.1.txt.gz
PINA  Saccharomyces cerevisiae
iRefIndex  4932.mitab.08122013.txt.zip

Formatted PPI data:  Caenorhabditis_elegans_PPI.txt
Result Statistics:  Caenorhabditis elegans

Data source:

Database  URL
UniProtKB/Swiss-Prot  uniprot_sprot_invertebrates.dat.gz
UniProtKB/TrEMBL  uniprot_trembl_invertebrates.dat.gz
STRING  6239.protein.links.v9.1.txt.gz
PINA  Caenorhabditis elegans
iRefIndex  6239.mitab.08122013.txt.zip

Formatted PPI data:  Drosophila_melanogaster_PPI.txt
Result Statistics:  Drosophila melanogaster

Data source:

Database  URL
UniProtKB/Swiss-Prot  uniprot_sprot_invertebrates.dat.gz
UniProtKB/TrEMBL  uniprot_trembl_invertebrates.dat.gz
STRING  7227.protein.links.v9.1.txt.gz
PINA  Drosophila melanogaster
iRefIndex  7227.mitab.08122013.txt.zip

Formatted PPI data:  Mus_musculus_PPI.txt
Result Statistics:  Mus musculus

Data source:

Database  URL
UniProtKB/Swiss-Prot  uniprot_sprot_rodents.dat.gz
UniProtKB/TrEMBL  uniprot_trembl_rodents.dat.gz
STRING  10090.protein.links.v9.1.txt.gz
PINA  Mus musculus
iRefIndex  10090.mitab.08122013.txt.zip

Formatted PPI data:  Rattus_norvegicus_PPI.txt
Result Statistics:  Rattus norvegicus

Data source:

Database  URL
UniProtKB/Swiss-Prot  uniprot_sprot_rodents.dat.gz
UniProtKB/TrEMBL  uniprot_trembl_rodents.dat.gz
STRING  10116.protein.links.v9.1.txt.gz
PINA  Rattus norvegicus
iRefIndex  10116.mitab.08122013.txt.zip

Note: As examples, we generated PPI data for several species from PINA, iRefIndex, and STRING. Users can download the R scripts that we used to format the PPI data (scripts.tar.gz). The PPI files listed above are accepted by this package. Interactions with confidence scores below 700 (low and medium confidence) have been filtered out. Users should cite the original database(s) if these files are used in a publication.
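The score cutoff described in the note can be illustrated with a minimal sketch. It is not taken from scripts.tar.gz: the helper name filter_by_score, the toy identifiers, and the scores are made up for illustration, and the column names (protein1, protein2, combined_score) are an assumption about the layout of a STRING links table.

```r
# Toy stand-in for a STRING links table; identifiers and scores are made up.
links <- data.frame(
  protein1 = c("9606.ENSP0000A", "9606.ENSP0000A", "9606.ENSP0000B"),
  protein2 = c("9606.ENSP0000B", "9606.ENSP0000C", "9606.ENSP0000C"),
  combined_score = c(950L, 420L, 700L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep interactions at or above the cutoff,
# dropping everything scored below it.
filter_by_score <- function(links, cutoff = 700) {
  links[links$combined_score >= cutoff, , drop = FALSE]
}

high_conf <- filter_by_score(links)
print(high_conf)
```

A real links file could be loaded into a data frame with read.table(..., header = TRUE) before applying the same filter, provided its columns match the assumed names.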


Amazon Web Services

Amazon Web Services (AWS) offers a free usage tier for new customers, which can be used to test the cisPath package and to build and share personal PPI databases. Louis Aslett provides Amazon Machine Images (AMIs) that make deploying an RStudio Server fast and easy. These AMIs are highly recommended, especially for free micro instance users. The Bioconductor Team has also developed an AMI optimized for running Bioconductor packages on the Amazon Elastic Compute Cloud. Instructions on how to launch the AMI can be found here.

Google Drive

As an alternative to cloud computing servers, users can use cloud drives. In this case, the user uploads the results to the cloud drive, where they can be viewed and shared with colleagues via a browser. Instructions on how to host web pages with Google Drive can be found here.

Examples

Quick test (cisPath>=1.4.6)
How to use the formatted PPI data (cisPath>=1.4.6)


# Note: If R < 3.0 is used, please download the package from Bioconductor and install manually.
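# To install this package, start R (>=3.0) and enter:
source("http://bioconductor.org/biocLite.R")
biocLite("cisPath")
# On R >= 3.5, biocLite has been superseded by BiocManager. If the package is
# available for your Bioconductor release, use instead:
# install.packages("BiocManager")
# BiocManager::install("cisPath")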

# Run the toy examples (~30 seconds)
library(cisPath)
# examples
infoFile <- system.file("extdata", "PPI_Info.txt", package="cisPath")
outputDir <- file.path(getwd(), "TP53_example")

# source protein: TP53
# Identify all shortest paths from TP53 to other proteins
results <- cisPath(infoFile, outputDir, "TP53")
results

# networkView example1
outputDir <- file.path(getwd(), "networkView")
networkView(infoFile, c("MAGI1","TP53BP2","TP53", "PTEN"), outputDir, FALSE, c(1,1,1,0), displayMore=TRUE)

# networkView example2
outputDir2 <- file.path(getwd(), "networkView2")
inputFile <- system.file("extdata", "networkView.txt", package="cisPath")
rt <- read.table(inputFile, sep=",", comment.char="", header=TRUE)
proteins <- as.vector(rt[,1])
sizes <- as.vector(rt[,2])
cols <- as.vector(rt[,3])
networkView(infoFile, proteins, outputDir2, FALSE, sizes, cols, displayMore=FALSE)
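The second networkView example reads protein names, node sizes, and node colors from a comma-separated file. The sketch below shows how such an input file might be prepared and read back with the same read.table call. The column names, the sizes, and the hex color values are assumptions chosen for illustration and are not taken from the networkView.txt file shipped with the package.

```r
# Hypothetical node table: sizes and colours are illustrative only.
nodes <- data.frame(
  protein = c("TP53", "PTEN", "MAGI1"),
  size    = c(2, 1, 1),
  color   = c("#FF0000", "#00CC00", "#0000FF"),
  stringsAsFactors = FALSE
)

# Write the table as a comma-separated file with a header row.
inputFile <- file.path(tempdir(), "networkView_demo.txt")
write.table(nodes, inputFile, sep = ",", quote = FALSE, row.names = FALSE)

# comment.char = "" keeps "#" from being parsed as the start of a comment,
# which would otherwise truncate hex colour values.
rt <- read.table(inputFile, sep = ",", comment.char = "", header = TRUE)
proteins <- as.vector(rt[, 1])
sizes    <- as.vector(rt[, 2])
cols     <- as.vector(rt[, 3])
```

The resulting proteins, sizes, and cols vectors have the same shape as those passed to networkView in the example above.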



# Note: If R < 3.0 is used, please download the package from Bioconductor and install manually.
# To install this package, start R (>=3.0) and enter:
source("http://bioconductor.org/biocLite.R")
biocLite("cisPath")
library(cisPath)

# Download the sample PPI data that we have generated (may take several minutes)
dataDir <- file.path(getwd(), "PPIDATA")
dir.create(dataDir, showWarnings=FALSE, recursive=TRUE)
infoFile <- file.path(dataDir, "PPIdata.txt")
download.file("http://sourceforge.net/projects/cispath/files/Homo_sapiens_PPI.txt/download", infoFile)

# (1)Display a list of given proteins in a PPI network (less than 1 minute)
outputDir <- file.path(getwd(), "networkView")
networkView(infoFile, c("TP53","GH1","MAGI1","IGF1","TFAP2A"), outputDir)

# (2)Identify the shortest path from TP53 to MAGI1 and GH1 (about 3 minutes)
outputDir <- file.path(getwd(), "MAGI1_GH1")
cisPath(infoFile, outputDir, "TP53", c("MAGI1", "GH1"))

# (3)Identify all shortest paths from TP53 to other proteins (about 10 minutes)
outputDir <- file.path(getwd(), "TP53_ALL")
cisPath(infoFile, outputDir, "TP53", byStep=TRUE)

# (4)Identify all shortest paths from PTEN to other proteins (about 10 minutes)
outputDir <- file.path(getwd(), "PTEN_ALL")
cisPath(infoFile, outputDir, "PTEN", byStep=FALSE)

# (5)Open the cisPath web page (about 1 minute)
outputDir <- file.path(getwd(), "cisPathWeb")
cisPath(infoFile, outputDir)
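To make the idea of a shortest path between two proteins concrete, here is a minimal breadth-first search over a toy undirected network. It is only an illustration of the concept and not the algorithm implemented in cisPath. It returns a single shortest path rather than all of them, and the protein labels (P1 to P5), the edges, and the helper names build_adjacency and shortest_path are made up.

```r
# Toy undirected network; labels and edges are made up for illustration.
edges <- data.frame(
  from = c("P1", "P1", "P2", "P3", "P4"),
  to   = c("P2", "P3", "P4", "P4", "P5"),
  stringsAsFactors = FALSE
)

# Build an adjacency list: each protein maps to its direct neighbours.
build_adjacency <- function(edges) {
  nodes <- unique(c(edges$from, edges$to))
  adj <- setNames(vector("list", length(nodes)), nodes)
  for (i in seq_len(nrow(edges))) {
    a <- edges$from[i]
    b <- edges$to[i]
    adj[[a]] <- unique(c(adj[[a]], b))
    adj[[b]] <- unique(c(adj[[b]], a))
  }
  adj
}

# Breadth-first search; returns one shortest path as a character vector,
# or character(0) if either protein is unknown or no path exists.
shortest_path <- function(adj, source, target) {
  if (!(source %in% names(adj)) || !(target %in% names(adj))) {
    return(character(0))
  }
  visited  <- setNames(rep(FALSE, length(adj)), names(adj))
  previous <- setNames(rep(NA_character_, length(adj)), names(adj))
  visited[source] <- TRUE
  queue <- source
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (current == target) break
    for (nb in adj[[current]]) {
      if (!visited[[nb]]) {
        visited[nb] <- TRUE
        previous[nb] <- current
        queue <- c(queue, nb)
      }
    }
  }
  if (!visited[[target]]) return(character(0))
  # Walk back from the target to the source using the recorded predecessors.
  path <- target
  while (path[1] != source) {
    path <- c(previous[[path[1]]], path)
  }
  path
}

adj <- build_adjacency(edges)
path <- shortest_path(adj, "P1", "P5")
print(path)
```

In a PPI setting the nodes would be proteins and the edges reported interactions, so the length of the path is the number of interaction steps separating the source protein from the target.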


Click here (TP53 and PTEN) to see typical results. Accessing RStudio requires one of the following browser versions or higher: Firefox 3.5, Safari 4.0, or Google Chrome 5.0. Opera and IE10 also display the results properly. Please contact us if the paths do not display correctly.


Reference

UniProt database (version released 6/11/2014)
Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res 2013, 41(Database issue):D43-47.

PINA database (version released 5/21/2014)
Cowley MJ, Pinese M, Kassahn KS, Waddell N, Pearson JV, Grimmond SM, Biankin AV, Hautaniemi S, Wu J: PINA v2.0: mining interactome modules. Nucleic Acids Res 2012, 40(Database issue):D862-865.

iRefIndex database (v13.0 released 12/9/2013)
Razick S, Magklaras G, Donaldson IM: iRefIndex: a consolidated protein interaction database with provenance. BMC Bioinformatics 2008, 9:405.

STRING database (v9.1)
Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, Lin J, Minguez P, Bork P, von Mering C et al: STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res 2013, 41(Database issue):D808-815.