Accurate Genome-Wide Survival Analysis
Current implementations of the log-rank test (R survdiff, SAS LIFETEST, etc.) are based on an asymptotic approximation of the distribution of the log-rank statistic. This approximation is not appropriate when the two populations being compared are unbalanced, as is the case when testing the association of a mutation with survival in genomic studies. The resulting p-values can differ from the exact p-values by up to 7 orders of magnitude, and a large number of false discoveries are reported because of this difference. We have designed and implemented a method, now called ExaLT (Exact Log-rank Test), to compute a conservative approximation of the exact p-value. In particular, our method computes the p-value from the exact permutational distribution, which is more appropriate for testing the association of mutations with survival.
A different p-value can be computed using a different null distribution (called conditional). While we recommend computing the p-value from the permutational distribution (with the code above), we note that efficient implementations to compute the exact p-value from the conditional distribution are not available, and we provide such an implementation in Matlab below:
For more information, contact Fabio Vandin at vandinfa [at] cs.brown.edu.
If you use our method in your research, please cite:
F. Vandin, A. Papoutsaki, B.J. Raphael*, E.Upfal*. (2013) Genome-Wide Survival Analysis of Somatic Mutations in Cancer. 17th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2013). [Best Paper Award, RECOMB 2013] [Publisher Link]
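The idea of a permutational null distribution can be illustrated on a toy data set. The sketch below is not the ExaLT algorithm. It is a brute-force enumeration of every way to assign the "mutated" label to the samples, which is feasible only for a handful of samples. The function names, the use of the unnormalized statistic (observed minus expected events in the mutated group), and the two-sided comparison are assumptions made for illustration.

```python
from itertools import combinations


def logrank_statistic(times, events, mutated):
    """Observed minus expected number of events in the mutated group.

    times:   survival or censoring time of each sample
    events:  True if the event was observed, False if censored
    mutated: True if the sample belongs to the mutated group
    """
    statistic = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_mut = sum(1 for i in at_risk if mutated[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d_mut = sum(
            1 for i in at_risk if times[i] == t and events[i] and mutated[i]
        )
        statistic += d_mut - d * n_mut / n
    return statistic


def exact_permutational_pvalue(times, events, mutated):
    """Two-sided p-value by enumerating all label assignments (tiny inputs only)."""
    n = len(times)
    k = sum(1 for m in mutated if m)
    observed = abs(logrank_statistic(times, events, mutated))
    extreme = 0
    total = 0
    for members in combinations(range(n), k):
        labels = [False] * n
        for i in members:
            labels[i] = True
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(logrank_statistic(times, events, labels)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


if __name__ == "__main__":
    # Hypothetical data: three patients, one mutated, no censoring.
    p = exact_permutational_pvalue([1, 2, 3], [True, True, True],
                                   [True, False, False])
    print(p)
```

The number of label assignments grows as a binomial coefficient, so this enumeration is a teaching device rather than a practical tool for genome-wide data.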
HotNet2: Network Analysis of Mutation Data
See also: HotNet project page
HotNet2 is an algorithm for the discovery of significantly mutated subnetworks in a protein-protein interaction network.
HotNet2 uses an insulated heat diffusion model to simultaneously analyze both the mutations in and local topology of sets of proteins.
We describe HotNet2 in a paper in submission.
The pre-release of HotNet2 will be available soon. For more information or to become a beta-tester, contact Max Leiserson at mdml [at] cs.brown.edu.
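To give a feel for what an insulated heat diffusion on a network looks like, here is a minimal sketch. It assumes a random-walk-with-restart formulation in which each node keeps a fraction beta of its own heat at every step and passes the rest equally to its neighbors. The function name, the value of beta, and the fixed iteration count are hypothetical choices for illustration and are not taken from the HotNet2 release.

```python
def insulated_heat_diffusion(neighbors, heat, beta=0.5, iterations=200):
    """Iterate f <- beta * heat + (1 - beta) * W f until approximately steady.

    neighbors: dict mapping each node to a list of adjacent nodes (undirected)
    heat:      dict mapping each node to its initial heat (e.g. a mutation score)
    beta:      assumed restart ("insulation") fraction in (0, 1]
    """
    current = dict(heat)
    for _ in range(iterations):
        # Every node is re-supplied with a fraction beta of its initial heat.
        updated = {node: beta * heat[node] for node in neighbors}
        for node, adjacent in neighbors.items():
            outgoing = (1.0 - beta) * current[node]
            if not adjacent:
                # Assumption: an isolated node keeps its own heat.
                updated[node] += outgoing
                continue
            for other in adjacent:
                updated[other] += outgoing / len(adjacent)
        current = updated
    return current


if __name__ == "__main__":
    # Hypothetical three-protein path A - B - C with all heat starting on A.
    network = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
    initial = {"A": 1.0, "B": 0.0, "C": 0.0}
    for node, value in sorted(insulated_heat_diffusion(network, initial).items()):
        print(node, round(value, 4))
```

In this toy network the steady state places 7/12 of the heat on A, 1/3 on B, and 1/12 on C, so heat stays concentrated near its source while still reflecting the local topology.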
Multi-Dendrix: (Multiple Pathway De novo Driver Exclusivity)
Multi-Dendrix project page
Multi-Dendrix is an algorithm for the
M.D.M. Leiserson, D. Blokh, R. Sharan, B.J. Raphael. (2012) Simultaneous identification of multiple driver pathways in cancer. [In submission]
We have released Multi-Dendrix as a Python package that includes functions for subtype and network analysis of Multi-Dendrix results.
Download the release on GitHub: Multi-Dendrix (Version 1.0, January 28, 2013)
Dendrix: (De novo Driver Exclusivity)
Dendrix project page
Dendrix web server
Dendrix is an algorithm for discovery of mutated driver pathways in cancer using only mutation data. It finds sets of genes, domains, or nucleotides whose mutations exhibit both high coverage and high exclusivity in the analyzed samples. This algorithm is described in the paper:
F. Vandin, E. Upfal, B.J. Raphael. (2012) De novo Discovery of Mutated Driver Pathways in Cancer. Genome Research. 22(2):375-85. Epub 2011 Jun 7. PDF Preprint Publisher Link [Preliminary version accepted at 15th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2011)]
To download Dendrix, see the Dendrix project page.
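The trade-off between coverage and exclusivity described above can be made concrete with a small scoring sketch. It assumes a score equal to the number of samples covered by a gene set minus the coverage overlap (the number of extra times a sample is hit by more than one gene in the set). The function names, the toy mutation data, and the brute-force search are hypothetical and are not the Dendrix implementation.

```python
from itertools import combinations


def covered_samples(gene_set, mutated_samples):
    """Samples with a mutation in at least one gene of the set."""
    covered = set()
    for gene in gene_set:
        covered |= mutated_samples[gene]
    return covered


def weight(gene_set, mutated_samples):
    """Coverage minus coverage overlap; higher means more exclusive coverage."""
    coverage = len(covered_samples(gene_set, mutated_samples))
    total_mutations = sum(len(mutated_samples[gene]) for gene in gene_set)
    overlap = total_mutations - coverage
    return coverage - overlap


def best_gene_set(mutated_samples, size):
    """Brute-force search over all gene sets of a given size (small inputs only)."""
    genes = sorted(mutated_samples)
    return max(combinations(genes, size),
               key=lambda gene_set: weight(gene_set, mutated_samples))


if __name__ == "__main__":
    # Hypothetical mutation data: gene -> set of samples in which it is mutated.
    mutations = {
        "A": {"s1", "s2"},
        "B": {"s3", "s4"},
        "C": {"s1", "s3"},
    }
    print(best_gene_set(mutations, 2))
```

On this toy input the pair (A, B) covers four samples with no overlap, so it scores higher than either pair involving C. Exhaustive enumeration is used only to keep the example short; it does not scale to real data sets.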
HotNet: Finding Altered Subnetworks
HotNet project page
HotNet is an algorithm for finding significantly altered subnetworks in a large gene interaction network. This algorithm is described in the paper:
F. Vandin, E. Upfal, B.J. Raphael. (2011) Algorithms for Detecting Significantly Mutated Pathways in Cancer. Journal of Computational Biology. 18(3):507-22.
[A preliminary version of the paper appeared in the Proceedings of the 14th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2010). [PDF]]
To download HotNet, see the HotNet project page.
HotNet and Dendrix Visualization (Cytoscape plug-in)
A Cytoscape plug-in for viewing HotNet and Dendrix results.
NBC: Neighborhood Breakpoint Conservation
This software finds recurrent rearrangement breakpoints in DNA copy number data. The algorithm is described in the paper:
A. Ritz, P.L. Paris, M.M. Ittmann, C. Collins, and B.J. Raphael. (2011) Detection of Recurrent Rearrangement Breakpoints from Copy Number Data. BMC Bioinformatics. Publisher Link
Gremlin: Genome Rearrangement Explorer with Multi-Scale, Linked Interactions:
This is an interactive visualization model for the comparative analysis of structural variation in human and cancer genomes. The model is described in the following paper:
T.M. O'Brien, A. Ritz, B.J. Raphael, and D.H. Laidlaw. (2010) Gremlin: An Interactive Visualization Model for Analyzing Genomic Rearrangements. IEEE Transactions on Visualization and Computer Graphics. vol.16, no.6, pp.918-926. Publisher Link
- Description of the visualization views: GremlinOverview.pdf.
- Description of the visualization interaction: GremlinInteraction.pdf.
- Demo Gremlin Here
- Coming Soon: Download Gremlin
Geometric Analysis of Structural Variants (GASV and GASVPro)
Software for analysis of structural variation from paired-end sequencing and/or array-CGH data. This software has been used to find structural variation in both normal and cancer genomes using data from a variety of next-generation sequencing platforms. It can be used to predict structural variants directly from aligned reads in SAM/BAM format.
GASVPro is a probabilistic version of our original GASV algorithm. GASVPro combines read depth information and discordant paired-read mappings, two common signals of structural variation, into a single probabilistic model. When multiple alignments of a read are given, GASVPro uses a Markov Chain Monte Carlo procedure to sample over the space of possible alignments.
The GASVPro algorithm is described in the following paper:
S. Sindi, S. Onal, L. Peng, H. Wu and B.J. Raphael. (2012) An Integrative Model for Identification of Structural Variation in Sequencing Data. Genome Biology (In Press)
The original GASV method is described in the following paper:
S. Sindi, E. Helman, A. Bashir, B.J. Raphael. (2009) A Geometric Approach for Classification and Comparison of Structural Variants. Bioinformatics. 25: i222-i230. (Special issue for the Joint 17th Annual International Conference on Intelligent Systems in Molecular Biology and 8th Annual International European Conference on Computational Biology (ISMB/ECCB 09)). Publisher Link
Old versions. These are for archival purposes. It is recommended to download the latest version from the link above.
- Version 1.4 (3/5/2010). Download
- Version 1.3 (1/19/2010). Download
- Example BAM file
- Version 1.2 (11/30/2009). Download: software
- New in Version 1.4: Release notes.
- New in Version 1.3: New output formats, streamlining of BAM file handling, bug fixes.
- New in Version 1.2 (11/30/2009): Improved handling of SAM/BAM alignment files, speed improvements, maxCliqueSize option.
- New in Version 1.1: a preprocessor for SAM/BAM files, aCGH comparison, fusion gene detection, and more.
Motif Description Length (MoDL):
MoDL finds multiple motifs in a set of phosphorylated peptides, and is described in the following paper:
A. Ritz, G. Shakhnarovich, A.R. Salomon, and B. Raphael. (2009) Discovery of Phosphorylation Motif Mixtures in Phosphoproteomics Data. Bioinformatics. 25(1):14-21. Publisher Link
Paired-End Reconstruction of Genome Organization (PREGO):
Structural Variation Project Page
This algorithm reconstructs a cancer genome as a rearrangement of segments, or intervals, from the reference genome using paired end sequencing data. The algorithm is described in the following paper:
L. Oesper, A. Ritz, S.J. Aerni, R. Drebin, and B.J. Raphael. (2012) Reconstructing cancer genomes from paired-end sequencing data. BMC Bioinformatics. 13(Suppl 6):S10. Publisher Link.
[Preliminary version accepted at 2nd Annual RECOMB Satellite Workshop on Massively Parallel Sequencing (RECOMB-seq)]
CURRENT RELEASE: PREGO Version 1.2 (5/29/2013) Download
Old versions. These are for archival purposes. It is recommended to download the latest version from the link above.
Tumor Heterogeneity Analysis (THetA)
THetA Project Page
This algorithm estimates tumor purity and clonal/subclonal copy number aberrations directly from high-throughput DNA sequencing data.
We describe this algorithm in the following paper:
L. Oesper, A. Mahmoody, and B.J. Raphael. (2013) THetA: Inferring intra-tumor heterogeneity from high-throughput DNA sequencing data. Genome Biology. 14:R80. [Publisher Link] [Supplemental Material]