Sanderson Lab

 Data sets from published studies

 

Data sets to accompany manuscripts in review are HERE

 

All sequence data sets below are in Nexus format:


Sanderson, M. J., and M. M. McMahon. 2007. Inferring angiosperm phylogeny from EST data with widespread gene duplication. BMC Evolutionary Biology, 7 (suppl. 1): S3.

Data files:
Species tree--gene tree file (maximum parsimony gene tree collection) (uncompressed file)
Species tree--gene tree file (maximum likelihood gene tree collection) (uncompressed file)


McMahon, M. M., and M. J. Sanderson. 2006. Phylogenetic Supermatrix Analysis of GenBank Sequences from 2228 Papilionoid Legumes. Systematic Biology, 55:818-836.

Data files:
Dense supermatrix file (uncompressed file, 76 MB)
Dense supermatrix file (compressed .Z file, 2.4 MB)
Sparse supermatrix file (uncompressed file, 100 MB)
Sparse supermatrix file (compressed .Z file, 2.4 MB)
Please note that the uncompressed versions may not display in entirety, depending on the browser.

Tree files in Nexus format:
Dense supermatrix, 5000 equally parsimonious trees (compressed .gz file, 0.7 MB)
Dense supermatrix, 5000 equally parsimonious trees (uncompressed nexus file, 72 MB)
Dense supermatrix, strict consensus of 5000 equally parsimonious trees

Driskell, A. C., C. Ané, J. G. Burleigh, M. M. McMahon, B. C. O'Meara, and M. J. Sanderson. 2004. Phylogenetic utility of large sequence databases for building the tree of life. Science 306: 1172-4.

README regarding data files

Protein parsimony step matrix

Swiss-Prot data used in the analyses [5.8 MB file]

GenBank green plant data used in the analyses [9.0 MB file]

Swiss-Prot metazoan supermatrix [35 MB file]

GenBank green plant supermatrix [7.2 MB file]

Aligned nexus files for all informative Swiss-Prot clusters [4.7 MB tar.gz file]

Aligned nexus files for all informative green plant clusters [2.9 MB tar.gz file]


Wojciechowski, M. F., M. J. Sanderson and J.-M. Hu. 1999. Evidence on the monophyly of Astragalus (Fabaceae) and its major subgroups based on nuclear ribosomal DNA ITS and chloroplast DNA trnL intron data. Syst. Bot. 24:409-437.

ITS


Sanderson, M. J., M. F. Wojciechowski, J.-M. Hu, T. Sher Khan, and S. G. Brady. 2000. Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants. Mol. Biol. Evol. 17:782-797.

psaA

psbB


Sanderson, M. J. and M. F. Wojciechowski. 2000. Improved bootstrap confidence limits in large-scale phylogenies, with an example from Neo-Astragalus (Leguminosae). Syst. Biol. 49:671-685.

ITS


Sanderson, M. J., and J. A. Doyle. 2001. Sources of error and confidence intervals in estimating the age of angiosperms from rbcL and 18S rDNA data. American Journal of Botany 88:1499-1516.

rbcL and 18S combined data set

species used and sources of data (PDF File)


Sanderson, M. J. 2003. Molecular data from 27 proteins do not support a Precambrian origin of land plants. Amer. J. Bot. 90:954-956.

27-protein data set


Sanderson, M. J., A. C. Driskell, R. H. Ree, O. Eulenstein, and S. Langley. 2003. Obtaining maximal concatenated phylogenetic data sets from large sequence databases. Mol. Biol. Evol. 20:1036-1042.

15-protein, 15-taxa data set

39-protein, 10-taxa data set


Ané, C., J. G. Burleigh, M. M. McMahon and M. J. Sanderson. 2005. Covarion structure in plastid genome evolution: A new statistical test. Molecular Biology and Evolution 22: 914-924.

57 chloroplast alignments used in this paper, with a table that translates our cluster numbers to the gene symbol and product as given in the GenBank files. Here is a Perl script that aligns DNA sequences from an existing alignment of amino acid sequences.

programs implementing the test

Seq-gen-cov: a modification of Seq-Gen. C program that simulates DNA sequences under a variety of models, including covarion models. (.gz file)
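
For readers unfamiliar with the idea behind the Perl script mentioned above (aligning DNA sequences from an existing amino acid alignment), the general technique can be sketched as follows. This is not the script distributed here; it is a minimal Python illustration, and the function name `thread_dna_onto_protein` is hypothetical. It assumes the unaligned DNA is in frame, starts at the first codon of the protein, and uses `-` as the gap character. It does not check that each codon actually translates to the aligned residue.

```python
def thread_dna_onto_protein(aligned_protein: str, dna: str, gap: str = "-") -> str:
    """Insert codon-sized gaps into an unaligned DNA sequence so that it
    mirrors the gaps of its already-aligned protein translation.

    Assumptions (illustrative only): the DNA is in frame and begins at the
    first codon; any trailing bases (e.g. a stop codon) are ignored.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(dna) < 3 * n_residues:
        raise ValueError("DNA is too short for the protein sequence")

    pieces = []
    pos = 0  # index of the next unused base in the DNA
    for aa in aligned_protein:
        if aa == gap:
            # One gapped amino acid column becomes three gapped DNA columns.
            pieces.append(gap * 3)
        else:
            # One residue consumes the next codon of the DNA.
            pieces.append(dna[pos:pos + 3])
            pos += 3
    return "".join(pieces)


if __name__ == "__main__":
    # Two residues separated by one gap column in the protein alignment.
    print(thread_dna_onto_protein("M-K", "ATGAAA"))
```

Applying the function to every sequence in a protein alignment yields a DNA alignment three times as wide, with gaps falling only on codon boundaries.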


Yan, C., J. G. Burleigh, and O. Eulenstein. 2005. Identifying optimal incomplete phylogenetic data sets from sequence databases. Molecular Phylogenetics and Evolution 35: 528-535.

HERE is a zipped file containing scripts and an example data file to implement an alpha-quasi-biclique search.