ACD 3/23/04
Files to accompany "Driskell et al.", submitted March 2004

*Sequence data files contain information on all of the sequences we analyzed.
-Each line of these two files corresponds to one sequence, and there are six tab-delimited fields for each sequence.
-Field 1 is "fakegi", a unique identifying number assigned to each sequence in a dataset. (Numbers may overlap between datasets.)
-Field 2 is "nucgi" (gbpln) or "accid" (sprot). These are the sequence's identifying numbers in the GenBank and Swiss-Prot databases, respectively.
-Field 3 is "taxid", the NCBI taxonomy ID number attached to the sequence.
-Field 4 is "clustid", the cluster into which the sequence was placed by BLASTCLUST with a 60% minimum sequence similarity. (Cluster numbers overlap between datasets.)
-Fields 5 and 6 describe the cluster into which the sequence was placed. Field 5 classes the cluster as "informative" or "NONinformative"; a cluster containing sequences from more than three taxa was classed as informative. Field 6 records whether the cluster passed our orthology test ("single_copy"), failed it ("mixed"), or was "not_tested" (if NONinformative).

*Supermatrix files are the nexus files of the two supermatrices analyzed in this paper. The data is interleaved and grouped by cluster within each nexus file.
-Each sequence is preceded by a "name" consisting of the sequence "fakegi" (see above), an underscore, and the taxid.
-Following the data matrix is a list of taxa and the clusters for which they have data.
-Following this is a set of TAXSET statements, one for each cluster, listing the taxa which DO NOT have data for that cluster.
-Lastly, there is a set of CHARSET statements, one for each cluster, delineating the characters in the alignment for that cluster.
-The format line for the green plant supermatrix is constructed so that the assumptions block from a protein parsimony step matrix can be used, although the step matrix is not included in this file.
It is, however, also available for download.

*Protein parsimony step matrix is the step matrix that was applied to the green plant supermatrix for parsimony analysis.

*The two *CLUSTERS.tar.gz files contain all aligned nexus files for all informative clusters.
-These files are tarred and gzipped. To decompress and untar in one step, the command is (substituting the archive name for FILE.tar.gz):
    gzip -dc FILE.tar.gz | tar -xvf -
-Within each set, individual nexus files are labelled as gbpln*.nex (sprot*.nex), where * corresponds to the cluster id number.
-For the green plants, clusters 1-3 and 5 were NOT fully aligned; instead, the sequences necessary to build the supermatrix were extracted from these clusters and aligned separately. These partial cluster files are labelled as "gbpln*.subset.nex".
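
For readers who want to work with the sequence data files programmatically, the six-field layout described above could be read along the following lines. This is only a minimal sketch in Python, not part of the original analysis: the names SequenceRecord, parse_sequence_line, and split_matrix_name are hypothetical, and the sample values in the usage note are invented for illustration. All fields are kept as strings, on the assumption that Swiss-Prot identifiers need not be purely numeric.

```python
from typing import NamedTuple, Tuple


class SequenceRecord(NamedTuple):
    """One line of a sequence data file (hypothetical container name)."""
    fakegi: str       # Field 1: unique within a dataset only
    db_id: str        # Field 2: "nucgi" (gbpln) or "accid" (sprot)
    taxid: str        # Field 3: NCBI taxonomy ID
    clustid: str      # Field 4: BLASTCLUST cluster, unique within a dataset only
    informative: str  # Field 5: "informative" or "NONinformative"
    orthology: str    # Field 6: "single_copy", "mixed", or "not_tested"


def parse_sequence_line(line: str) -> SequenceRecord:
    """Split one tab-delimited line into its six fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-delimited fields, got {len(fields)}")
    return SequenceRecord(*fields)


def split_matrix_name(name: str) -> Tuple[str, str]:
    """Split a supermatrix sequence name of the form fakegi_taxid."""
    fakegi, sep, taxid = name.partition("_")
    if not sep:
        raise ValueError(f"no underscore in sequence name: {name!r}")
    return fakegi, taxid
```

As a usage example with made-up values, parse_sequence_line("17\t123456\t3702\t42\tinformative\tsingle_copy") yields a record whose clustid is "42", and split_matrix_name("17_3702") returns ("17", "3702"). Because fakegi and clustid numbers overlap between datasets, records from the gbpln and sprot files should be kept in separate collections.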