This pipeline was developed as part of teaching the NESCent
Phyloinformatics course, July 2007.

Jason Stajich
jason_stajich __AT__ berkeley-DOT-EDU

Directories:
 genomes - input genome annotations in GenBank file format

Workflow:
 1. get_cds.pl - extract coding and protein sequences from the GenBank
    files into the 'cds' and 'protein' directories
 2. run_blast.pl - run WU-BLASTP on the protein files to generate
    all-vs-all pairwise comparisons
 3. make_clusterortho_files.pl - make orthologous clusters by
    single-linkage, ensuring each cluster is single copy
 4. make_clusterfamily_files.pl - make paralogous families by
    single-linkage
 5. make_cluster_files.pl - make sequence FASTA files from cluster data.
    Run on either the ortholog or the gene family cluster files
    (two columns: the first is the cluster ID, the second is the gene ID)
 6. make_alignments.pl - run a multiple sequence alignment (MSA) tool on
    the cluster FASTA files
 7. make_cdsaln_from_pepaln.pl - make CDS alignments from protein guide
    alignments
 8. make_njtree.pl - make an NJ tree and a PostScript (PS) image from an
    alignment file using the built-in BioPerl modules
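
Steps 3 and 4 both rely on single-linkage clustering, so a small
illustration of the idea may help. The sketch below is written in Python
and is not the code in make_clusterortho_files.pl or
make_clusterfamily_files.pl. It rests on several assumptions that are
not part of this pipeline: BLAST hits have already been reduced to
(gene, gene) pairs, gene IDs look like "genome|gene", and "single copy"
means no genome contributes more than one gene to a cluster. The
function names are hypothetical. Only the two-column output (cluster ID,
then gene ID) follows the format described in step 5.

```python
from collections import defaultdict


def single_linkage(pairs):
    """Cluster genes so that any chain of pairwise hits joins them (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(set)
    for gene in list(parent):
        clusters[find(gene)].add(gene)
    # Sort members and clusters so the output is deterministic.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


def genome_of(gene_id):
    # ASSUMPTION: IDs are "genome|gene"; the real pipeline may name genes differently.
    return gene_id.split("|", 1)[0]


def single_copy(clusters):
    """Keep clusters in which no genome appears more than once (assumed definition)."""
    kept = []
    for cluster in clusters:
        genomes = [genome_of(g) for g in cluster]
        if len(cluster) > 1 and len(set(genomes)) == len(genomes):
            kept.append(cluster)
    return kept


def two_column_rows(clusters):
    """Format clusters as 'cluster ID <tab> gene ID' rows, one gene per row."""
    return [f"{i}\t{gene}" for i, cluster in enumerate(clusters) for gene in cluster]


if __name__ == "__main__":
    # Toy hit pairs, invented for illustration only.
    hits = [
        ("sp1|a", "sp2|a"),
        ("sp2|a", "sp3|a"),
        ("sp1|b", "sp1|c"),
        ("sp1|c", "sp2|b"),
    ]
    families = single_linkage(hits)   # analogous to step 4
    orthologs = single_copy(families)  # analogous to step 3
    print("\n".join(two_column_rows(orthologs)))
```

In this toy input the second family contains two genes from sp1, so it
stays a gene family but is dropped from the single-copy ortholog set.
Single-linkage is permissive: one spurious hit is enough to merge two
otherwise unrelated clusters, which is why a score or E-value cutoff is
normally applied to the BLAST hits before clustering.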