I'm not going to include the whole data dir in the tarball, since you can download the larger files from the original sites much faster. Those files are present in the tarball only as placeholders of size 0. They are:

lib/ensembl.pep (ensembl confirmed protein set)
    ftp://ftp.ensembl.org/pub/current/data/fasta/pep/ensembl.pep.gz

problemdata/problem6/yeast.aa (yeast protein seqs)
    ftp://ftp.ncbi.nlm.nih.gov/pub/blast/db/yeast.aa.Z

problemdata/problem6/gbvrl1 (genbank virus division release)
    ftp://ftp.ncbi.nlm.nih.gov/genbank/gbvrl1.seq.gz

problemdata/project2/drosophila_mrna.ffn
    ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/D_melanogaster/Scaffolds/mRNA/mrna.fa

problemdata/project2/a_fulgidus_NC_000917.ffn (predicted ORFs)
    ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Archaeoglobus_fulgidus/CURATED/NC_000917.ffn

problemdata/project2/s_pneumoniae_AE005672.ffn (predicted ORFs)
    ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Streptococcus_pneumonia_R6/AE005672.ffn
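
If you would rather script the downloads than fetch each file by hand, something like the following Python sketch may help. It is only an illustration, not part of the distribution: the names SOURCES, compression, empty_placeholders and fetch are made up for this example, and it assumes the URLs listed here still resolve, which may no longer be true. The .gz files are unpacked with the standard gzip module. Python's standard library has no decoder for .Z (Unix compress) files, so yeast.aa.Z is only downloaded and you would still need to run uncompress on it yourself.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# Path inside the data dir -> original download URL, copied from the list here.
SOURCES = {
    "lib/ensembl.pep":
        "ftp://ftp.ensembl.org/pub/current/data/fasta/pep/ensembl.pep.gz",
    "problemdata/problem6/yeast.aa":
        "ftp://ftp.ncbi.nlm.nih.gov/pub/blast/db/yeast.aa.Z",
    "problemdata/problem6/gbvrl1":
        "ftp://ftp.ncbi.nlm.nih.gov/genbank/gbvrl1.seq.gz",
    "problemdata/project2/drosophila_mrna.ffn":
        "ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/D_melanogaster/Scaffolds/mRNA/mrna.fa",
    "problemdata/project2/a_fulgidus_NC_000917.ffn":
        "ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Archaeoglobus_fulgidus/CURATED/NC_000917.ffn",
    "problemdata/project2/s_pneumoniae_AE005672.ffn":
        "ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Streptococcus_pneumonia_R6/AE005672.ffn",
}


def compression(url):
    """Guess the compression of a remote file from its URL suffix."""
    if url.endswith(".gz"):
        return "gzip"
    if url.endswith(".Z"):
        return "compress"
    return None


def empty_placeholders(data_dir="."):
    """Return the listed paths that are missing or have size 0 under data_dir."""
    root = Path(data_dir)
    todo = []
    for rel in SOURCES:
        target = root / rel
        if not target.is_file() or target.stat().st_size == 0:
            todo.append(rel)
    return todo


def fetch(rel, data_dir="."):
    """Download one listed file; return the path of the file left on disk."""
    url = SOURCES[rel]
    dest = Path(data_dir) / rel
    dest.parent.mkdir(parents=True, exist_ok=True)
    kind = compression(url)
    if kind is None:
        # Uncompressed on the server: save straight to the final name.
        urllib.request.urlretrieve(url, str(dest))
        return dest
    # Keep the compressed download next to the target, with its suffix added.
    raw = dest.with_name(dest.name + (".gz" if kind == "gzip" else ".Z"))
    urllib.request.urlretrieve(url, str(raw))
    if kind == "gzip":
        with gzip.open(raw, "rb") as src, open(dest, "wb") as out:
            shutil.copyfileobj(src, out)
        raw.unlink()
        return dest
    # .Z file: left as downloaded; decompress it with an external tool.
    return raw
```

Run from the top of the unpacked tarball, calling fetch(rel) for each rel returned by empty_placeholders(".") would fill in whichever placeholders are still empty and skip any file you have already downloaded.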