Strategies for Mining SNPs from EST data

1. Major problems

a. EST sequences are often of poor quality (single-pass sequencing)
b. Public EST data usually come without trace files or quality files
c. No original source information to identify the accession
d. ESTs are short, so it is difficult to distinguish orthologs from paralogs, and homozygous from heterozygous sequences
e. It is difficult to find nsSNPs (non-synonymous SNPs) from ESTs
f. Low-frequency SNPs are difficult to detect

2.Strategies

a. Get the records for all potato or Brassica ESTs from the EMBL database, then extract the sequence, cultivar and tissue from each record

b. Remove vector sequence with cross_match, then assemble and align the ESTs with CAP3

c. Get the alignment information needed for SNP analysis

d. Analyze this information and identify true SNPs

e. Identify nsSNPs according to the blastx and fasty results
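
Steps d and e can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the pipeline itself: the reads of one contig are assumed to be already aligned and padded to equal length with '-', a column is kept as a candidate SNP only when at least two alleles are each seen in several reads (a simple redundancy filter against single-pass sequencing errors; the threshold of 2 is an assumption), and the reading frame is assumed to be known already, for example from a blastx or fasty hit. The names candidate_snps, classify_snp, min_minor_count and frame are hypothetical.

```python
from collections import Counter

# Standard genetic code, built in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def candidate_snps(aligned_reads, min_minor_count=2):
    """Return [(column, base_counts)] for columns where at least two
    alleles are each supported by min_minor_count or more reads.

    Assumption: all reads have the same length; gaps are '-'.
    """
    snps = []
    for col in range(len(aligned_reads[0])):
        # Count only unambiguous bases; gaps and N are ignored.
        counts = Counter(r[col] for r in aligned_reads if r[col] in "ACGT")
        supported = [b for b, n in counts.items() if n >= min_minor_count]
        if len(supported) >= 2:
            snps.append((col, dict(counts)))
    return snps


def classify_snp(consensus, col, alt_base, frame=0):
    """Classify a substitution as synonymous or nonsynonymous.

    frame is the 0-based position of the first base of the first codon
    (assumed known, e.g. from a protein alignment).
    """
    start = col - ((col - frame) % 3)  # first base of the affected codon
    if start < 0:
        return "unknown"
    codon = consensus[start:start + 3]
    offset = col - start
    alt = codon[:offset] + alt_base + codon[offset + 1:]
    if codon not in CODON_TABLE or alt not in CODON_TABLE:
        return "unknown"  # incomplete codon, gap or ambiguous base
    if CODON_TABLE[codon] == CODON_TABLE[alt]:
        return "synonymous"
    return "nonsynonymous"


# Toy contig: five aligned reads of the same 9-base region.
reads = [
    "ATGGCTGAA",
    "ATGGCTGAA",
    "ATGGCAGAA",
    "ATGGCAGAA",
    "ATGGCTGAT",
]
for col, counts in candidate_snps(reads):
    alt = min(counts, key=counts.get)  # least frequent allele
    print(col, counts, classify_snp(reads[0], col, alt))
```

In the toy contig, column 5 is reported because both T and A are seen in at least two reads, while the single T in the last column is treated as a probable sequencing error. This also shows problem 1f: with such a filter, a true SNP present in only one read is missed.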

3. Workflow