Algorithm and Methodology for detection of reliable SNPs

Three filters are used to detect reliable SNPs: Filter 1 screens clusters for potential SNPs and distinguishes variation between genotypes from variation within genotypes; Filter 2 detects clusters containing variation caused by sequencing errors and paralogous sequences; Filter 3 detects unreliable SNPs by assigning confidence scores to SNPs based on sequence redundancy and sequence quality. Filter 2 is the core of the method: in this filter, haplotypes are defined on the potential SNPs.
[1]
Clusters with potential paralogous sequences are detected as follows:
(1) Remove all haplotypes consisting of only one sequence, since these are probably of poor quality. This is in accordance with Jalving et al. (2004), who recommend removing poor-quality sequences before SNP detection.
(2) Calculate the number of potential SNPs defined in every haplotype.
(3) Normalize the number of potential SNPs per haplotype:
(4) Calculate the standard deviation of the normalized number of potential SNPs among these haplotypes:
[3]
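The four steps above can be sketched in Python as follows. This is a minimal illustration, not the QualitySNP implementation: the `Haplotype` class, its fields, and the function name are hypothetical. The normalization formula is not reproduced on this page, so dividing by haplotype length is an assumption made only to keep the example runnable, and the use of the population standard deviation is likewise an assumption.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import Dict, List, Optional


@dataclass
class Haplotype:
    # Hypothetical container; QualitySNP's own data structures may differ.
    sequence_ids: List[str]            # sequences assigned to this haplotype
    alleles: Dict[int, Optional[str]]  # potential SNP position -> allele (None if not covered)
    length: int                        # haplotype length in bases (assumed normalizer)


def snp_count_spread(haplotypes: List[Haplotype]) -> float:
    """Return the standard deviation of normalized potential-SNP counts."""
    # Step 1: remove haplotypes consisting of only one sequence.
    kept = [h for h in haplotypes if len(h.sequence_ids) > 1]

    # Step 2: count the potential SNPs defined in every remaining haplotype.
    counts = [sum(a is not None for a in h.alleles.values()) for h in kept]

    # Step 3: normalize the counts.
    # ASSUMPTION: division by haplotype length stands in for the page's
    # normalization formula, which is not available here.
    normalized = [c / h.length for c, h in zip(counts, kept)]

    # Step 4: standard deviation among the haplotypes.
    # ASSUMPTION: population standard deviation; the source does not say which.
    if len(normalized) < 2:
        return 0.0
    return pstdev(normalized)


if __name__ == "__main__":
    cluster = [
        Haplotype(["est1", "est2"], {10: "A", 50: "G"}, 100),
        Haplotype(["est3", "est4", "est5"], {10: "C", 50: None}, 100),
        Haplotype(["est6"], {10: "T", 50: "G"}, 100),  # singleton, dropped in step 1
    ]
    print(snp_count_spread(cluster))
```

A large spread means the haplotypes in a cluster carry very different numbers of potential SNPs after normalization. How that value is thresholded to flag a cluster is not specified in this section, so the sketch stops at computing it.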
The source code of QualitySNP is freely available. You can download it from here; the manual can be downloaded separately from here. If you want to test the pipeline, you can use this dataset, which is already pre-clustered using CAP3.