Next generation sequencing technologies enable rapid, large-scale production of sequence data sets. Unfortunately, these technologies also have a non-negligible sequencing error rate, which biases their output by introducing false reads and reducing the counts of the real reads.
RECOUNT is an Expectation Maximization (EM) error correction tool for next generation sequencing data (Solexa/Illumina). The main features of RECOUNT:
- It uses quality scores to estimate the correct read counts, and is hence potentially more accurate.
- It does not use a reference genome.
- It is memory efficient.
We have applied RECOUNT to several types of Solexa/Illumina reads: mouse embryo, 5'-end SAGE, and bacterial metagenomic reads. We found that correction by our tool not only increases the number of mappable reads, but also makes a real difference in the biological interpretation of next generation sequencing data.
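To illustrate how quality scores can feed an EM estimate of true read counts without a reference genome, here is a minimal Python sketch. It is not RECOUNT's actual implementation. The function names (`phred_to_error`, `likelihood`, `em_correct`), the one-mismatch neighbourhood, the uniform `e / 3` substitution model, and the use of one error profile per distinct sequence are all assumptions made for illustration.

```python
def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def likelihood(true_seq, obs_seq, err):
    """P(observed | true), assuming independent per-base substitution errors."""
    p = 1.0
    for t, o, e in zip(true_seq, obs_seq, err):
        p *= (1 - e) if t == o else e / 3  # assumed: errors are uniform over 3 bases
    return p


def em_correct(counts, errors, max_mismatch=1, iters=50):
    """Estimate expected true counts for each distinct read sequence.

    counts: {sequence: observed count}
    errors: {sequence: per-position error probabilities} (assumed averaged
            over all reads sharing that sequence)
    """
    seqs = list(counts)
    total = sum(counts.values())
    # Candidate true sequences for each observed one: observed sequences
    # within max_mismatch substitutions (no reference genome involved).
    cand = {
        o: [t for t in seqs
            if sum(a != b for a, b in zip(t, o)) <= max_mismatch]
        for o in seqs
    }
    like = {(t, o): likelihood(t, o, errors[o]) for o in seqs for t in cand[o]}
    pi = {s: counts[s] / total for s in seqs}  # initial abundance estimate
    expected = {s: float(c) for s, c in counts.items()}
    for _ in range(iters):
        # E-step: share each observed count among its candidate sources.
        expected = dict.fromkeys(seqs, 0.0)
        for o in seqs:
            weights = {t: pi[t] * like[(t, o)] for t in cand[o]}
            norm = sum(weights.values())
            for t, w in weights.items():
                expected[t] += counts[o] * w / norm
        # M-step: re-estimate abundances from the expected counts.
        pi = {s: expected[s] / total for s in seqs}
    return expected


# Toy input: "ACGA" is rare and its last base has low quality (Q10).
counts = {"ACGT": 100, "ACGA": 2, "TTTT": 50}
errors = {
    "ACGT": [phred_to_error(20)] * 4,
    "ACGA": [phred_to_error(20)] * 3 + [phred_to_error(10)],
    "TTTT": [phred_to_error(20)] * 4,
}
corrected = em_correct(counts, errors)
```

In this toy input, the rare sequence with a low-quality final base loses most of its count to its abundant one-mismatch neighbour, while the unrelated sequence keeps its count. The total number of reads is preserved because each observed count is only redistributed, never discarded.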
Publication
E. Wijaya, M. C. Frith, Y. Suzuki and P. Horton, RECOUNT: Expectation maximization based error correction tool for next generation sequencing data, Genome Informatics, 23:189-201, 2009. [PDF]