Date of Award

2004

Level of Access Assigned by Author

Open-Access Thesis

Degree Name

Master of Electrical Engineering (MEE)

Department

Electrical Engineering

Advisor

Mohamad T. Musavi

Second Committee Member

Habtom Ressom

Third Committee Member

Bruce Segee

Abstract

Base calling is central to any large-scale genomic sequencing effort. Current sequencing technology produces error rates below 3.5%, which corresponds to as many as 35 errors in a 1000-base read. As base-calling error rates drop, the remaining errors become harder to locate. Assembly algorithms and human operators therefore rely on a confidence value that indicates how well the base-calling algorithm has performed for each base call. Such a measure makes it easier to uncover and correct potential errors, increasing the throughput of genetic sequencing. The model developed here employs fuzzy logic, which provides flexibility, adaptability, and intuition through linguistic variables and fuzzy membership functions. The proposed approach uses a fuzzy logic system to assign a confidence value to each called base. The system takes three variables that are calculated during the base-calling procedure and can be evaluated at any spatial location: peakness, height, and base spacing. Peakness and height are computed not only for the most likely candidate (the called base) but also for the second most likely candidate. The technique has been tested on over 3000 ABI 3700 DNA files, and the results show improved performance over the existing Phred and ABI quality values.
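
As a rough illustration of how a fuzzy system might combine these inputs into a confidence value, the following is a minimal sketch and not the system developed in the thesis. The membership shapes, their breakpoints, the three rules, the output levels (0.1, 0.5, 0.9), and the function names `ramp_up`, `triangle`, and `confidence` are all assumptions made for this example. Inputs are assumed to be normalised to [0, 1], with spacing expressed as a ratio to the expected spacing (1.0 being ideal).

```python
def ramp_up(x, lo, hi):
    """Membership rising linearly from 0 at lo to 1 at hi (assumed shape)."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def triangle(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def confidence(peakness, height, spacing, peakness2, height2):
    """Hypothetical fuzzy confidence in [0, 1] for one called base.

    peakness, height    -- values for the called base (first candidate)
    spacing             -- ratio of observed to expected base spacing
    peakness2, height2  -- values for the second most likely candidate
    All breakpoints below are illustrative assumptions.
    """
    sharp = ramp_up(peakness, 0.2, 0.8)
    tall = ramp_up(height, 0.2, 0.8)
    regular = triangle(spacing, 0.5, 1.0, 1.5)
    # A strong second candidate competes with the called base.
    rival = min(ramp_up(peakness2, 0.2, 0.8), ramp_up(height2, 0.2, 0.8))

    # Rule 1: IF sharp AND tall AND regular AND NOT rival THEN high.
    high = min(sharp, tall, regular, 1.0 - rival)
    # Rule 2: IF rival OR NOT regular THEN low.
    low = max(rival, 1.0 - regular)
    # Rule 3: whatever neither rule claims is treated as medium.
    medium = max(0.0, 1.0 - max(high, low))

    # Weighted-average defuzzification over singleton output levels.
    total = high + medium + low  # always >= 1, so no division by zero
    return (0.9 * high + 0.5 * medium + 0.1 * low) / total


if __name__ == "__main__":
    # A clean, isolated peak versus one with a strong competing candidate.
    print(confidence(0.9, 0.9, 1.0, 0.1, 0.1))
    print(confidence(0.9, 0.9, 1.0, 0.9, 0.9))
```

In this sketch the second candidate only ever lowers the confidence, which reflects the intuition that a base call is less trustworthy when another base produces a comparably sharp and tall peak at the same location.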
