Optimization of Rabin Karp Pattern Matching Algorithm Based on Parallel Computing Techniques for DNA Sequence Analysis

dc.contributor.authorAnjalee, M.G.M.
dc.contributor.authorFernando, W.P.U.
dc.contributor.authorThambawita, D.R.V.L.B.
dc.date.accessioned2019-04-06T06:33:07Z
dc.date.available2019-04-06T06:33:07Z
dc.date.issued2019-02
dc.description.abstractString matching algorithms are used to discover the occurrences of a defined pattern in a given text or a pool of strings which is widely used in detecting plagiarism, spam filtering and most importantly in computational biology including DNA sequencing. The existence and the intensity of a muted sequence in DNA caused for various diseases can be identified using Rabin Karp string matching algorithm. The main contribution of the study is to bring an efficient version of Rabin Karp algorithm by minimizing the spurious hits while using both Central Processing Unit (CPU) parallel techniques and General Purpose Graphics Processing Unit (GPGPU) parallel techniques specifically for DNA sequence analysis. The improved Rabin Karp is implemented using C language with POSIX Threads library, OpenMP and MPI and using Compute Unified Device Architecture (CUDA). When accelerating computations based on GPU, a special consideration has given to global memory, shared memory and texture memory, the types of memories with particular importance offered in CUDA architecture. By experimental studies, we investigated a new method to eliminate brute force matching and the GPU optimization is presented with stencil method ensuring efficiency in terms of memory overhead due to redundant data access in the serial CPU implementation. We have compared these parallel implementations for evaluating the effect of varying number of threads per block as well as varying DNA file sizes. The results obtained in this study present that the proposed implementation provides acceleration surpassing 36x speedup for string size 220 characters compared to a sequential (CPU) implementation. Eventually, using the empirical results, we could conclude that the improved CUDA C implementation of shared memory version can achieve 35 times of performance than serial implementation for a large pool of DNA data in string matching.en_US
dc.identifier.isbn9789550481255
dc.identifier.urihttp://www.erepo.lib.uwu.ac.lk/bitstream/handle/123456789/109/70.pdf?sequence=1&isAllowed=y
dc.language.isoenen_US
dc.publisherUva Wellassa University of Sri Lankaen_US
dc.subjectComputer Scienceen_US
dc.subjectInformation Scienceen_US
dc.subjectComputing and Information Scienceen_US
dc.titleOptimization of Rabin Karp Pattern Matching Algorithm Based on Parallel Computing Techniques for DNA Sequence Analysisen_US
dc.title.alternativeInternational Research Conference 2019en_US
dc.typeOtheren_US
Files
Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
70.pdf
Size:
8.05 KB
Format:
Adobe Portable Document Format
Description:
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission
Description: