Optimization of Rabin Karp Pattern Matching Algorithm Based on Parallel Computing Techniques for DNA Sequence Analysis

Anjalee, M.G.M.; Fernando, W.P.U.; Thambawita, D.R.V.L.B.

Optimization of Rabin Karp Pattern Matching Algorithm Based on Parallel Computing Techniques for DNA Sequence Analysis

dc.contributor.author	Anjalee, M.G.M.
dc.contributor.author	Fernando, W.P.U.
dc.contributor.author	Thambawita, D.R.V.L.B.
dc.date.accessioned	2019-04-06T06:33:07Z
dc.date.available	2019-04-06T06:33:07Z
dc.date.issued	2019-02
dc.description.abstract	String matching algorithms are used to discover the occurrences of a defined pattern in a given text or a pool of strings which is widely used in detecting plagiarism, spam filtering and most importantly in computational biology including DNA sequencing. The existence and the intensity of a muted sequence in DNA caused for various diseases can be identified using Rabin Karp string matching algorithm. The main contribution of the study is to bring an efficient version of Rabin Karp algorithm by minimizing the spurious hits while using both Central Processing Unit (CPU) parallel techniques and General Purpose Graphics Processing Unit (GPGPU) parallel techniques specifically for DNA sequence analysis. The improved Rabin Karp is implemented using C language with POSIX Threads library, OpenMP and MPI and using Compute Unified Device Architecture (CUDA). When accelerating computations based on GPU, a special consideration has given to global memory, shared memory and texture memory, the types of memories with particular importance offered in CUDA architecture. By experimental studies, we investigated a new method to eliminate brute force matching and the GPU optimization is presented with stencil method ensuring efficiency in terms of memory overhead due to redundant data access in the serial CPU implementation. We have compared these parallel implementations for evaluating the effect of varying number of threads per block as well as varying DNA file sizes. The results obtained in this study present that the proposed implementation provides acceleration surpassing 36x speedup for string size 220 characters compared to a sequential (CPU) implementation. Eventually, using the empirical results, we could conclude that the improved CUDA C implementation of shared memory version can achieve 35 times of performance than serial implementation for a large pool of DNA data in string matching.	en_US
dc.identifier.isbn	9789550481255
dc.identifier.uri	http://www.erepo.lib.uwu.ac.lk/bitstream/handle/123456789/109/70.pdf?sequence=1&isAllowed=y
dc.language.iso	en	en_US
dc.publisher	Uva Wellassa University of Sri Lanka	en_US
dc.subject	Computer Science	en_US
dc.subject	Information Science	en_US
dc.subject	Computing and Information Science	en_US
dc.title	Optimization of Rabin Karp Pattern Matching Algorithm Based on Parallel Computing Techniques for DNA Sequence Analysis	en_US
dc.title.alternative	International Research Conference 2019	en_US
dc.type	Other	en_US

Files

Original bundle

Now showing 1 - 1 of 1

Name:: 70.pdf
Size:: 8.05 KB
Format:: Adobe Portable Document Format
Description:

Download

License bundle

Now showing 1 - 1 of 1

Name:: license.txt
Size:: 1.71 KB
Format:: Item-specific license agreed upon to submission
Description:

Download

Collections

International Research Conference of UWU-2019