{"defaultlang":"zh","titlegroup":{"articletitle":[{"lang":"zh","data":[{"name":"text","data":"基于GPU+CPU的CANNY算子快速实现"}]},{"lang":"en","data":[{"name":"text","data":"Fast Canny algorithm based on GPU+CPU"}]}]},"contribgroup":{"author":[{"name":[{"lang":"zh","surname":"唐","givenname":"斌","namestyle":"eastern","prefix":""},{"lang":"en","surname":"TANG","givenname":"Bin","namestyle":"western","prefix":""}],"stringName":[],"aff":[{"rid":"aff1","text":"1"}],"role":["corresp","first-author"],"corresp":[{"rid":"cor1","lang":"zh","text":"唐斌(1981-),男,湖南邵阳人,硕士,讲师,主要从事电子系统设计自动化方面、图像处理研究。E-mail:tangbin54@163.com","data":[{"name":"text","data":"唐斌(1981-),男,湖南邵阳人,硕士,讲师,主要从事电子系统设计自动化方面、图像处理研究。E-mail:tangbin54@163.com"}]}],"email":"tangbin54@163.com","deceased":false},{"name":[{"lang":"zh","surname":"龙","givenname":"文","namestyle":"eastern","prefix":""},{"lang":"en","surname":"LONG","givenname":"Wen","namestyle":"western","prefix":""}],"stringName":[],"aff":[{"rid":"aff2","text":"2"}],"role":[],"bio":[{"lang":"zh","text":["龙文(1977-),男,湖南邵阳人,博士,教授,主要从事进化计算研究。E-mail:382635426@qq.com"],"graphic":[],"data":[[{"name":"text","data":"龙文(1977-),男,湖南邵阳人,博士,教授,主要从事进化计算研究。E-mail:"},{"name":"text","data":"382635426@qq.com"}]]}],"email":"382635426@qq.com","deceased":false}],"aff":[{"id":"aff1","intro":[{"lang":"zh","label":"1","text":"贵州财经大学 信息学院, 贵州 贵阳 550025","data":[{"name":"text","data":"贵州财经大学 信息学院, 贵州 贵阳 550025"}]},{"lang":"en","label":"1","text":"School of Information, Guizhou University of Finance and Economics, Guiyang 550025, China","data":[{"name":"text","data":"School of Information, Guizhou University of Finance and Economics, Guiyang 550025, China"}]}]},{"id":"aff2","intro":[{"lang":"zh","label":"2","text":"贵州财经大学 贵州省经济系统仿真重点实验室, 贵州 贵阳 550025","data":[{"name":"text","data":"贵州财经大学 贵州省经济系统仿真重点实验室, 贵州 贵阳 550025"}]},{"lang":"en","label":"2","text":"Guizhou Key Laboratory of Economics System Simulation, Guizhou University of Finance and Economics, Guiyang 550025, 
China","data":[{"name":"text","data":"Guizhou Key Laboratory of Economics System Simulation, Guizhou University of Finance and Economics, Guiyang 550025, China"}]}]}]},"abstracts":[{"lang":"zh","data":[{"name":"p","data":[{"name":"text","data":"本文提出一种基于GPU+CPU的快速实现Canny算子的方法。首先将算子分为串行和并行两部分,高斯滤波、梯度幅值和方向计算、非极大值抑制和双阈值处理在GPU中完成,将二维高斯滤波分解为水平方向上和垂直方向上的两次一维滤波从而降低计算的复杂度;然后使用CUDA编程完成多线程并行计算以加快计算速度;最后使用共享存储器隐藏线程访问全局存储的延迟;在CPU中则使用队列FIFO完成边缘连接。仿真测试结果表明:对分辨率为1024×1024的8位图像的处理时间为122 ms,相对于单独使用CPU而言,加速比最高可达5.39倍,因此本文方法充分利用了GPU的并行性的特征和CPU的串行处理能力。"}]}]},{"lang":"en","data":[{"name":"p","data":[{"name":"text","data":"This paper presents a fast implementation of the Canny operator based on GPU+CPU. The operator is divided into a parallel part and a serial part: Gaussian filtering, gradient magnitude and direction computation, non-maximum suppression and double thresholding are processed on the GPU. The two-dimensional Gaussian filter is decomposed into two one-dimensional filters, one horizontal and one vertical, to reduce the computational complexity. Multiple threads then execute the kernels in parallel in a CUDA program to speed up the computation, and threads access shared memory instead of global memory to hide global-memory latency. In addition, a FIFO queue is used on the CPU to link the edges. The simulation results show that the processing time of an 8-bit image with a resolution of 1 024×1 024 is 122 ms, a speedup of up to 5.39 over using the CPU alone. 
Therefore, this method takes full advantage of the parallelism of GPU and the serial processing capability of CPU."}]}]}],"keyword":[{"lang":"zh","data":[[{"name":"text","data":"CANNY"}],[{"name":"text","data":"CUDA"}],[{"name":"text","data":"GPU"}],[{"name":"text","data":"加速"}]]},{"lang":"en","data":[[{"name":"text","data":"Canny"}],[{"name":"text","data":"CUDA"}],[{"name":"text","data":"GPU"}],[{"name":"text","data":"acceleration"}]]}],"highlights":[],"body":[{"name":"sec","data":[{"name":"sectitle","data":{"label":[{"name":"text","data":"1"}],"title":[{"name":"text","data":"引言"}],"level":"1","id":"s1"}},{"name":"p","data":[{"name":"text","data":"John F. Canny在1986年提出最佳多级边缘检测Canny算子"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"1","type":"bibr","rid":"b1","data":[{"name":"text","data":"1"}]}},{"name":"text","data":"]"}]},{"name":"text","data":",由于其检测效果较好,因此在边缘检测处理中应用广泛。CANNY算子需要高斯滤波、梯度幅值和方向计算、非极大值抑制和双阈值处理,如果直接使用CPU检测分辨率高的图像边缘时,计算工作量大、消耗时间较长。考虑到图像中各个像素点的数据结构规则,运算程序相同,适合并行运算。GPU(Graphics Processing Unit)可以提供多个计算资源并行运算以加快程序的计算速度"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"2","type":"bibr","rid":"b2","data":[{"name":"text","data":"2"}]}},{"name":"text","data":"]"}]},{"name":"text","data":",使用GPU加速处理的技术广泛应用在分子动力学模拟、医疗成像、空间建模和图像处理等密集计算领域,因此可以利用GPU进行加速处理CANNY算子。钮圣虓等在其文中使用GPU加速canny算子的计算"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"3","type":"bibr","rid":"b3","data":[{"name":"text","data":"3"}]}},{"name":"text","data":"]"}]},{"name":"text","data":";Luo Y C等则在英伟达的GPU上实现了CANNY 
算子的计算,相对使用OPENCV而言,最大加速比可达3.403"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"4","type":"bibr","rid":"b4","data":[{"name":"text","data":"4"}]}},{"name":"text","data":"]"}]},{"name":"text","data":";文献"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"3","type":"bibr","rid":"b3","data":[{"name":"text","data":"3"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"和"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"4","type":"bibr","rid":"b4","data":[{"name":"text","data":"4"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"均采用的是在GPU上完成CANNY算子整个处理过程,但并没有优化充分利用高斯函数可分离性的性质降低高斯滤波的计算量,而且在GPU中实现真边缘的连接,需要向CPU返回真边缘以完成线程的同步,传输时间大于连接时间。因此本文提出一种基于GPU和CPU的快速实现Canny算子的方法,利用CUDA编程在GPU上完成数值计算部分,在CPU上完成边缘的连接。"}]}]},{"name":"sec","data":[{"name":"sectitle","data":{"label":[{"name":"text","data":"2"}],"title":[{"name":"text","data":"CANNY算子"}],"level":"1","id":"s2"}},{"name":"sec","data":[{"name":"sectitle","data":{"label":[{"name":"text","data":"2.1"}],"title":[{"name":"text","data":"CANNY算子的准则"}],"level":"2","id":"s2-1"}},{"name":"p","data":[{"name":"text","data":"Canny算子满足最优检测器的3条准则"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"5","type":"bibr","rid":"b5","data":[{"name":"text","data":"5"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"。"}]},{"name":"p","data":[{"name":"text","data":"(1) 信噪比检测准则:要求尽量测出图像中所有的真实边缘点和抑制所有的伪边缘点,提供一个最大的信噪比。"}]},{"name":"p","data":[{"name":"text","data":"(2) 精确定位准则:要求检测到的边缘点位置与真实边缘点位置之间的距离差值最小。"}]},{"name":"p","data":[{"name":"text","data":"(3) 单边缘响应准则:要求降低单边缘多响应的概率,检测到的边缘点与真实边缘点具备一对一特征从而抑制伪边缘的多响应。"}]}]},{"name":"sec","data":[{"name":"sectitle","data":{"label":[{"name":"text","data":"2.2"}],"title":[{"name":"text","data":"CANNY算子的流程"}],"level":"2","id":"s2-2"}},{"name":"p","data":[{"name":"text","data":"使用CANNY算子检测图像边缘的流程分为以下4步:"}]},{"name":"p","data":[{"name":"text","data":"(1) 
高斯滤波:待处理的图像可能在图像的获取和传输的过程中受到噪声污染,在边缘检测前需要使用高斯滤波器对图像滤波,高斯滤波的实质是线性低通滤波,主要用来抑制图像中的高频信息(噪声等)和保留图像的低频信息从而平滑图像。"}]},{"name":"p","data":[{"name":"text","data":"(2) 梯度幅值和方向计算:根据像素点的梯度幅值和方向检测图像的边缘。CANNY算子利用一阶有限差分在2×2邻域中计算梯度和方向,像素点"},{"name":"italic","data":[{"name":"text","data":"P(i,j)"}]},{"name":"text","data":"的梯度幅值和方向计算过程如下所示:"}]},{"name":"p","data":[{"name":"text","data":" "},{"name":"italic","data":[{"name":"text","data":"X"}]},{"name":"text","data":"方向偏导数:"}]},{"name":"p","data":[{"name":"text","data":" "},{"name":"dispformula","data":{"label":[{"name":"text","data":"1"}],"data":[{"name":"text","data":" "},{"name":"text","data":" "},{"name":"math","data":{"graphicsData":{"print":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593383&type=","small":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593383&type=small","big":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593383&type=middle"}}}],"id":"yjyxs-31-7-714-E1"}},{"name":"text","data":" "}]},{"name":"p","data":[{"name":"text","data":" "},{"name":"italic","data":[{"name":"text","data":"Y"}]},{"name":"text","data":"方向偏导数:"}]},{"name":"p","data":[{"name":"text","data":" "},{"name":"dispformula","data":{"label":[{"name":"text","data":"2"}],"data":[{"name":"text","data":" "},{"name":"text","data":" "},{"name":"math","data":{"graphicsData":{"print":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593387&type=","small":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593387&type=small","big":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593387&type=middle"}}}],"id":"yjyxs-31-7-714-E2"}},{"name":"text","data":" "}]},{"name":"p","data":[{"name":"text","data":"梯度幅值:"}]},{"name":"p","data":[{"name":"text","data":" "},{"name":"dispformula","data":{"label":[{"name":"text","data":"3"}],"data":[{"name":"text","data":" "},{"name":"text","data":" 
"},{"name":"math","data":{"graphicsData":{"print":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593391&type=","small":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593391&type=small","big":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593391&type=middle"}}}],"id":"yjyxs-31-7-714-E3"}},{"name":"text","data":" "}]},{"name":"p","data":[{"name":"text","data":"方向:"}]},{"name":"p","data":[{"name":"text","data":" "},{"name":"dispformula","data":{"label":[{"name":"text","data":"4"}],"data":[{"name":"text","data":" "},{"name":"text","data":" "},{"name":"math","data":{"graphicsData":{"print":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593394&type=","small":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593394&type=small","big":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593394&type=middle"}}}],"id":"yjyxs-31-7-714-E4"}},{"name":"text","data":" "}]},{"name":"p","data":[{"name":"text","data":"(3) 非最大值抑制:在第二步中获取全局梯度后还须要进一步筛选局部真实边缘点。筛选方法是将梯度方向分为0°、45°、90°、135°等4个方向。像素点"},{"name":"italic","data":[{"name":"text","data":"P(i,j)"}]},{"name":"text","data":"的梯度幅值与其梯度方向上两个像素点的梯度幅值做比较,如果其值最大,则保留像素点"},{"name":"italic","data":[{"name":"text","data":"P(i,j)"}]},{"name":"text","data":"的梯度值,否则将其梯度幅值设置为零。"}]},{"name":"p","data":[{"name":"text","data":"(4) 
双阈值处理和边缘连接:经过非极大值抑制后的图像还需要进行双阈值处理确定真边缘和消除假边缘,使用高阈值"},{"name":"italic","data":[{"name":"text","data":"T"},{"name":"sub","data":[{"name":"text","data":"H"}]}]},{"name":"text","data":"筛选出真边缘图像"},{"name":"italic","data":[{"name":"text","data":"S(i,j)"}]},{"name":"text","data":";使用低阈值"},{"name":"italic","data":[{"name":"text","data":"T"},{"name":"sub","data":[{"name":"text","data":"L"}]}]},{"name":"text","data":"删除图像中的假边缘像素点;而介于高阈值和低阈值的像素点则构建成弱边缘图像"},{"name":"italic","data":[{"name":"text","data":"W(i,j)"}]},{"name":"text","data":"。若图像"},{"name":"italic","data":[{"name":"text","data":"W(i,j)"}]},{"name":"text","data":"中某个弱边缘像素的8邻域中存在任意一个真边缘像素,则将该弱边缘像素补充到真边缘图像"},{"name":"italic","data":[{"name":"text","data":"S(i,j)"}]},{"name":"text","data":"中完成边缘的连接。为检测"},{"name":"italic","data":[{"name":"text","data":"W(i,j)"}]},{"name":"text","data":"的像素是否可以补充到真边缘图像"},{"name":"italic","data":[{"name":"text","data":"S(i,j)"}]},{"name":"text","data":"中,需要使用递归函数遍历弱边缘图像"},{"name":"italic","data":[{"name":"text","data":"W(i,j)"}]},{"name":"text","data":"中每个像素。"}]}]}]},{"name":"sec","data":[{"name":"sectitle","data":{"label":[{"name":"text","data":"3"}],"title":[{"name":"text","data":"CANNY算法在GPU中的加速"}],"level":"1","id":"s3"}},{"name":"sec","data":[{"name":"sectitle","data":{"label":[{"name":"text","data":"3.1"}],"title":[{"name":"text","data":"优化高斯滤波算法"}],"level":"2","id":"s3-1"}},{"name":"p","data":[{"name":"text","data":"由于处理的图像为二维函数,平滑图像时采用的是二维高斯函数"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"6","type":"bibr","rid":"b6","data":[{"name":"text","data":"6"}]}},{"name":"text","data":"]"}]},{"name":"text","data":",其表达式如下所示:"}]},{"name":"p","data":[{"name":"text","data":" "},{"name":"dispformula","data":{"label":[{"name":"text","data":"5"}],"data":[{"name":"text","data":" "},{"name":"text","data":" 
"},{"name":"math","data":{"graphicsData":{"print":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593397&type=","small":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593397&type=small","big":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593397&type=middle"}}}],"id":"yjyxs-31-7-714-E5"}},{"name":"text","data":" "}]},{"name":"p","data":[{"name":"text","data":"式中:"},{"name":"italic","data":[{"name":"text","data":"x"}]},{"name":"text","data":"和"},{"name":"italic","data":[{"name":"text","data":"y"}]},{"name":"text","data":"分别表示的是邻域中某像素点与中心像素点在水平方向和垂直方向的距离。高斯滤波算法就是利用高斯核函数与原始图像做卷积运算"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"7","type":"bibr","rid":"b7","data":[{"name":"text","data":"7"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"。高斯滤波后中心像素点的值为中心像素点和其邻域像素点的加权平均值。高斯滤波具体过程如下所示:"}]},{"name":"p","data":[{"name":"text","data":" "},{"name":"dispformula","data":{"label":[{"name":"text","data":"6"}],"data":[{"name":"text","data":" "},{"name":"text","data":" "},{"name":"math","data":{"graphicsData":{"print":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593400&type=","small":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593400&type=small","big":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593400&type=middle"}}}],"id":"yjyxs-31-7-714-E6"}},{"name":"text","data":" 
"}]},{"name":"p","data":[{"name":"text","data":"式(6)中"},{"name":"italic","data":[{"name":"text","data":"I(i,j)"}]},{"name":"text","data":"为滤波后的中心像素点值,"},{"name":"italic","data":[{"name":"text","data":"p(i,j)"}]},{"name":"text","data":"为中心像素点值,"},{"name":"italic","data":[{"name":"text","data":"i"}]},{"name":"text","data":"和"},{"name":"italic","data":[{"name":"text","data":"j"}]},{"name":"text","data":"为中心像素点的坐标,由于绝大多数的图像处理都采用滑动窗口形式计算,通常可以先将各个距离的权系数计算出来作为模板使用,使用模板与原始图像做卷积完成高斯滤波。系数"},{"name":"italic","data":[{"name":"text","data":"a"}]},{"name":"text","data":"和"},{"name":"italic","data":[{"name":"text","data":"b"}]},{"name":"text","data":"与高斯模板大小相关,2a+1和2b+1表示的都是模板的大小。"}]},{"name":"p","data":[{"name":"text","data":"由于高斯滤波的加权系数具有关于旋转对称的特征,任意方向上的权系数关于中心点对称。根据高斯函数的二维空间表达式,二维空间的高斯滤波可以分解成水平方向的和垂直方向的两次独立的一维高斯滤波"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"8","type":"bibr","rid":"b8","data":[{"name":"text","data":"8"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"。其具体计算过程如下所示:"}]},{"name":"p","data":[{"name":"text","data":" "},{"name":"dispformula","data":{"label":[{"name":"text","data":"7"}],"data":[{"name":"text","data":" "},{"name":"text","data":" "},{"name":"math","data":{"graphicsData":{"print":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593403&type=","small":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593403&type=small","big":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593403&type=middle"}}}],"id":"yjyxs-31-7-714-E7"}},{"name":"text","data":" "}]},{"name":"p","data":[{"name":"text","data":" "},{"name":"dispformula","data":{"label":[{"name":"text","data":"8"}],"data":[{"name":"text","data":" "},{"name":"text","data":" 
"},{"name":"math","data":{"graphicsData":{"print":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593406&type=","small":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593406&type=small","big":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593406&type=middle"}}}],"id":"yjyxs-31-7-714-E8"}},{"name":"text","data":" "}]},{"name":"p","data":[{"name":"text","data":"首先根据公式(7)计算出所有像素点在水平方向上的高斯滤波,然后再使用公式(8)对水平放上计算的结果做垂直上的高斯滤波。二维滤波分解为两次一维滤波后,除边界点外,每个像素点水平方向一维高斯滤波值temp("},{"name":"italic","data":[{"name":"text","data":"i,j"}]},{"name":"text","data":")被重复利用2"},{"name":"italic","data":[{"name":"text","data":"a"}]},{"name":"text","data":"+1次。原有计算复杂度"},{"name":"italic","data":[{"name":"text","data":"O"}]},{"name":"text","data":"((2"},{"name":"italic","data":[{"name":"text","data":"a"}]},{"name":"text","data":"+1)×(2"},{"name":"italic","data":[{"name":"text","data":"a"}]},{"name":"text","data":"+1)×"},{"name":"italic","data":[{"name":"text","data":"M×N)(M×N"}]},{"name":"text","data":"为图像分辨率)降低到"},{"name":"italic","data":[{"name":"text","data":"O"}]},{"name":"text","data":"((2"},{"name":"italic","data":[{"name":"text","data":"a"}]},{"name":"text","data":"+1)×"},{"name":"italic","data":[{"name":"text","data":"M×N"}]},{"name":"text","data":")+"},{"name":"italic","data":[{"name":"text","data":"O"}]},{"name":"text","data":"((2"},{"name":"italic","data":[{"name":"text","data":"a"}]},{"name":"text","data":"+1)×"},{"name":"italic","data":[{"name":"text","data":"M×N"}]},{"name":"text","data":"),计算复杂度下降。"}]}]},{"name":"sec","data":[{"name":"sectitle","data":{"label":[{"name":"text","data":"3.2"}],"title":[{"name":"text","data":"使用多个线程加快计算速度"}],"level":"2","id":"s3-2"}},{"name":"p","data":[{"name":"text","data":"NVIDIA公司的GPU 属于SIMD体系结构,GPU具有全局存储器、共享存储器和多个SM(Streaming Multiprocessor),每个SM包含8个SP(Stream 
Processor),SP的实质是一个全流水线化的单发射微处理器,其包含两个逻辑运算部件和一个浮点数运算部件,每个SP执行一个线程(Thread)。多个线程组成一个线程块(Block),等价于每个SM执行一个线程块,因此GPU可以同时执行多个线程。CUDA(Compute Unified Device Architecture,计算统一架构)是NVIDIA公司在2007年提出的一种新开发环境,在CUDA中可以使用C语言完成多线程程序的编程。其异构编程模型如"},{"name":"xref","data":{"text":"图 1","type":"fig","rid":"Figure1","data":[{"name":"text","data":"图 1"}]}},{"name":"text","data":"所示:一个完整的程序分为串行和并行两部分,串行部分通常在主机(Host)上运行,可并行部分则在设备(Device)中运行以充分利用GPU的并行性实现计算加速"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"9","type":"bibr","rid":"b9","data":[{"name":"text","data":"9"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"。"}]},{"name":"fig","data":{"id":"Figure1","caption":[{"lang":"zh","label":[{"name":"text","data":"图1"}],"title":[{"name":"text","data":"CUDA编程模型"}]},{"lang":"en","label":[{"name":"text","data":"Fig 1"}],"title":[{"name":"text","data":"CUDA programming model"}]}],"subcaption":[],"note":[],"graphics":[{"print":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593409&type=","small":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593409&type=small","big":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593409&type=middle"}]}},{"name":"p","data":[{"name":"text","data":"本文在GPU中加速CANNY算子的高斯滤波、梯度幅值和方向计算、非极大值抑制和双阈值处理,处理的流程如"},{"name":"xref","data":{"text":"图 2","type":"fig","rid":"Figure2","data":[{"name":"text","data":"图 
2"}]}},{"name":"text","data":"所示:首先使用cudaMemcpy()函数将待处理图像像素矩阵值从主机(Host)传输到GPU的全局存储器,任务1、任务2、任务3分别是高斯滤波、梯度幅值和方向计算、非极大值抑制和双阈值处理。三个任务依序在GPU中并行执行,由于上个任务执行完后才能开启下个任务的执行,需要使用CUDA提供的__syncthreads()函数实现任务之间的栅栏同步"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"2","type":"bibr","rid":"b2","data":[{"name":"text","data":"2"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"。最后通过cudaMemcpy()函数将GPU处理结果从全局存储器返回主机。在GPU中所有任务使用相同的"},{"name":"italic","data":[{"name":"text","data":"n"}]},{"name":"sup","data":[{"name":"text","data":"2"}]},{"name":"text","data":"个线程块并行完成,每个线程块中包含"},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"sup","data":[{"name":"text","data":"2"}]},{"name":"text","data":"个线程,通常情况下所有线程的程序相同但处理像素点不同,线程与像素点属于一对一关系。由于图像可以采用二维矩阵表示,因此每个像素点对应的线程ID号也采用二维坐标(Row,Col)表示,坐标值由以下公式获取:"}]},{"name":"p","data":[{"name":"text","data":" "},{"name":"dispformula","data":{"label":[{"name":"text","data":"9"}],"data":[{"name":"text","data":" "},{"name":"text","data":" "},{"name":"math","data":{"graphicsData":{"print":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593411&type=","small":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593411&type=small","big":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593411&type=middle"}}}],"id":"yjyxs-31-7-714-E9"}},{"name":"text","data":" "}]},{"name":"p","data":[{"name":"text","data":" "},{"name":"dispformula","data":{"label":[{"name":"text","data":"10"}],"data":[{"name":"text","data":" "},{"name":"text","data":" "},{"name":"math","data":{"graphicsData":{"print":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593414&type=","small":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593414&type=small","big":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593414&type=middle"}}}],"id":"yjyxs-31-7-714-E10"}},{"name":"text","data":" 
"}]},{"name":"p","data":[{"name":"text","data":"上述表达式中 blockIdx.x和blockIdx.y表示的是线程块的坐标,它们的乘积等于"},{"name":"italic","data":[{"name":"text","data":"n"}]},{"name":"sup","data":[{"name":"text","data":"2"}]},{"name":"text","data":"。每个块中的线程可以用坐标threadIdx.x和threadIdx.y表示,它们的乘积等于"},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"sup","data":[{"name":"text","data":"2"}]},{"name":"text","data":"。"}]},{"name":"fig","data":{"id":"Figure2","caption":[{"lang":"zh","label":[{"name":"text","data":"图2"}],"title":[{"name":"text","data":"CANNY算子加速流程"}]},{"lang":"en","label":[{"name":"text","data":"Fig 2"}],"title":[{"name":"text","data":"Acceleration flow of Canny algorithm"}]}],"subcaption":[],"note":[],"graphics":[{"print":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593417&type=","small":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593417&type=small","big":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593417&type=middle"}]}}]},{"name":"sec","data":[{"name":"sectitle","data":{"label":[{"name":"text","data":"3.3"}],"title":[{"name":"text","data":"使用共享存储器(shared memory)隐藏全局存储器(global 
memory)访存延迟"}],"level":"2","id":"s3-3"}},{"name":"p","data":[{"name":"text","data":"由于SP中没有使用缓冲技术,各个线程需要从全局存储器中读写像素值,但全局存储器的访存时间较长,此时会导致部分线程处于等待操作状态,因此降低了线程的利用效率。从CANNY算子处理流程可知:相邻像素点在做计算时,所需要的数据存在交集;同时上个任务执行完后的结果需要提供给下个任务使用。因此可以将像素值存储在访存时间较短的共享存储器中,减少线程对全局存储器的访问次数,从而隐藏存储器访存延迟。共享存储器的作用类似于CPU中的CACHE。"}]},{"name":"p","data":[{"name":"text","data":"基本块中线程的数量为"},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"sup","data":[{"name":"text","data":"2"}]},{"name":"text","data":",共享存储器可存储"},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"sup","data":[{"name":"text","data":"2"}]},{"name":"text","data":"个像素值。需要注意的是共享存储器只供基本块中的线程访问,不支持跨块访问。在3个任务处理过程中,基本块中边界点的计算均需要跨块访问,为防止跨块访问,本文将基本块扩展成扩展块。如果非极大值抑制的基本块大小为"},{"name":"italic","data":[{"name":"text","data":"m×m"}]},{"name":"text","data":",那么其扩展块大小为("},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"text","data":"+2)×("},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"text","data":"+2);梯度幅值和方向计算的基本块大小则为("},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"text","data":"+2)×("},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"text","data":"+2),其扩展块大小为("},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"text","data":"+4)×("},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"text","data":"+4);在高斯滤波时,由于模板大小为(2"},{"name":"italic","data":[{"name":"text","data":"a"}]},{"name":"text","data":"+1)×(2"},{"name":"italic","data":[{"name":"text","data":"a"}]},{"name":"text","data":"+1),其基本块大小为("},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"text","data":"+4)×("},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"text","data":"+4),其扩展块大小应为("},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"text","data":"+2"},{"name":"italic","data":[{"name":"text","data":"a"}]},{"name":"text","data":"+4)×("},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"text","data":"+2"},{"name":"italic","data":[{"nam
e":"text","data":"a"}]},{"name":"text","data":"+4)。基本块与扩展块如图3所示:基本块设置为"},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"text","data":"×"},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"text","data":",扩展块大小设置为("},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"text","data":"+2"},{"name":"italic","data":[{"name":"text","data":"a"}]},{"name":"text","data":"+4)×("},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"text","data":"+2"},{"name":"italic","data":[{"name":"text","data":"a"}]},{"name":"text","data":"+4)。扩展块中属于基本块的线程主要完成高斯滤波、梯度幅值和方向计算、非极大值抑制和双阈值处理,剩余部分的线程负责为边界像素点提供数据。采用共享存储器作为扩展块后,访问全局存储器的次数减少为("},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"text","data":"+2"},{"name":"italic","data":[{"name":"text","data":"a"}]},{"name":"text","data":"+4)"},{"name":"sup","data":[{"name":"text","data":"2"}]},{"name":"text","data":"×"},{"name":"italic","data":[{"name":"text","data":"M"}]},{"name":"text","data":"×"},{"name":"italic","data":[{"name":"text","data":"N"}]},{"name":"text","data":"/"},{"name":"italic","data":[{"name":"text","data":"m"}]},{"name":"sup","data":[{"name":"text","data":"2"}]},{"name":"text","data":"+"},{"name":"italic","data":[{"name":"text","data":"M"}]},{"name":"text","data":"×"},{"name":"italic","data":[{"name":"text","data":"N"}]},{"name":"text","data":",而不使用共享存储器时需要访问全局存储器次数为(2"},{"name":"italic","data":[{"name":"text","data":"a"}]},{"name":"text","data":"+1)"},{"name":"sup","data":[{"name":"text","data":"2"}]},{"name":"text","data":"×"},{"name":"italic","data":[{"name":"text","data":"M"}]},{"name":"text","data":"×"},{"name":"italic","data":[{"name":"text","data":"N"}]},{"name":"text","data":"+12×"},{"name":"italic","data":[{"name":"text","data":"M"}]},{"name":"text","data":"×"},{"name":"italic","data":[{"name":"text","data":"N"}]},{"name":"text","data":",因此该方法能显著减少访问全局存储器的次数。"}]},{"name":"fig","data":{"id":"Figure3","caption":[{"lang":"zh","label":[{"name":"text","data":"图3"}]
,"title":[{"name":"text","data":"共享存储器分配示意图"}]},{"lang":"en","label":[{"name":"text","data":"Fig 3"}],"title":[{"name":"text","data":"Allocate shared memory"}]}],"subcaption":[],"note":[],"graphics":[{"print":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593420&type=","small":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593420&type=small","big":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593420&type=middle"}]}}]}]},{"name":"sec","data":[{"name":"sectitle","data":{"label":[{"name":"text","data":"4"}],"title":[{"name":"text","data":"边缘连接在CPU中的实现"}],"level":"1","id":"s4"}},{"name":"sec","data":[{"name":"sectitle","data":{"label":[{"name":"text","data":"4.1"}],"title":[{"name":"text","data":"边缘连接"}],"level":"2","id":"s4-1"}},{"name":"p","data":[{"name":"text","data":"如果直接在GPU上完成边缘点的连接,由于GPU不支持块与块之间的通信"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"4","type":"bibr","rid":"b4","data":[{"name":"text","data":"4"}]}},{"name":"text","data":"]"}]},{"name":"text","data":",在多个线程完成某个边缘点连接时,需要向CPU返回处理结果完成线程与线程之间的同步,导致同步消耗的时间大于边缘连接的时间;或是使用原子操作同步,但是由于边缘连接像素位置可能不是列连续,随机写全局存储存在大的延迟,不适合在GPU中做边缘连接处理。因此本系统采用队列FIFO(First In First 
Out)在CPU中完成强边缘图像"},{"name":"italic","data":[{"name":"text","data":"S(i,j)"}]},{"name":"text","data":"中所有像素边缘的连接"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"10","type":"bibr","rid":"b10","data":[{"name":"text","data":"10"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"。在强边缘图像"},{"name":"italic","data":[{"name":"text","data":"S(i,j)"}]},{"name":"text","data":"中取出第一个像素P压入队列Queue,同时以P点为起点,在弱边缘图像"},{"name":"italic","data":[{"name":"text","data":"W(i,j)"}]},{"name":"text","data":"中搜索P点的8邻域。如果存在弱边缘像素,将它们补充到强边缘图像"},{"name":"italic","data":[{"name":"text","data":"S(i,j)"}]},{"name":"text","data":"中,同时也压入到队列的尾部。然后从队列的首部取出已经归类于强边缘的弱边缘像素Q。再以Q点为起点,继续在弱边缘图像"},{"name":"italic","data":[{"name":"text","data":"W(i,j)"}]},{"name":"text","data":"中搜索Q点的8邻域像素;如果Q点周围不存在弱边缘像素,则直接从队列中取出其他像素,重复上述过程直至队列为空。然后将"},{"name":"italic","data":[{"name":"text","data":"S(i,j)"}]},{"name":"text","data":"中的第二个像素压入队列,重复上述过程直至搜索完"},{"name":"italic","data":[{"name":"text","data":"S(i,j)"}]},{"name":"text","data":"的每个像素。"},{"name":"xref","data":{"text":"图 4","type":"fig","rid":"Figure4","data":[{"name":"text","data":"图 4"}]}},{"name":"text","data":"给出了起点"},{"name":"italic","data":[{"name":"text","data":"P(3,2)"}]},{"name":"text","data":"的边缘连接过程。"}]},{"name":"fig","data":{"id":"Figure4","caption":[{"lang":"zh","label":[{"name":"text","data":"图4"}],"title":[{"name":"text","data":"边缘连接示例图"}]},{"lang":"en","label":[{"name":"text","data":"Fig 4"}],"title":[{"name":"text","data":"Example of connecting edges"}]}],"subcaption":[],"note":[],"graphics":[{"print":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593424&type=","small":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593424&type=small","big":"http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=1593424&type=middle"}]}},{"name":"p","data":[{"name":"text","data":"边缘连接伪代码如下所示:"}]},{"name":"p","data":[{"name":"text","data":"For P ∈ 
S(i,j){"}]},{"name":"p","data":[{"name":"text","data":"  将P点坐标存入Queue;"}]},{"name":"p","data":[{"name":"text","data":"  Queue首地址指向P;"}]},{"name":"p","data":[{"name":"text","data":"  While(Queue不为空){"}]},{"name":"p","data":[{"name":"text","data":"   从Queue队首获取强边缘像素点Q坐标X和Y;"}]},{"name":"p","data":[{"name":"text","data":"   Queue首地址指向下一个强边缘像素;"}]},{"name":"p","data":[{"name":"text","data":" For Q (Q点的8邻域像素) //在弱边缘W(i,j)中检测Q的8邻域像素;"}]},{"name":"p","data":[{"name":"text","data":"If Q"},{"name":"text","data":">"},{"name":"text","data":"T"},{"name":"sub","data":[{"name":"text","data":"L"}]},{"name":"text","data":"{"}]},{"name":"p","data":[{"name":"text","data":"  Q <=0;"}]},{"name":"p","data":[{"name":"text","data":"  Q点存入强边缘图像S(i,j);"}]},{"name":"p","data":[{"name":"text","data":"  将Q点坐标存入Queue;"}]},{"name":"p","data":[{"name":"text","data":"  Queue尾地址指向Q点;"}]},{"name":"p","data":[{"name":"text","data":"  }"}]},{"name":"p","data":[{"name":"text","data":" }"}]},{"name":"p","data":[{"name":"text","data":" }"}]}]}]},{"name":"sec","data":[{"name":"sectitle","data":{"label":[{"name":"text","data":"5"}],"title":[{"name":"text","data":"实验结果及分析"}],"level":"1","id":"s5"}},{"name":"p","data":[{"name":"text","data":"本文最后对上述所提出的方法进行实验测试,测试图片选用Lena,图片格式是BMP,BMP格式较为简单,其文件由档头、信息头、调色板和位图数据组成。位图数据没有压缩原始数据。存储的是原始RGB像素值,使用C语言程序直接读出各个像素点的RGB值,然后转换为8位的灰度图像。本文所使用的实验环境如"},{"name":"xref","data":{"text":"表 1","type":"table","rid":"Table1","data":[{"name":"text","data":"表 1"}]}},{"name":"text","data":"所示。"}]},{"name":"table","data":{"id":"Table1","caption":[{"lang":"zh","label":[{"name":"text","data":"表1"}],"title":[{"name":"text","data":"实验环境"}]},{"lang":"en","label":[{"name":"text","data":"Table 1"}],"title":[{"name":"text","data":"Experimental environment"}]}],"note":[],"table":[{"head":[[{"data":[{"name":"text","data":"实验设备"}]},{"data":[{"name":"text","data":"主频"}]},{"data":[{"name":"text","data":"开发环境"}]},{"data":[{"name":"text","data":"语言"}]}]],"body":[[{"data":[{"name":"text","data":"Intel 
core i5(4核)"}]},{"data":[{"name":"text","data":"1.7 GHz"}]},{"data":[{"name":"text","data":"Visualstudio 2012"}]},{"data":[{"name":"text","data":"C语言"}]}],[{"data":[{"name":"text","data":"NVIDIA GEFORCE 820M"}]},{"data":[{"name":"text","data":"1.25 GHz"}]},{"data":[{"name":"text","data":"CUDA 6.5Visualstudio 2012"}]},{"data":[{"name":"text","data":"C语言"}]}]],"foot":[]}]}},{"name":"p","data":[{"name":"text","data":"NVIDIA GEFORCE 820M中存在2个SM,全局存储器容量为1 GB,每个线程块的共享存储器容量为48 KB。由于需要计算梯度的幅值和方向,需要在每个线程块中开辟3个扩展块,一个用于保存梯度方向,其余两个用于存储上个任务的处理结果和本次任务处理结果。因此每个扩展块的容量不能超过16 KB。本文基本块的大小选择为16×16,包含16×16个线程。线程块数量为"},{"name":"italic","data":[{"name":"text","data":"M×N"}]},{"name":"text","data":"/256。最后测试的数据如"},{"name":"xref","data":{"text":"表 2","type":"table","rid":"Table2","data":[{"name":"text","data":"表 2"}]}},{"name":"text","data":"所示。"}]},{"name":"table","data":{"id":"Table2","caption":[{"lang":"zh","label":[{"name":"text","data":"表2"}],"title":[{"name":"text","data":"实验结果"}]},{"lang":"en","label":[{"name":"text","data":"Table 2"}],"title":[{"name":"text","data":"Result of system 
simulation"}]}],"note":[],"table":[{"head":[[{"data":[{"name":"text","data":"图像"},{"name":"text","data":""},{"name":"text","data":"分辨率"}]},{"data":[{"name":"text","data":"高斯模板"},{"name":"text","data":""},{"name":"text","data":"大小"}]},{"data":[{"name":"text","data":"完全CPU"},{"name":"text","data":""},{"name":"text","data":"运算时间/ms"}]},{"data":[{"name":"text","data":"GPU+CPU"},{"name":"text","data":""},{"name":"text","data":"混合运算时间/ms"}]},{"data":[{"name":"text","data":"加速比"}]}]],"body":[[{"data":[{"name":"text","data":"256×256"}]},{"data":[{"name":"text","data":"3×3"}]},{"data":[{"name":"text","data":"10"}]},{"data":[{"name":"text","data":"5"}]},{"data":[{"name":"text","data":"2.00"}]}],[{"data":[{"name":"text","data":"512×512"}]},{"data":[{"name":"text","data":"3×3"}]},{"data":[{"name":"text","data":"41"}]},{"data":[{"name":"text","data":"22"}]},{"data":[{"name":"text","data":"1.86"}]}],[{"data":[{"name":"text","data":"1 024×1 024"}]},{"data":[{"name":"text","data":"3×3"}]},{"data":[{"name":"text","data":"149"}]},{"data":[{"name":"text","data":"82"}]},{"data":[{"name":"text","data":"1.81"}]}],[{"data":[{"name":"text","data":"512×512"}]},{"data":[{"name":"text","data":"9×9"}]},{"data":[{"name":"text","data":"133"}]},{"data":[{"name":"text","data":"29"}]},{"data":[{"name":"text","data":"4.56"}]}],[{"data":[{"name":"text","data":"1 024×1 024"}]},{"data":[{"name":"text","data":"9×9"}]},{"data":[{"name":"text","data":"480"}]},{"data":[{"name":"text","data":"113"}]},{"data":[{"name":"text","data":"4.24"}]}],[{"data":[{"name":"text","data":"1 024×1 024"}]},{"data":[{"name":"text","data":"11×11"}]},{"data":[{"name":"text","data":"658"}]},{"data":[{"name":"text","data":"122"}]},{"data":[{"name":"text","data":"5.39"}]}],[{"data":[{"name":"text","data":"2 048×2 048"}]},{"data":[{"name":"text","data":"11×11"}]},{"data":[{"name":"text","data":"2 
831"}]},{"data":[{"name":"text","data":"734"}]},{"data":[{"name":"text","data":"3.85"}]}]],"foot":[]}]}},{"name":"p","data":[{"name":"text","data":"从表2数据可以看出:高斯模板尺寸越大,加速效果越明显,在分辨率为1 024×1 024、模板为11×11时加速比最高,达到5.39倍;模板尺寸相同时,加速比随图像分辨率的提高反而有所下降。"}]}]},{"name":"sec","data":[{"name":"sectitle","data":{"label":[{"name":"text","data":"6"}],"title":[{"name":"text","data":"结论"}],"level":"1","id":"s6"}},{"name":"p","data":[{"name":"text","data":"本文使用CUDA编程,在GPU和CPU上协同实现了Canny算子的快速计算。通过优化Canny算子中的高斯滤波,并使用共享存储器减少对全局存储器的访问,在GPU中完成高斯滤波、梯度幅值和方向计算、非极大值抑制和双阈值处理,同时在CPU中使用FIFO队列完成边缘连接。相比完全在CPU上实现Canny算子,最大加速比可达5.39倍。本设计充分利用了GPU的并行计算能力和CPU的串行处理能力,具有较好的灵活性和较快的处理速度,对图像的快速处理具有一定的参考价值。"}]}]}],"footnote":[],"reflist":{"title":[{"name":"text","data":"参考文献"}],"data":[{"id":"b1","label":"1","citation":[{"lang":"en","text":[{"name":"text","data":"CANNY J.A computational approach to edge detection[J]."},{"name":"italic","data":[{"name":"text","data":"IEEE Transactions on Pattern Analysis and Machine Intelligence"}]},{"name":"text","data":",1986,PAMI-8(6):679-698."}]}]},{"id":"b2","label":"2","citation":[{"lang":"en","text":[{"name":"text","data":"NVIDIA."},{"name":"italic","data":[{"name":"text","data":"NVIDIA CUDA Compute Unified Device Architecture Programming Guide:Version"}]},{"name":"text","data":" 3.2[M].Santa Clara:NVIDIA Corporation,2010."}]}]},{"id":"b3","label":"3","citation":[{"lang":"en","text":[{"name":"text","data":"LUO Y C,DURAISWAMI R.Canny edge detection on NVIDIA CUDA[C]."},{"name":"italic","data":[{"name":"text","data":"Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops"}]},{"name":"text","data":",Anchorage,AK:IEEE,2008:1-8."}]}]},{"id":"b4","label":"4","citation":[{"lang":"zh","text":[{"name":"text","data":"钮圣虓,王盛,杨晶晶,等.完全基于边缘信息的快速图像分割算法[J].计算机辅助设计与图形学学报,2012,24(11):1410-1419."}]},{"lang":"en","text":[{"name":"text","data":"NIU S X,WANG S,YANG J J,"},{"name":"italic","data":[{"name":"text","data":"et al."}]},{"name":"text","data":" A fast image segmentation algorithm fully based on edge 
information[J]."},{"name":"italic","data":[{"name":"text","data":"Journal of Computer-Aided Design & Computer Graphics"}]},{"name":"text","data":",2012,24(11):1410-1419.(in Chinese)"}]}]},{"id":"b5","label":"5","citation":[{"lang":"zh","text":[{"name":"text","data":"石桂名,魏庆涛,孟繁盛.基于Canny算子的图像边缘检测算法[J].现代电子技术,2015,38(12):92-93,97."}]},{"lang":"en","text":[{"name":"text","data":"SHI G M,WEI Q T,MENG F S.Image edge detection algorithm based on Canny operator[J]."},{"name":"italic","data":[{"name":"text","data":"Modern Electronics Technique"}]},{"name":"text","data":",2015,38(12):92-93,97.(in Chinese)"}]}]},{"id":"b6","label":"6","citation":[{"lang":"zh","text":[{"name":"text","data":"刘久文,潘峰,李军.结合图像边缘检测和最小误差替换的隐写方案[J].液晶与显示,2015,30(1):151-156."}]},{"lang":"en","text":[{"name":"text","data":"LIU J W,PAN F,LI J.Steganography based on edge detection and minimum error replacement[J]."},{"name":"italic","data":[{"name":"text","data":"Chinese Journal of Liquid Crystals and Displays"}]},{"name":"text","data":",2015,30(1):151-156.(in Chinese)"}]}]},{"id":"b7","label":"7","citation":[{"lang":"zh","text":[{"name":"text","data":"许佳佳.结合Harris与SIFT算子的图像快速配准算法[J].中国光学,2015,8(4):574-581."}]},{"lang":"en","text":[{"name":"text","data":"XU J J.Fast image registration method based on Harris and SIFT algorithm[J]."},{"name":"italic","data":[{"name":"text","data":"Chinese Optics"}]},{"name":"text","data":",2015,8(4):574-581.(in Chinese)"}]}]},{"id":"b8","label":"8","citation":[{"lang":"zh","text":[{"name":"text","data":"丁怡心,廖勇毅.高斯模糊算法优化及实现[J].现代计算机,2010(8):76-77,100."}]},{"lang":"en","text":[{"name":"text","data":"DING Y X,LIAO Y Y.Optimization and implementation of Gaussian blur algorithm[J]."},{"name":"italic","data":[{"name":"text","data":"Modern Computer"}]},{"name":"text","data":",2010(8):76-77,100.(in 
Chinese)"}]}]},{"id":"b9","label":"9","citation":[{"lang":"zh","text":[{"name":"text","data":"王新华,王晓坤.十亿像素瞬态成像系统实时图像拼接[J].中国光学,2015,8(5):785-793."}]},{"lang":"en","text":[{"name":"text","data":"WANG X H,WANG X K.Real time image mosaic of the transient gigapixel imaging system[J]."},{"name":"italic","data":[{"name":"text","data":"Chinese Optics"}]},{"name":"text","data":",2015,8(5):785-793.(in Chinese)"}]}]},{"id":"b10","label":"10","citation":[{"lang":"zh","text":[{"name":"text","data":"刘谷,安虹,李小强,等.图广度优先搜索算法面向图形处理器的优化方法研究[J].小型微型计算机系统,2014,35(5):1074-1079."}]},{"lang":"en","text":[{"name":"text","data":"LIU G,AN H,LI X Q,"},{"name":"italic","data":[{"name":"text","data":"et al."}]},{"name":"text","data":" Study on optimizing techniques of breadth-first search algorithm on graphic processing unit[J]."},{"name":"italic","data":[{"name":"text","data":"Journal of Chinese Computer Systems"}]},{"name":"text","data":",2014,35(5):1074-1079.(in Chinese)"}]}]}]},"response":[],"contributions":[],"acknowledgements":[],"conflict":[],"supportedby":[],"articlemeta":{"doi":"10.3788/YJYXS20163107.0714","clc":[[{"name":"text","data":"TP333"}]],"dc":[],"publisherid":"yjyxs-31-7-714","citeme":[],"fundinggroup":[{"lang":"zh","text":[{"name":"text","data":"国家自然科学基金(No.61463009)"}]},{"lang":"en","text":[{"name":"text","data":"Supported by National Natural Science Foundation of China(No.61463009)"}]}],"history":{"received":"2015-12-21","accepted":"2016-03-03","opub":"2020-06-15"},"copyright":{"data":[{"lang":"zh","data":[{"name":"text","data":"版权所有 © 《液晶与显示》编辑部 2016"}],"type":"copyright"},{"lang":"en","data":[{"name":"text","data":"Copyright © 2016 Chinese Journal of Liquid Crystals and Displays. All rights reserved."}],"type":"copyright"}],"year":"2016"}},"appendix":[],"type":"research-article","ethics":[],"backSec":[],"supplementary":[],"journalTitle":"液晶与显示","issue":"7","volume":"31","originalSource":[]}