{"defaultlang":"zh","titlegroup":{"articletitle":[{"lang":"zh","data":[{"name":"text","data":"基于校正遮挡感知的光场深度估计"}]},{"lang":"en","data":[{"name":"text","data":"Depth estimation of light field based on correction occlusion perception"}]}]},"contribgroup":{"author":[{"name":[{"lang":"zh","surname":"倪","givenname":"竞","namestyle":"eastern","prefix":""},{"lang":"en","surname":"NI","givenname":"Jing","namestyle":"eastern","prefix":""}],"stringName":[],"aff":[{"rid":"aff1","text":""}],"role":["first-author"],"bio":[{"lang":"zh","text":["倪竞(1998—),男,湖北武汉人,硕士研究生,2020年于湖北工程学院获得学士学位,主要从事光场图像处理、机器学习等方面的研究。E-mail: 779465174@qq.com"],"graphic":[{"print":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395326&type=","small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395328&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395327&type=","width":"22.00000000","height":"29.42500305","fontsize":""}],"data":[[{"name":"text","data":"倪竞"},{"name":"text","data":"(1998—),男,湖北武汉人,硕士研究生,2020年于湖北工程学院获得学士学位,主要从事光场图像处理、机器学习等方面的研究。E-mail: "},{"name":"text","data":"779465174@qq.com"}]]}],"email":"779465174@qq.com","deceased":false},{"name":[{"lang":"zh","surname":"邓","givenname":"慧萍","namestyle":"eastern","prefix":""},{"lang":"en","surname":"DENG","givenname":"Hui-ping","namestyle":"eastern","prefix":""}],"stringName":[],"aff":[{"rid":"aff1","text":""}],"role":["corresp"],"corresp":[{"rid":"cor1","lang":"zh","text":"E-mail: denghuiping@wust.edu.cn","data":[{"name":"text","data":"E-mail: denghuiping@wust.edu.cn"}]}],"bio":[{"lang":"zh","text":["邓慧萍(1983—),女,湖北武汉人,博士,副教授,2013年于华中科技大学获得博士学位,主要从事3D视频与图像处理、机器学习等方面的研究。E-mail: denghuiping@wust.edu.cn"],"graphic":[{"print":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395329&type=","small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395331&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395330&type=","width":"22.00000000","height":"32.21429443","fontsize":""}],"data":[[{"name":"text","data":"邓慧萍"},{"name":"text","data":"(1983—),女,湖北武汉人,博士,副教授,2013年于华中科技大学获得博士学位,主要从事3D视频与图像处理、机器学习等方面的研究。E-mail: "},{"name":"text","data":"denghuiping@wust.edu.cn"}]]}],"email":"denghuiping@wust.edu.cn","deceased":false},{"name":[{"lang":"zh","surname":"向","givenname":"森","namestyle":"eastern","prefix":""},{"lang":"en","surname":"XIANG","givenname":"Sen","namestyle":"eastern","prefix":""}],"stringName":[],"aff":[{"rid":"aff1","text":""}],"role":[],"deceased":false},{"name":[{"lang":"zh","surname":"吴","givenname":"谨","namestyle":"eastern","prefix":""},{"lang":"en","surname":"WU","givenname":"Jin","namestyle":"eastern","prefix":""}],"stringName":[],"aff":[{"rid":"aff1","text":""}],"role":[],"deceased":false}],"aff":[{"id":"aff1","intro":[{"lang":"zh","text":"武汉科技大学 信息科学与工程学院,湖北 武汉 430081","data":[{"name":"text","data":"武汉科技大学 信息科学与工程学院,湖北 武汉 430081"}]},{"lang":"en","text":"School of Information Science and Engineering , Wuhan University of Science and Technology , Wuhan 430081, China","data":[{"name":"text","data":"School of Information Science and Engineering , Wuhan University of Science and Technology , Wuhan 430081, China"}]}]}]},"abstracts":[{"lang":"zh","data":[{"name":"p","data":[{"name":"text","data":"光场图像能够同时记录下空间中不同位置和方向的光线信息,为估计精确的深度图提供了丰富的信息。然而在遮挡和重复纹理等复杂场景下,提取图像特征不足会导致深度图的细节丢失。本文提出了一种基于校正卷积的光场深度估计网络,充分利用光场图像丰富的结构信息以改善遮挡等复杂区域的深度估计。首先,利用初始视差图和子孔径图像生成遮挡掩膜,采用校正卷积判别和编码遮挡区域的空间信息以感知遮挡区域,结合多尺度特征以补充易丢失的边缘细节信息;通过空间注意力机制给予遮挡区域更大权重,消除冗余信息并全局优化亚像素代价体。实验结果表明,该方法在4D 光场基准平台上的平均MSE和BadPix("},{"name":"italic","data":[{"name":"text","data":"ε"}]},{"name":"text","data":"=0.03)分别达到了0.951和4.261,在大部分场景下能实现最小误差的深度估计,对遮挡区域表现出较高鲁棒性并优于其他算法。"}]}]},{"lang":"en","data":[{"name":"p","data":[{"name":"text","data":"The light field image can record the light information of different position and direction in space at the same time, which provides rich information for estimating accurate depth map. However, in complex scenes such as occlusion and repeated texture, the lack of feature extraction will lead to the loss of detail in depth map. An optical field depth estimation network based on correction convolution is proposed to make full use of the rich structural information for optical field images to improve the depth estimation of complex areas such as occlusion. Firstly, the occlusion mask is generated by using the initial disparity map and subaperture image, and the spatial information of the occlusion area is perceived by correcting convolutional discrimination and encoding, and multi-scale features are combined to supplement the edge details that are easily lost .The spatial attention mechanism is used to give more weight to the occlusion area, eliminate redundant information and optimize the subpixel cost body globally. Experimental results show that average MSE and BadPix ("},{"name":"italic","data":[{"name":"text","data":"ε"}]},{"name":"text","data":"=0.03) of the proposed method on 4D optical field reference platform are 0.951 and 4.261, respectively. The proposed method can achieve depth estimation with minimum error in most scenes, and shows high robustness to the occlusion area, which is better than other algorithms."}]}]}],"keyword":[{"lang":"zh","data":[[{"name":"text","data":"光场"}],[{"name":"text","data":"深度估计"}],[{"name":"text","data":"遮挡掩膜"}],[{"name":"text","data":"校正卷积"}],[{"name":"text","data":"空间注意力"}]]},{"lang":"en","data":[[{"name":"text","data":"Light field"}],[{"name":"text","data":"Depth estimation"}],[{"name":"text","data":"Occlusion mask"}],[{"name":"text","data":"Correction convolution"}],[{"name":"text","data":"Spatial attention"}]]}],"highlights":[],"body":[{"name":"sec","data":[{"name":"sectitle","data":{"title":[{"name":"text","data":"1 引言"}],"level":"1","id":"s1"}},{"name":"p","data":[{"name":"text","data":"获取高精度的深度图是重聚焦"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"1","type":"bibr","rid":"R1","data":[{"name":"text","data":"1"}]}},{"name":"text","data":"]"}]},{"name":"text","data":",3D重建"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"2","type":"bibr","rid":"R2","data":[{"name":"text","data":"2"}]}},{"name":"text","data":"]"}]},{"name":"text","data":",虚拟和增强现实"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"3","type":"bibr","rid":"R3","data":[{"name":"text","data":"3"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"等研究领域的基础性工作。不同于一些基于单目"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"4","type":"bibr","rid":"R4","data":[{"name":"text","data":"4"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"和双目"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"5","type":"bibr","rid":"R5","data":[{"name":"text","data":"5"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"估计深度的方法,光场深度估计利用光场图像所包含场景的多视角信息进行处理,从而提升深度估计的精度和可靠性。"}]},{"name":"p","data":[{"name":"text","data":"尽管光场图像在空间结构上具有优势,但受限于窄基线、遮挡和复杂纹理场景等因素,获取高精度的深度图仍存在周期长、不易优化等难点。在光场深度估计早期工作中,已经有许多传统方法从不同角度出发克服这些问题。Wanner"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"6","type":"bibr","rid":"R6","data":[{"name":"text","data":"6"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"和Zhangshuo"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"7","type":"bibr","rid":"R7","data":[{"name":"text","data":"7"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"通过分析EPI结构特点,分别提出结构张量和旋转平行四边形算子(SPO)的方法,以此求解EPI图像斜率来间接估计深度,但步骤繁琐,时效性较差。Jeon"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"8","type":"bibr","rid":"R8","data":[{"name":"text","data":"8"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"基于相移理论所提出的匹配代价,在频域实现亚像素精度的代价匹配,然而计算匹配代价周期和时间过长,不易于应用。Tao"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"9","type":"bibr","rid":"R9","data":[{"name":"text","data":"9"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"融合散焦线索和对应线索,改善复杂纹理和光暗区域的深度估计;在此基础上,Wang"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"10","type":"bibr","rid":"R10","data":[{"name":"text","data":"10"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"和Williem"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"11","type":"bibr","rid":"R11","data":[{"name":"text","data":"11"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"则基于颜色、光一致性进一步解决遮挡问题。以上经典方法能在一定程度上改善窄基线和遮挡等因素造成的影响,但计算时间较长,推理步骤繁琐。"}]},{"name":"p","data":[{"name":"text","data":"近些年来,基于深度学习的方法已经广泛被用于全息图像重建"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"12","type":"bibr","rid":"R12","data":[{"name":"text","data":"12"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"和光场深度估计等光学领域"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"13","type":"bibr","rid":"R13","data":[{"name":"text","data":"13"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"。不同于一般的深度神经网络"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"14","type":"bibr","rid":"R14","data":[{"name":"text","data":"14"}]}},{"name":"text","data":"]"}]},{"name":"text","data":",卷积神经网络(CNN)"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"4","type":"bibr","rid":"R4","data":[{"name":"text","data":"4"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"与图像处理具有天然的适配性,因而基于CNN的光场深度估计显著提升了深度图的质量。Heber"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"15","type":"bibr","rid":"R15","data":[{"name":"text","data":"15"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"等人首次采用端到端的网络来学习4D光场与深度的对应关系,随后提出U-Net"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"16","type":"bibr","rid":"R16","data":[{"name":"text","data":"16"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"从光场中提取几何信息来估计深度。但该算法未预处理数据,简易的网络结构所预测的深度准确率较低。Shin"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"17","type":"bibr","rid":"R17","data":[{"name":"text","data":"17"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"等人提出多流网络和数据增强的策略,以实现快速、准确的深度估计。Tsai"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"18","type":"bibr","rid":"R18","data":[{"name":"text","data":"18"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"等人提出了一种基于注意力的视图选择网络,自适应地考虑每个视图不同的重要性,但未考虑到每个视图的空间信息。在此基础上,Chen"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"19","type":"bibr","rid":"R19","data":[{"name":"text","data":"19"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"等人进一步提出多级注意力融合网络,通过结合多流输入与视图选择网络,采用注意力机制融合四个主要方向的支路来处理遮挡。但这些方法在特征提取阶段,仅仅使用空间金字塔(SPP)或2D残差块提取多尺度和上下文信息。而在遮挡和复杂纹理区域,这些策略难以有效提取到各个视图中的细节特征,导致视点之间存在误匹配,深度图的细节区域存在误差。"}]},{"name":"p","data":[{"name":"text","data":"由于遮挡场景是影响深度图质量的主要因素,考虑到校正卷积能对输入特征进行重新判别和编码,从而感知遮挡等复杂区域的特征。因此,本文设计了一种基于校正卷积的光场深度估计架构,通过初始预测图和子孔径视图生成遮挡掩膜,提取遮挡区域局部细节特征并改善初始代价体中存在误匹配像素点的视差值;同时在亚像素视差下结合空间注意力,优化并改善代价体的全局特征,显著提高了深度图的质量。该方法主要贡献为以下方面:"}]},{"name":"p","data":[{"name":"text","data":"为解决遮挡区域的不适定问题,引入了遮挡掩膜(OM)和基于校正卷积的特征提取和交互模块(SC_moudle)。利用校正卷积在子空间对复杂特征判别、编码的能力,充分提取遮挡等复杂区域特征,结合空间金字塔(SPP)提取到的多尺度信息,从而补充和增强每个视图的细节特征。"}]},{"name":"p","data":[{"name":"text","data":"利用亚像素代价体的网络结构学习到精细的视差分布,同时结合空间注意力能有效处理复杂场景下的空间特征。本文通过构建亚像素级别代价体,采用带有空间注意力机制(SA)的3D残差块,重点关注遮挡及相邻区域特征并消除冗余信息,有效改善深度图整体质量。"}]}]},{"name":"sec","data":[{"name":"sectitle","data":{"title":[{"name":"text","data":"2 本文方法"}],"level":"1","id":"s2"}},{"name":"p","data":[{"name":"text","data":"深度估计通过从输入的图像中计算视差,再由视差与深度的几何关系获取场景的深度信息。由"},{"name":"xref","data":{"text":"图1","type":"fig","rid":"F1","data":[{"name":"text","data":"图1"}]}},{"name":"text","data":"所示,参考相机和目标相机分别由O"},{"name":"sub","data":[{"name":"text","data":"R"}]},{"name":"text","data":"、O"},{"name":"sub","data":[{"name":"text","data":"T"}]},{"name":"text","data":"表示,场景中任意一点"},{"name":"italic","data":[{"name":"text","data":"P"}]},{"name":"text","data":"("},{"name":"italic","data":[{"name":"text","data":"x,z"}]},{"name":"text","data":")投影至像平面则分别为"},{"name":"italic","data":[{"name":"text","data":"P"},{"name":"sub","data":[{"name":"text","data":"R"}]}]},{"name":"text","data":"和"},{"name":"italic","data":[{"name":"text","data":"P"},{"name":"sub","data":[{"name":"text","data":"T"}]}]},{"name":"text","data":"。其中"},{"name":"inlineformula","data":[{"name":"math","data":{"math":"XR","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395333&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395332&type=","width":"4.31799984","height":"3.72533321","fontsize":""}}}]},{"name":"text","data":"、"},{"name":"inlineformula","data":[{"name":"math","data":{"math":"XT","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395341&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395340&type=","width":"4.31799984","height":"3.72533321","fontsize":""}}}]},{"name":"text","data":"对应两成像点的x轴方向的坐标,"},{"name":"italic","data":[{"name":"text","data":"b、f"}]},{"name":"text","data":"分别表示相机的基线和焦距,"},{"name":"italic","data":[{"name":"text","data":"z"}]},{"name":"text","data":"代表"},{"name":"italic","data":[{"name":"text","data":"P"}]},{"name":"text","data":"点距相机的深度。通过建立参考和目标视图之间的像素对应关系,计算出每个像素在两视图之间的水平偏移量,也就是视差值。然后通过三角测量法,结合摄像头参数等相关信息,从而得到这个像素点在三维空间中深度值"},{"name":"italic","data":[{"name":"text","data":"z"}]},{"name":"text","data":":"}]},{"name":"dispformula","data":{"label":[{"name":"text","data":"(1)"}],"data":[{"name":"math","data":{"math":"z=f*b/d,","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395337&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395336&type=","width":"17.01799965","height":"3.89466691","fontsize":""}}},{"name":"text","data":" "}],"id":"DF1"}},{"name":"p","data":[{"name":"text","data":"其中"},{"name":"italic","data":[{"name":"text","data":"d"}]},{"name":"text","data":"=("},{"name":"inlineformula","data":[{"name":"math","data":{"math":"XR","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395339&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395338&type=","width":"4.31799984","height":"3.72533321","fontsize":""}}}]},{"name":"italic","data":[{"name":"text","data":"-"}]},{"name":"inlineformula","data":[{"name":"math","data":{"math":"XT","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395341&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395340&type=","width":"4.31799984","height":"3.72533321","fontsize":""}}}]},{"name":"text","data":"),表示参考视图与目标视图对应像素点的视差。"}]},{"name":"fig","data":{"id":"F1","caption":[{"lang":"zh","label":[{"name":"text","data":"图1"}],"title":[{"name":"text","data":"视差与深度"}]},{"lang":"en","label":[{"name":"text","data":"Fig.1"}],"title":[{"name":"text","data":"disparity and depth"}]}],"subcaption":[],"note":[],"graphics":[{"print":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395342&type=","small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395344&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395343&type=","width":"67.92299652","height":"59.61899948","fontsize":""}]}},{"name":"p","data":[{"name":"text","data":"基于子孔径的光场深度估计分为四路的EPI"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"17","type":"bibr","rid":"R17","data":[{"name":"text","data":"17"}]}},{"name":"text","data":"]["},{"name":"xref","data":{"text":"19","type":"bibr","rid":"R19","data":[{"name":"text","data":"19"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"结构和全阵列子孔径"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"18","type":"bibr","rid":"R18","data":[{"name":"text","data":"18"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"图像。虽然EPI结构的光场图像利用四路主要视图方向减少了冗余信息,但丢失了其他视图可能存在的细节特征。由于全阵列子孔径图像拥有丰富的多视角特征信息,有利于改善整体深度图,因此本文将所有视角下的子孔径图像作为网络的输入。"}]},{"name":"p","data":[{"name":"text","data":"由于光场深度估计受遮挡等挑战性场景影响较大,获取遮挡区域信息十分困难。本文利用初始预测图和子孔径视图(SAI)生成遮挡区域掩膜,通过校正卷积提取深层上下文信息,并结合多尺度底层特征,从而获取遮挡区域的细节特征。"}]},{"name":"p","data":[{"name":"text","data":"精细的视差分布有利于网络学习更精确的深度,因此本文将提取到的所有视图下的细节特征构建为亚像素代价体,此时代价体包含所有视角下34个视差等级的特征信息。考虑到亚像素代价体存在大量冗余信息的干扰,会导致错误的视差回归。所以该代价体由视图选择模块和3D空间注意力(SA)进行全局优化,以提取出正确的视差信息。最后通过概率回归得到最终的深度,"},{"name":"xref","data":{"text":"图2","type":"fig","rid":"F2","data":[{"name":"text","data":"图2"}]}},{"name":"text","data":"展示了网络的整体结构。"}]},{"name":"fig","data":{"id":"F2","caption":[{"lang":"zh","label":[{"name":"text","data":"图2"}],"title":[{"name":"text","data":"网络整体结构"}]},{"lang":"en","label":[{"name":"text","data":"Fig.2"}],"title":[{"name":"text","data":"Overall network structure"}]}],"subcaption":[],"note":[],"graphics":[{"print":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395345&type=","small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395347&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395346&type=","width":"144.41000366","height":"64.11699677","fontsize":""}]}},{"name":"sec","data":[{"name":"sectitle","data":{"title":[{"name":"text","data":"2.1 校正卷积遮挡感知"}],"level":"2","id":"s2a"}},{"name":"p","data":[{"name":"text","data":"为了从子孔径图像中提取有效特征进行视点匹配,通常采用残差块和金字塔池化(SPP)模块进行多尺度的特征提取,通过提取多层的上下文信息,增强遮挡等挑战性区域的重要特征。利用SPP提取每个视角特征时,多尺度特征和更少的参数有利于后续代价匹配和网络的学习。然而仅仅使用金字塔池化后的多尺度特征信息去补充挑战性区域是不充分的,由于遮挡,一些视角下的图像无法通过普通卷积来提取特征,从而造成空间信息的丢失。考虑到校正卷积能通过判别和编码遮挡区域的特征信息,提取关键区域的细节特征。为了改善遮挡场景下细节和多尺度特征,改善代价体的视点匹配,因此本文利用初始预测图为所有视角下的子孔径图像生成遮挡掩膜,以引导感知遮挡区域,并采用包含分组和空洞卷积的校正模块提取和补充遮挡区域特征。"}]},{"name":"p","data":[{"name":"text","data":"如"},{"name":"xref","data":{"text":"图3","type":"fig","rid":"F3","data":[{"name":"text","data":"图3"}]}},{"name":"text","data":"所示在sideboard场景下,如果不存在遮挡,那么根据光一致性假设,扭曲视图的像素与中心视图的像素相同。首先利用初始视差图和原始子孔径视图的映射关系C,得到所有视角下的扭曲视图"},{"name":"inlineformula","data":[{"name":"math","data":{"math":"Iw","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395349&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395348&type=","width":"3.04800010","height":"3.72533321","fontsize":""}}}]},{"name":"text","data":":"}]},{"name":"dispformula","data":{"label":[{"name":"text","data":"(2)"}],"data":[{"name":"math","data":{"math":"Iwj=Warp(c(pi,Ij)),j=1,2,3...,u×v,","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395351&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395350&type=","width":"68.24134064","height":"5.07999992","fontsize":""}}},{"name":"text","data":" "}],"id":"DF2"}},{"name":"p","data":[{"name":"text","data":"其中"},{"name":"italic","data":[{"name":"text","data":"j"}]},{"name":"text","data":"代表每个视角下的标号,共有"},{"name":"inlineformula","data":[{"name":"math","data":{"math":"u×v","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395353&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395352&type=","width":"8.55133343","height":"2.79399991","fontsize":""}}}]},{"name":"text","data":"个视图。基于此约束条件,计算中心视图与各个视角下的扭曲视图的遮挡映射,可以得到每个视图的遮挡掩膜"},{"name":"inlineformula","data":[{"name":"math","data":{"math":"Imi","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395355&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395354&type=","width":"3.04800010","height":"3.97933316","fontsize":""}}}]},{"name":"text","data":":"}]},{"name":"dispformula","data":{"label":[{"name":"text","data":"(3)"}],"data":[{"name":"math","data":{"math":"Imi=(1-||Ic-Iwi||)2,","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395357&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395356&type=","width":"32.85066605","height":"4.82600021","fontsize":""}}},{"name":"text","data":" "}],"id":"DF3"}},{"name":"p","data":[{"name":"text","data":"其中遮挡掩膜具体表现为场景中物体的边缘区域。为了让遮挡掩膜更好地引导网络学习该区域特征,我们将初始特征提取的底层特征与掩膜进行自适应融合得到特征"},{"name":"inlineformula","data":[{"name":"math","data":{"math":"Ft","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395301&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395300&type=","width":"3.38666677","height":"3.72533321","fontsize":""}}}]},{"name":"text","data":"。"}]},{"name":"fig","data":{"id":"F3","caption":[{"lang":"zh","label":[{"name":"text","data":"图3"}],"title":[{"name":"text","data":"遮挡掩膜"}]},{"lang":"en","label":[{"name":"text","data":"Fig. 3"}],"title":[{"name":"text","data":"Occlusion mask"}]}],"subcaption":[],"note":[],"graphics":[{"print":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395302&type=","small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395304&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395303&type=","width":"100.00000000","height":"43.93114090","fontsize":""}]}},{"name":"p","data":[{"name":"text","data":"在经过4个3×3大小的2D卷积提取初始特征后,特征输入到SPP和SC_module分别提取多尺度信息和深层细节信息,最后拼接构成代价体。不同于普通的SC_Net"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"22","type":"bibr","rid":"R22","data":[{"name":"text","data":"22"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"采用降维采样连接上下两路特征,本文使用分组、膨胀改进后的校正卷积(Conv"},{"name":"sub","data":[{"name":"text","data":"gd"}]},{"name":"text","data":"),以在更大感受野的子空间提取深层特征。"}]},{"name":"p","data":[{"name":"text","data":"具体如"},{"name":"xref","data":{"text":"图4","type":"fig","rid":"F4","data":[{"name":"text","data":"图4"}]}},{"name":"text","data":"所示,每个视图输入的特征"},{"name":"inlineformula","data":[{"name":"math","data":{"math":"Ft","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395306&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395305&type=","width":"3.38666677","height":"3.72533321","fontsize":""}}}]},{"name":"text","data":"为C×H×W,按通道划分为两个子特征"},{"name":"inlineformula","data":[{"name":"math","data":{"math":"Fu","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395308&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395307&type=","width":"3.72533321","height":"3.72533321","fontsize":""}}}]},{"name":"text","data":"和"},{"name":"inlineformula","data":[{"name":"math","data":{"math":"Fv","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395310&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395309&type=","width":"3.72533321","height":"3.72533321","fontsize":""}}}]},{"name":"text","data":".如"},{"name":"xref","data":{"text":"式(4)","type":"disp-formula","rid":"DF4","data":[{"name":"text","data":"式(4)"}]}},{"name":"text","data":"所示:"}]},{"name":"dispformula","data":{"label":[{"name":"text","data":"(4)"}],"data":[{"name":"math","data":{"math":"(Fu,Fv)=Split(Ft),","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395312&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395311&type=","width":"33.78200150","height":"4.82600021","fontsize":""}}},{"name":"text","data":" "}],"id":"DF4"}},{"name":"fig","data":{"id":"F4","caption":[{"lang":"zh","label":[{"name":"text","data":"图4"}],"title":[{"name":"text","data":"校正卷积特征融合模块"}]},{"lang":"en","label":[{"name":"text","data":"Fig.4"}],"title":[{"name":"text","data":"Correction convolution feature fusion module"}]}],"subcaption":[],"note":[],"graphics":[{"print":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395358&type=","small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395360&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395359&type=","width":"100.00000000","height":"48.52165604","fontsize":""}]}},{"name":"p","data":[{"name":"italic","data":[{"name":"text","data":"F"},{"name":"sub","data":[{"name":"text","data":"u"}]}]},{"name":"text","data":"的上层特征通过下采样在较小的潜在空间进行卷积特征转换(自校准),并考虑每个空间位置局部的上下文信息,避免了不相关区域的错误信息。随后通过上采样在更大更准确的的判别区域编码,重新考虑遮挡区域边界处的特征信息得到"},{"name":"inlineformula","data":[{"name":"math","data":{"math":"Fu0","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395362&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395361&type=","width":"4.74133301","height":"3.72533321","fontsize":""}}}]},{"name":"text","data":",表示为:"}]},{"name":"dispformula","data":{"label":[{"name":"text","data":"(5)"}],"data":[{"name":"math","data":{"math":"Fu0=Upsample(Convgd(Avgpool(Fu))),","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395364&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395363&type=","width":"65.70133209","height":"4.82600021","fontsize":""}}},{"name":"text","data":" "}],"id":"DF5"}},{"name":"p","data":[{"name":"text","data":"下层特征"},{"name":"italic","data":[{"name":"text","data":"F"},{"name":"sub","data":[{"name":"text","data":"v"}]}]},{"name":"text","data":"则保留原始上下文信息,最后通过融合不同尺度上下两层特征得到输出"},{"name":"italic","data":[{"name":"text","data":"F"},{"name":"sub","data":[{"name":"text","data":"g"}]}]},{"name":"text","data":":"}]},{"name":"dispformula","data":{"label":[{"name":"text","data":"(6)"}],"data":[{"name":"math","data":{"math":"Fg=Convgd(Fu)Sigmoid(Fu0+Fu),","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395366&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395365&type=","width":"63.58466721","height":"4.91066647","fontsize":""}}},{"name":"text","data":" "}],"id":"DF6"}},{"name":"p","data":[{"name":"italic","data":[{"name":"text","data":"F"},{"name":"sub","data":[{"name":"text","data":"v"}]}]},{"name":"text","data":"通过空洞卷积提取原尺度特征"},{"name":"italic","data":[{"name":"text","data":"F"},{"name":"sub","data":[{"name":"text","data":"h"}]}]},{"name":"text","data":",并与"},{"name":"italic","data":[{"name":"text","data":"F"},{"name":"sub","data":[{"name":"text","data":"g"}]}]},{"name":"text","data":"连接得到最终的输出特征"},{"name":"italic","data":[{"name":"text","data":"F"},{"name":"sub","data":[{"name":"text","data":"c"}]}]},{"name":"text","data":",如"},{"name":"xref","data":{"text":"式(7)","type":"disp-formula","rid":"DF7","data":[{"name":"text","data":"式(7)"}]}},{"name":"text","data":"所示:"}]},{"name":"dispformula","data":{"label":[{"name":"text","data":"(7)"}],"data":[{"name":"math","data":{"math":"Fh=Convgd(Fv),Fc=Concat(Fg,Fh),","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395368&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395367&type=","width":"63.16133118","height":"4.48733330","fontsize":""}}},{"name":"text","data":" "}],"id":"DF7"}}]},{"name":"sec","data":[{"name":"sectitle","data":{"title":[{"name":"text","data":"2.2 亚像素匹配代价构建"}],"level":"2","id":"s2b"}},{"name":"p","data":[{"name":"text","data":"光场一般用双平面模型来表示,即"},{"name":"italic","data":[{"name":"text","data":"L"}]},{"name":"text","data":"("},{"name":"italic","data":[{"name":"text","data":"x"}]},{"name":"text","data":","},{"name":"italic","data":[{"name":"text","data":"y"}]},{"name":"text","data":","},{"name":"italic","data":[{"name":"text","data":"u"}]},{"name":"text","data":","},{"name":"italic","data":[{"name":"text","data":"v"}]},{"name":"text","data":")。其中("},{"name":"italic","data":[{"name":"text","data":"x"}]},{"name":"text","data":","},{"name":"italic","data":[{"name":"text","data":"y"}]},{"name":"text","data":")和("},{"name":"italic","data":[{"name":"text","data":"u"}]},{"name":"text","data":","},{"name":"italic","data":[{"name":"text","data":"v"}]},{"name":"text","data":")分别代表场景中光线的空间坐标和角度坐标。通过分析双平面模型和视点间的几何关系,视差可以表示为:"}]},{"name":"dispformula","data":{"label":[{"name":"text","data":"(8)"}],"data":[{"name":"math","data":{"math":"L(uc,vc,x,y)=L(u,v,x+(uc-u)*d(x,y),y+(vc-v)*d(x,y)),","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395370&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395369&type=","width":"65.95532990","height":"10.66800022","fontsize":""}}},{"name":"text","data":" "}],"id":"DF8"}},{"name":"p","data":[{"name":"text","data":"其中"},{"name":"inlineformula","data":[{"name":"math","data":{"math":"d(x,y)","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395373&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395371&type=","width":"11.85333347","height":"3.97933316","fontsize":""}}}]},{"name":"text","data":"表示中心视图与相邻视图在"},{"name":"inlineformula","data":[{"name":"math","data":{"math":"(x,y)","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395374&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395372&type=","width":"9.05933285","height":"3.97933316","fontsize":""}}}]},{"name":"text","data":"处的视差。由于卷积神经网络(CNN)感受野有限,如果位移大于感受野,CNN网络很难预测到正确的视差。相对于简单连接"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"17","type":"bibr","rid":"R17","data":[{"name":"text","data":"17"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"特征图,将输入特征图沿u或v角度方向以不同的视差进行水平移动"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"18","type":"bibr","rid":"R18","data":[{"name":"text","data":"18"}]}},{"name":"text","data":"]"}]},{"name":"text","data":",在代价聚合中网络可以使用相对较小的感受野直接捕捉到不同空间位置的像素信息。"}]},{"name":"p","data":[{"name":"text","data":"考虑到光场图像的特性:较窄的基线和丰富的角度信息,亚像素级别的匹配代价能有效改善窄基线的影响并有利于子视图更准确的视差匹配。在权衡精细的视差分布和更多的计算成本后,本文首先用原代价体得到初始视差图,结合子孔径图像生成遮挡掩膜的同时并减少推理成本。然后采用双线性插值的方法,构建亚像素级别的代价体学习更精细的视差,并在训练过程进行随机采样和提高批次大小来加速模型收敛。亚像素代价体具体有34个视差水平,采样范围从-4到4,每隔0.25采样一次。视差范围包含所有场景下的最小和最大视差,而代价体通过将特征图转移到预定义的视差范围内来构建,随后通过成本聚合和视差回归来转化视差分布。在水平移动特征图后,每个视图特征在视差维度上拼接成一个5D亚像素代价体。"}]}]},{"name":"sec","data":[{"name":"sectitle","data":{"title":[{"name":"text","data":"2.3 代价聚合和视差回归"}],"level":"2","id":"s2c"}},{"name":"p","data":[{"name":"text","data":"在构建亚像素代价体后,此时代价体包含所有视图所映射的特征信息。考虑到每个视图和空间内各个像素点重要性各不相同,遮挡边界区域的像素点蕴含的语义信息更加丰富。为了过滤掉多余的特征信息并优化网络,本文采用带有空间注意力的3DCNN结合视图选择网络进行代价优化。"}]},{"name":"p","data":[{"name":"text","data":"代价体首先经过基于注意力的视图选择模块进行初步筛选,选择关键视图并给予更大权重。如"},{"name":"xref","data":{"text":"图5","type":"fig","rid":"F5","data":[{"name":"text","data":"图5"}]}},{"name":"text","data":"所示:"}]},{"name":"fig","data":{"id":"F5","caption":[{"lang":"zh","label":[{"name":"text","data":"图 5"}],"title":[{"name":"text","data":"视图选择模块"}]},{"lang":"en","label":[{"name":"text","data":"Fig.5"}],"title":[{"name":"text","data":"View selection module"}]}],"subcaption":[],"note":[],"graphics":[{"print":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395388&type=","small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395375&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395389&type=","width":"72.97200012","height":"42.05099869","fontsize":""}]}},{"name":"p","data":[{"name":"text","data":"不同于二维图像处理使用2DCNN空间注意力学习到空间信息后与原特征相乘得到输出特征。基于光场一致性的特点,3D卷积层可以让代价体在视差维度"},{"name":"italic","data":[{"name":"text","data":"D"}]},{"name":"text","data":"的空间进行相互联系,这对于光场子孔径图像不同视图间相似的结构特点是十分有效的。而空间注意力能让网络重点学习遮挡物边界区域的特征信息有助于估计准确的深度,改进后的3DCNN空间注意力能够让代价体在视差空间中通过联系学习到更有效的特征。如"},{"name":"xref","data":{"text":"图6","type":"fig","rid":"F6","data":[{"name":"text","data":"图6"}]}},{"name":"text","data":"所示,代价体沿通道维度分别进行平均池化和最大池化,在学习空间中关键特征后进行整合,然后送入卷积层激活得到注意力,最后与代价体相乘得到最终的空间注意力图。"}]},{"name":"fig","data":{"id":"F6","caption":[{"lang":"zh","label":[{"name":"text","data":"图 6"}],"title":[{"name":"text","data":"改进的3d空间注意力"}]},{"lang":"en","label":[{"name":"text","data":"Fig.6"}],"title":[{"name":"text","data":"Improved 3d spatial attention"}]}],"subcaption":[],"note":[],"graphics":[{"print":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395376&type=","small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395390&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395377&type=","width":"76.90000153","height":"25.47899628","fontsize":""}]}},{"name":"p","data":[{"name":"text","data":"在代价聚合后,视差图"},{"name":"italic","data":[{"name":"text","data":"d"}]},{"name":"text","data":"用视差回归表示为:"}]},{"name":"dispformula","data":{"label":[{"name":"text","data":"(9)"}],"data":[{"name":"math","data":{"math":"d^=d=DminDmaxd×σ(-cd),","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395392&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395391&type=","width":"36.40666580","height":"8.89000034","fontsize":""}}},{"name":"text","data":" "}],"id":"DF9"}},{"name":"p","data":[{"name":"text","data":"其中"},{"name":"inlineformula","data":[{"name":"math","data":{"math":"cd","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395394&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395393&type=","width":"2.79399991","height":"3.72533321","fontsize":""}}}]},{"name":"text","data":"是沿维度"},{"name":"italic","data":[{"name":"text","data":"D"}]},{"name":"text","data":"的切片"},{"name":"inlineformula","data":[{"name":"math","data":{"math":"CfD×H×W","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395396&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395395&type=","width":"26.67000008","height":"4.06400013","fontsize":""}}}]},{"name":"text","data":",表示视差标签"},{"name":"italic","data":[{"name":"text","data":"d"}]},{"name":"text","data":"的成本,每个视差标签的概率由σ(softmax)运算符计算,然后根据概率估计视差"},{"name":"inlineformula","data":[{"name":"math","data":{"math":"d^","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395398&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395397&type=","width":"2.28600001","height":"3.72533321","fontsize":""}}}]},{"name":"text","data":"。"}]}]}]},{"name":"sec","data":[{"name":"sectitle","data":{"title":[{"name":"text","data":"3 实验结果与分析"}],"level":"1","id":"s3"}},{"name":"sec","data":[{"name":"sectitle","data":{"title":[{"name":"text","data":"3.1 训练细节"}],"level":"2","id":"s3a"}},{"name":"p","data":[{"name":"text","data":"实验采用4D合成光场数据集,其中每个场景下有9×9个子孔径视图,每个视图的图像分辨率为512px×512px。该数据集大多数场景视差范围在-1.5px至1.5px内,并提供相机的内外参数,例如焦距和相机坐标系等。“Additional”中的16个场景作为训练集,“Stratified”和“Training”中的7个场景用于验证集。训练过程中,为了加快模型训练速度,从LF图像中随机裁剪32×32个灰度块来训练,并采用数据增强策略"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"17","type":"bibr","rid":"R17","data":[{"name":"text","data":"17"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"避免过拟合,在训练过程中手动去除反射、折射和无纹理区域用于提高模型性能。在本文中使用L1loss,用于测量估计视差"},{"name":"italic","data":[{"name":"text","data":"D"},{"name":"sub","data":[{"name":"text","data":"d"}]}]},{"name":"text","data":"和地面真实视差"},{"name":"italic","data":[{"name":"text","data":"D"},{"name":"sub","data":[{"name":"text","data":"gt"}]}]},{"name":"text","data":"的差异,并使用Adam"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"21","type":"bibr","rid":"R21","data":[{"name":"text","data":"21"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"优化器优化网络并加速收敛。为了提高模型收敛速率并提高精度,批量大小"},{"name":"italic","data":[{"name":"text","data":"b"}]},{"name":"text","data":"设置为16,初始学习率"},{"name":"italic","data":[{"name":"text","data":"lr"}]},{"name":"text","data":"为1e-3,每训练100epoch衰减一半。该模型在NVIDIA-GTX-3090GPU上进行训练,主要环境配置为tensorflow-gpu2.5,大概需要一周时间训练。"}]}]},{"name":"sec","data":[{"name":"sectitle","data":{"title":[{"name":"text","data":"3.2 实验评估"}],"level":"2","id":"s3b"}},{"name":"p","data":[{"name":"text","data":"本文所提出的方法与几种在4D光场基准测试中不同类别的方法进行了比较,包括LF(2015),SPO(2016),LF_OCC(2016)三种经典传统方法和Epinet(2018),Lfattnet(2020)两种基于深度学习方法。实验中将BadPix("},{"name":"inlineformula","data":[{"name":"math","data":{"math":"ε","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395411&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395410&type=","width":"1.43933344","height":"2.79399991","fontsize":""}}}]},{"name":"text","data":")和均方误差(MSE)两个指标用于定量评估。"}]},{"name":"dispformula","data":{"label":[{"name":"text","data":"(10)"}],"data":[{"name":"math","data":{"math":"MSE=1ni=1n(Yi-Y^i)2,","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395402&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395401&type=","width":"39.53932953","height":"8.63599968","fontsize":""}}},{"name":"text","data":" "}],"id":"DF10"}},{"name":"p","data":[{"name":"text","data":"BadPix("},{"name":"inlineformula","data":[{"name":"math","data":{"math":"ε","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395411&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395410&type=","width":"1.43933344","height":"2.79399991","fontsize":""}}}]},{"name":"text","data":")即测量误差超过"},{"name":"inlineformula","data":[{"name":"math","data":{"math":"ε","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395411&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395410&type=","width":"1.43933344","height":"2.79399991","fontsize":""}}}]},{"name":"text","data":"的错误估计像素的百分比,即:"}]},{"name":"dispformula","data":{"label":[{"name":"text","data":"(11)"}],"data":[{"name":"math","data":{"math":"|dgt(i)-de(i)|>ε(i),","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395378&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395407&type=","width":"36.32199860","height":"4.65666676","fontsize":""}}},{"name":"text","data":" "}],"id":"DF11"}},{"name":"p","data":[{"name":"inlineformula","data":[{"name":"math","data":{"math":"ε","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395411&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395410&type=","width":"1.43933344","height":"2.79399991","fontsize":""}}}]},{"name":"text","data":"分别设置为 0.07、0.03 和 0.01。考虑到一些算法在部分简单纹理场景下差异不明显,算法比较中采用BadPix("},{"name":"inlineformula","data":[{"name":"math","data":{"math":"ε","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395411&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395410&type=","width":"1.43933344","height":"2.79399991","fontsize":""}}}]},{"name":"text","data":"=0.03)的坏像素图和视差图进行分析,并在指标计算中剔除掉简易场景Pyramids。"}]},{"name":"sec","data":[{"name":"sectitle","data":{"title":[{"name":"text","data":"3"},{"name":"italic","data":[{"name":"text","data":"."}]},{"name":"text","data":"2"},{"name":"italic","data":[{"name":"text","data":"."}]},{"name":"text","data":"1 定量实验"}],"level":"3","id":"s3b1"}},{"name":"p","data":[{"name":"xref","data":{"text":"表1","type":"table","rid":"T1","data":[{"name":"text","data":"表1"}]}},{"name":"text","data":"、"},{"name":"xref","data":{"text":"表2","type":"table","rid":"T2","data":[{"name":"text","data":"表2"}]}},{"name":"text","data":"分别显示了各个方法在不同场景下MSE指标和BadPix指标结果,本文方法在大多数场景中实现了最低(加粗)和次低误差(下划线),特别是对于具有大量遮挡或者复杂纹理的场景,例如“Sideboard”和“Boxes”,所提出的方法在BadPix和MSE表现结果上均明显优于其他方法。"},{"name":"xref","data":{"text":"表3","type":"table","rid":"T3","data":[{"name":"text","data":"表3"}]}},{"name":"text","data":"则展示了各个算法的平均运算时间,本文方法在保证高精度结果的同时也减少了推理成本。"}]},{"name":"table","data":{"id":"T1","caption":[{"lang":"zh","label":[{"name":"text","data":"表1"}],"title":[{"name":"text","data":"均方误差指标对比"}]},{"lang":"en","label":[{"name":"text","data":"Tab.1"}],"title":[{"name":"text","data":"MSE index comparison"}]}],"note":[],"table":[{"head":[[{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"Backgammon"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"Dots"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"Stripes"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"Boxes"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"Cotton"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"Dino"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"Sideboard"}]}]],"body":[[{"align":"center","data":[{"name":"text","data":"LF(2015)"}]},{"align":"center","data":[{"name":"text","data":"11.52"}]},{"align":"center","data":[{"name":"text","data":"5.42"}]},{"align":"center","data":[{"name":"text","data":"17.09"}]},{"align":"center","data":[{"name":"text","data":"17.43"}]},{"align":"center","data":[{"name":"text","data":"9.17"}]},{"align":"center","data":[{"name":"text","data":"1.16"}]},{"align":"center","data":[{"name":"text","data":"5.07"}]}],[{"align":"center","data":[{"name":"text","data":"LF_OCC(2016)"}]},{"align":"center","data":[{"name":"text","data":"20.99"}]},{"align":"center","data":[{"name":"text","data":"3.02"}]},{"align":"center","data":[{"name":"text","data":"8.09"}]},{"align":"center","data":[{"name":"text","data":"9.85"}]},{"align":"center","data":[{"name":"text","data":"1.07"}]},{"align":"center","data":[{"name":"text","data":"1.14"}]},{"align":"center","data":[{"name":"text","data":"2.30"}]}],[{"align":"center","data":[{"name":"text","data":"SPO(2016)"}]},{"align":"center","data":[{"name":"text","data":"3.24"}]},{"align":"center","data":[{"name":"text","data":"4.87"}]},{"align":"center","data":[{"name":"text","data":"6.68"}]},{"align":"center","data":[{"name":"text","data":"9.11"}]},{"align":"center","data":[{"name":"text","data":"1.31"}]},{"align":"center","data":[{"name":"text","data":"0.31"}]},{"align":"center","data":[{"name":"text","data":"1.02"}]}],[{"align":"center","data":[{"name":"text","data":"EPINET(2018)"}]},{"align":"center","data":[{"name":"text","data":"1.91"}]},{"align":"center","data":[{"name":"text","data":"1.60"}]},{"align":"center","data":[{"name":"text","data":"0.27"}]},{"align":"center","data":[{"name":"text","data":"6.04"}]},{"align":"center","data":[{"name":"text","data":"0.22"}]},{"align":"center","data":[{"name":"text","data":"0.15"}]},{"align":"center","data":[{"name":"text","data":"0.81"}]}],[{"align":"center","data":[{"name":"text","data":"LFATT(2020)"}]},{"align":"center","data":[{"name":"text","data":"1.81"}]},{"align":"center","data":[{"name":"bold","data":[{"name":"text","data":"0.99"}]}]},{"align":"center","data":[{"name":"bold","data":[{"name":"text","data":"0.22"}]}]},{"align":"center","data":[{"name":"text","data":"4.00"}]},{"align":"center","data":[{"name":"bold","data":[{"name":"text","data":"0.21"}]}]},{"align":"center","data":[{"name":"bold","data":[{"name":"text","data":"0.09"}]}]},{"align":"center","data":[{"name":"text","data":"0.53"}]}],[{"align":"center","style":"border-bottom:solid;","data":[{"name":"text","data":"OUR"}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"bold","data":[{"name":"text","data":"1.51"}]}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"text","data":"1.19"}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"bold","data":[{"name":"text","data":"0.22"}]}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"bold","data":[{"name":"text","data":"2.94"}]}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"bold","data":[{"name":"text","data":"0.21"}]}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"text","data":"0.10"}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"bold","data":[{"name":"text","data":"0.49"}]}]}]],"foot":[]}],"graphics":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395380&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395379&type=","width":"161.79995728","height":"34.82499695","fontsize":""}}},{"name":"table","data":{"id":"T2","caption":[{"lang":"zh","label":[{"name":"text","data":"表2"}],"title":[{"name":"text","data":"坏像素指标对比"}]},{"lang":"en","label":[{"name":"text","data":"Tab.2"}],"title":[{"name":"text","data":"BadPix index comparison"}]}],"note":[],"table":[{"head":[[{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"Backgammon"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"Dots"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"Stripes"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"Boxes"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"Cotton"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"Dino"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"Sideboard"}]}]],"body":[[{"align":"center","data":[{"name":"text","data":"LF(2015)"}]},{"align":"center","data":[{"name":"text","data":"4.77"}]},{"align":"center","data":[{"name":"text","data":"5.33"}]},{"align":"center","data":[{"name":"text","data":"34.88"}]},{"align":"center","data":[{"name":"text","data":"38.42"}]},{"align":"center","data":[{"name":"text","data":"21.36"}]},{"align":"center","data":[{"name":"text","data":"45.12"}]},{"align":"center","data":[{"name":"text","data":"35.42"}]}],[{"align":"center","data":[{"name":"text","data":"LF_OCC(2016)"}]},{"align":"center","data":[{"name":"text","data":"39.53"}]},{"align":"center","data":[{"name":"text","data":"23.46"}]},{"align":"center","data":[{"name":"text","data":"21.08"}]},{"align":"center","data":[{"name":"text","data":"55.14"}]},{"align":"center","data":[{"name":"text","data":"33.72"}]},{"align":"center","data":[{"name":"text","data":"48.89"}]},{"align":"center","data":[{"name":"text","data":"51.15"}]}],[{"align":"center","data":[{"name":"text","data":"SPO(2016)"}]},{"align":"center","data":[{"name":"text","data":"7.62"}]},{"align":"center","data":[{"name":"text","data":"34.91"}]},{"align":"center","data":[{"name":"text","data":"14.86"}]},{"align":"center","data":[{"name":"text","data":"29.53"}]},{"align":"center","data":[{"name":"text","data":"13.71"}]},{"align":"center","data":[{"name":"text","data":"16.36"}]},{"align":"center","data":[{"name":"text","data":"28.81"}]}],[{"align":"center","data":[{"name":"text","data":"EPINET(2018)"}]},{"align":"center","data":[{"name":"text","data":"2.93"}]},{"align":"center","data":[{"name":"text","data":"18.57"}]},{"align":"center","data":[{"name":"bold","data":[{"name":"text","data":"1.53"}]}]},{"align":"center","data":[{"name":"text","data":"18.66"}]},{"align":"center","data":[{"name":"text","data":"2.22"}]},{"align":"center","data":[{"name":"text","data":"3.22"}]},{"align":"center","data":[{"name":"text","data":"11.82"}]}],[{"align":"center","data":[{"name":"text","data":"LFATT(2020)"}]},{"align":"center","data":[{"name":"bold","data":[{"name":"text","data":"2.55"}]}]},{"align":"center","data":[{"name":"text","data":"2.84"}]},{"align":"center","data":[{"name":"text","data":"4.29"}]},{"align":"center","data":[{"name":"text","data":"18.97"}]},{"align":"center","data":[{"name":"text","data":"0.70"}]},{"align":"center","data":[{"name":"text","data":"2.34"}]},{"align":"center","data":[{"name":"text","data":"7.24"}]}],[{"align":"center","style":"border-bottom:solid;","data":[{"name":"text","data":"OUR"}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"text","data":"2.60"}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"bold","data":[{"name":"text","data":"1.53"}]}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"text","data":"1.88"}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"bold","data":[{"name":"text","data":"13.87"}]}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"bold","data":[{"name":"text","data":"0.65"}]}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"bold","data":[{"name":"text","data":"2.08"}]}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"bold","data":[{"name":"text","data":"7.22"}]}]}]],"foot":[]}],"graphics":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395382&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395381&type=","width":"161.79997253","height":"36.22499847","fontsize":""}}},{"name":"table","data":{"id":"T3","caption":[{"lang":"zh","label":[{"name":"text","data":"表3"}],"title":[{"name":"text","data":"平均运算时间"}]},{"lang":"en","label":[{"name":"text","data":"Tab.3"}],"title":[{"name":"text","data":"Average run times"}]}],"note":[],"table":[{"head":[[{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"LF(2015)"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"LF_OCC(2016)"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"SPO(2016)"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"EPINET(2018)"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"LFATT(2020)"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"OUR"}]}]],"body":[[{"align":"center","style":"border-bottom:solid;","data":[{"name":"text","data":"Time"}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"text","data":"1009.25"}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"text","data":"519.90"}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"text","data":"2115.00"}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"text","data":"1.98"}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"text","data":"5.86"}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"text","data":"1.52"}]}]],"foot":[]}],"graphics":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395384&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395383&type=","width":"161.79998779","height":"10.35000610","fontsize":""},"cellunit":[{"lang":"en","data":[{"name":"text","data":"s"}]}]}}]},{"name":"sec","data":[{"name":"sectitle","data":{"title":[{"name":"text","data":"3"},{"name":"italic","data":[{"name":"text","data":"."}]},{"name":"text","data":"2"},{"name":"italic","data":[{"name":"text","data":"."}]},{"name":"text","data":"2 定性实验"}],"level":"3","id":"s3b2"}},{"name":"p","data":[{"name":"xref","data":{"text":"图7","type":"fig","rid":"F7","data":[{"name":"text","data":"图7"}]}},{"name":"text","data":"展示了一些算法(b)-(g)在不同场景下的坏像素图("},{"name":"italic","data":[{"name":"text","data":"ε"}]},{"name":"text","data":"=0.03)和视差图。在坏像素指标评估里,红色部分为代表预测图像与参考图像相比有明显偏差或失真的像素点。视差图则反映了场景中每个像素点的视差值或深度信息。实验所用场景分别为boxes、cotton、dino和sideboard。可以看出传统方法"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"7","type":"bibr","rid":"R7","data":[{"name":"text","data":"7"}]}},{"name":"text","data":"]["},{"name":"xref","data":{"text":"8","type":"bibr","rid":"R8","data":[{"name":"text","data":"8"}]}},{"name":"text","data":"]["},{"name":"xref","data":{"text":"10","type":"bibr","rid":"R10","data":[{"name":"text","data":"10"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"在处理简单场景时,最终的视差结果与深度学习方法相似,然而在复杂场景下所估计的视差图会出现较大误差。深度学习的方法在结果上明显优于传统方法,但Epinet"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"17","type":"bibr","rid":"R17","data":[{"name":"text","data":"17"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"将多流网络特征简单连接在一起进行处理,在复杂纹理场景下(boxes)存在一定误差。LFATT"},{"name":"sup","data":[{"name":"text","data":"["},{"name":"xref","data":{"text":"18","type":"bibr","rid":"R18","data":[{"name":"text","data":"18"}]}},{"name":"text","data":"]"}]},{"name":"text","data":"考虑子孔径所有特征并共享权重,但忽视了某些遮挡场景下的细节区域。因此,这些方法在处理遮挡、重复纹理区域和边缘会导致一定误差。相比之下,本文网络有效利用校正卷积来处理遮挡掩膜引导的特征信息,以提取遮挡区域的局部特征,结合3D空间注意力进行代价优化,最终得到精确的深度图。"}]},{"name":"fig","data":{"id":"F7","caption":[{"lang":"zh","label":[{"name":"text","data":"图7"}],"title":[{"name":"text","data":"不同算法误差图("},{"name":"inlineformula","data":[{"name":"math","data":{"math":"ε","graphicsData":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395386&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395385&type=","width":"1.26999998","height":"2.45533323","fontsize":""}}}]},{"name":"text","data":"=0.03)及视差图"}]},{"lang":"en","label":[{"name":"text","data":"Fig.7"}],"title":[{"name":"text","data":"Different algorithm BadPix ("},{"name":"italic","data":[{"name":"text","data":"ε"}]},{"name":"text","data":"=0.03) and disparity map"}]}],"subcaption":[],"note":[],"graphics":[{"print":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395412&type=","small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395387&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395413&type=","width":"101.59449768","height":"130.00000000","fontsize":""}]}}]},{"name":"sec","data":[{"name":"sectitle","data":{"title":[{"name":"text","data":"3"},{"name":"italic","data":[{"name":"text","data":"."}]},{"name":"text","data":"2"},{"name":"italic","data":[{"name":"text","data":"."}]},{"name":"text","data":"3 复杂场景下的客观质量"}],"level":"3","id":"s3b3"}},{"name":"p","data":[{"name":"xref","data":{"text":"图8","type":"fig","rid":"F8","data":[{"name":"text","data":"图8"}]}},{"name":"text","data":"具体展示了两个深度学习算法和本文算法(b)-(d)在Herb、Bicycle、Origami遮挡场景下深度图和细节放大图。在Herb和Bicycle场景下蓝色方框和黄色方框展示了在复杂纹理和多处遮挡区域的深度图,EPINET无法完全捕捉到盆栽的枝叶和车轮的铁丝,误差较大。LFATTNET则对遮挡区域具有一定鲁棒性,但丢失了部分细节特征。红色方框展示在阴影场景Origami下,EPINET和LFATTNET阴影区域深度估计误差较大。实验使用三组复杂场景下的光场图像进行了测试,与普通场景相比,复杂场景下的图像特征更难学习。本文方法尽可能关注那些重复纹理和遮挡细节区域,在对比实验中使用与之前训练阶段相同的训练模型来保证结果的准确性,在这三种挑战性场景下都实现了最优的效果。"}]},{"name":"fig","data":{"id":"F8","caption":[{"lang":"zh","label":[{"name":"text","data":"图8"}],"title":[{"name":"text","data":"遮挡场景下不同算法的视差图"}]},{"lang":"en","label":[{"name":"text","data":"Fig.8"}],"title":[{"name":"text","data":"Disparity map of different algorithms in occlusion scenes"}]}],"subcaption":[],"note":[],"graphics":[{"print":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395418&type=","small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395420&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395419&type=","width":"100.00000000","height":"83.60216522","fontsize":""}]}}]},{"name":"sec","data":[{"name":"sectitle","data":{"title":[{"name":"text","data":"3"},{"name":"italic","data":[{"name":"text","data":"."}]},{"name":"text","data":"2"},{"name":"italic","data":[{"name":"text","data":"."}]},{"name":"text","data":"4 消融实验"}],"level":"3","id":"s3b4"}},{"name":"p","data":[{"name":"text","data":"为了验证本文提出的校正卷积和空间注意力策略的有效性,具体设计了三个消融实验:(1)Baseline是没有校正卷积和空间注意力的基础网络。(2)Baseline+SC网络为包含遮挡掩膜和校正卷积网络。(3)Baseline+SC+SA是所提出的完整网络,包含校正卷积和空间注意力,其他实验设置保持不变。如"},{"name":"xref","data":{"text":"表4","type":"table","rid":"T4","data":[{"name":"text","data":"表4"}]}},{"name":"text","data":"所示,基于校正卷积的网络SC效果相比Baseline表现更好,可见基于校正卷积的特征提取和融合提升了Baseline所预测的深度图质量。SC+SA则进一步利用改进后的3d空间注意力消除冗余信息,在代价聚合中着重处理遮挡区域下的视差信息,表现出最佳的实验结果。"}]},{"name":"table","data":{"id":"T4","caption":[{"lang":"zh","label":[{"name":"text","data":"表 4"}],"title":[{"name":"text","data":"消融实验"}]},{"lang":"en","label":[{"name":"text","data":"Tab.4"}],"title":[{"name":"text","data":"Ablation experiment"}]}],"note":[],"table":[{"head":[[{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"model"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"BadPix(%)"}]},{"align":"center","style":"border-top:solid;border-bottom:solid;","data":[{"name":"text","data":"MSE(px)"}]}]],"body":[[{"align":"center","data":[{"name":"text","data":"a"}]},{"align":"center","data":[{"name":"text","data":"Baseline"}]},{"align":"center","data":[{"name":"text","data":"2.87"}]},{"align":"center","data":[{"name":"text","data":"1.047"}]}],[{"align":"center","data":[{"name":"text","data":"b"}]},{"align":"center","data":[{"name":"text","data":"Baseline+SC"}]},{"align":"center","data":[{"name":"text","data":"2.74"}]},{"align":"center","data":[{"name":"text","data":"0.960"}]}],[{"align":"center","style":"border-bottom:solid;","data":[{"name":"text","data":"c"}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"text","data":"Baseline+SC+SA"}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"text","data":"2.02"}]},{"align":"center","style":"border-bottom:solid;","data":[{"name":"text","data":"0.853"}]}]],"foot":[]}],"graphics":{"small":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395422&type=","big":"https://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=49395421&type=","width":"76.89999390","height":"20.70000839","fontsize":""}}}]}]}]},{"name":"sec","data":[{"name":"sectitle","data":{"title":[{"name":"text","data":"4 结论"}],"level":"1","id":"s4"}},{"name":"p","data":[{"name":"text","data":"本文提出了一种基于校正卷积网络进行光场深度估计的网络结构,该结构利用全阵列子孔径图像进行输入,通过初始视差图生成遮挡掩膜,并利用校正卷积判别编码以感知遮挡。然后提取所有视角下的深层细节信息并与多尺度特征融合,增强细节特征。此外,将全部视图特征构建为亚像素代价体,从而学习更精细的视差分布用于估计高精度的深度图。最后,亚像素代价体结合改进的3D空间注意力进行代价聚合,过滤冗余信息并增强关键特征信息。消融实验证明了校正卷积和空间注意力结构对深度估计的有效性。通过定量和定性分析,该方法在HCI 4D光场基准测试和复杂场景中实现了较为先进的性能。"}]}]}],"footnote":[],"reflist":{"title":[{"name":"text","data":"参考文献"}],"data":[{"id":"R1","label":"1","citation":[{"lang":"en","text":[{"name":"text","data":"Ren N"},{"name":"text","data":", "},{"name":"text","data":"Levoy M"},{"name":"text","data":", "},{"name":"text","data":"Bredif M"},{"name":"text","data":", "},{"name":"text","data":"et al"},{"name":"text","data":". "},{"name":"text","data":"Light Field Photography with a Hand-Held Plenoptic Camera"},{"name":"text","data":"[J]. "},{"name":"text","data":"Stanford University Cstr"},{"name":"text","data":", "},{"name":"text","data":"2005"},{"name":"text","data":"."}],"title":"Light Field Photography with a Hand-Held Plenoptic Camera"}]},{"id":"R2","label":"2","citation":[{"lang":"en","text":[{"name":"text","data":"Perra C"},{"name":"text","data":", "},{"name":"text","data":"Murgia F"},{"name":"text","data":", "},{"name":"text","data":"Giusto D"},{"name":"text","data":". "},{"name":"text","data":"An analysis of 3D point cloud reconstruction from light field images"},{"name":"text","data":"[C]//"},{"name":"text","data":"2016 Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA)"},{"name":"text","data":". "},{"name":"text","data":"IEEE"},{"name":"text","data":", "},{"name":"text","data":"2016"},{"name":"text","data":": "},{"name":"text","data":"1"},{"name":"text","data":"-"},{"name":"text","data":"6"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1109/ipta.2016.7821011"}],"href":"http://dx.doi.org/10.1109/ipta.2016.7821011"}}],"title":"An analysis of 3D point cloud reconstruction from light field images"}]},{"id":"R3","label":"3","citation":[{"lang":"zh","text":[{"name":"text","data":"蔡晓峰"},{"name":"text","data":","},{"name":"text","data":"宋恭渝"},{"name":"text","data":","},{"name":"text","data":"杨鑫"},{"name":"text","data":","},{"name":"text","data":"等"},{"name":"text","data":"."},{"name":"text","data":"紧凑型纯相位全息近眼三维显示"},{"name":"text","data":"[J]. "},{"name":"text","data":"光学学报"},{"name":"text","data":", "},{"name":"text","data":"2023"},{"name":"text","data":", "},{"name":"text","data":"43"},{"name":"text","data":"("},{"name":"text","data":"5"},{"name":"text","data":"): "},{"name":"text","data":"0509002"},{"name":"text","data":"."}],"title":"紧凑型纯相位全息近眼三维显示"},{"lang":"en","text":[{"name":"text","data":"CAI Xiao-Feng"},{"name":"text","data":", "},{"name":"text","data":"Song Gong-yu"},{"name":"text","data":", "},{"name":"text","data":"Yang Xin"},{"name":"text","data":", "},{"name":"text","data":"et al"},{"name":"text","data":". "},{"name":"text","data":"Compact Phase-Only Holographic Near-Eye 3D Display"},{"name":"text","data":"[J]. "},{"name":"text","data":"Acta Optica Sinica"},{"name":"text","data":", "},{"name":"text","data":"2023"},{"name":"text","data":", "},{"name":"text","data":"43"},{"name":"text","data":"("},{"name":"text","data":"5"},{"name":"text","data":"): "},{"name":"text","data":"0509002"},{"name":"text","data":"."}],"title":"Compact Phase-Only Holographic Near-Eye 3D Display"}]},{"id":"R4","label":"4","citation":[{"lang":"en","text":[{"name":"text","data":"Ninghe L"},{"name":"text","data":", "},{"name":"text","data":"Zhengzhong H"},{"name":"text","data":", "},{"name":"text","data":"Zehao H"},{"name":"text","data":", "},{"name":"text","data":"et al"},{"name":"text","data":". "},{"name":"text","data":"DGE-CNN: 2D-to-3D holographic display based on a depth gradient extracting module and ZCNN network"},{"name":"text","data":".[J]. "},{"name":"text","data":"Optics express"},{"name":"text","data":","},{"name":"text","data":"2023"},{"name":"text","data":","},{"name":"text","data":"31"},{"name":"text","data":"("},{"name":"text","data":"15"},{"name":"text","data":"). "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1364/oe.489639"}],"href":"http://dx.doi.org/10.1364/oe.489639"}}],"title":"DGE-CNN: 2D-to-3D holographic display based on a depth gradient extracting module and ZCNN network"}]},{"id":"R5","label":"5","citation":[{"lang":"en","text":[{"name":"text","data":"Guo X"},{"name":"text","data":", "},{"name":"text","data":"Yang K"},{"name":"text","data":", "},{"name":"text","data":"Yang W"},{"name":"text","data":", "},{"name":"text","data":"et al"},{"name":"text","data":". "},{"name":"text","data":"Group-wise correlation stereo network"},{"name":"text","data":"[C]//"},{"name":"text","data":"Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition"},{"name":"text","data":". "},{"name":"text","data":"2019"},{"name":"text","data":": "},{"name":"text","data":"3273"},{"name":"text","data":"-"},{"name":"text","data":"3282"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1109/cvpr.2019.00339"}],"href":"http://dx.doi.org/10.1109/cvpr.2019.00339"}}],"title":"Group-wise correlation stereo network"}]},{"id":"R6","label":"6","citation":[{"lang":"en","text":[{"name":"text","data":"Wanner S"},{"name":"text","data":", "},{"name":"text","data":"Goldluecke B"},{"name":"text","data":". "},{"name":"text","data":"Globally consistent depth labeling of 4D light fields"},{"name":"text","data":"[C]//"},{"name":"text","data":"2012 IEEE Conference on Computer Vision and Pattern Recognition"},{"name":"text","data":". "},{"name":"text","data":"IEEE"},{"name":"text","data":", "},{"name":"text","data":"2012"},{"name":"text","data":": "},{"name":"text","data":"41"},{"name":"text","data":"-"},{"name":"text","data":"48"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1109/cvpr.2012.6247656"}],"href":"http://dx.doi.org/10.1109/cvpr.2012.6247656"}}],"title":"Globally consistent depth labeling of 4D light fields"}]},{"id":"R7","label":"7","citation":[{"lang":"en","text":[{"name":"text","data":"Zhang S"},{"name":"text","data":", "},{"name":"text","data":"Hao S"},{"name":"text","data":", "},{"name":"text","data":"Chao L"},{"name":"text","data":", "},{"name":"text","data":"et al"},{"name":"text","data":". "},{"name":"text","data":"Robust depth estimation for light field via spinning parallelogram operator"},{"name":"text","data":"[J]. "},{"name":"text","data":"Computer Vision & Image Understanding"},{"name":"text","data":", "},{"name":"text","data":"2016"},{"name":"text","data":", 145(apr.):"},{"name":"text","data":"148"},{"name":"text","data":"-"},{"name":"text","data":"159"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1016/j.cviu.2015.12.007"}],"href":"http://dx.doi.org/10.1016/j.cviu.2015.12.007"}}],"title":"Robust depth estimation for light field via spinning parallelogram operator"}]},{"id":"R8","label":"8","citation":[{"lang":"en","text":[{"name":"text","data":"Jeon H G"},{"name":"text","data":", "},{"name":"text","data":"Park J"},{"name":"text","data":", "},{"name":"text","data":"Choe G"},{"name":"text","data":", "},{"name":"text","data":"et al"},{"name":"text","data":". "},{"name":"text","data":"Accurate depth map estimation from a lenslet light field camera"},{"name":"text","data":"[C]//"},{"name":"text","data":"Computer Vision & Pattern Recognition"},{"name":"text","data":". "},{"name":"text","data":"IEEE"},{"name":"text","data":", "},{"name":"text","data":"2015"},{"name":"text","data":":"},{"name":"text","data":"1547"},{"name":"text","data":"-"},{"name":"text","data":"1555"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1109/cvpr.2015.7298762"}],"href":"http://dx.doi.org/10.1109/cvpr.2015.7298762"}}],"title":"Accurate depth map estimation from a lenslet light field camera"}]},{"id":"R9","label":"9","citation":[{"lang":"en","text":[{"name":"text","data":"Tao M W"},{"name":"text","data":", "},{"name":"text","data":"Hadap S"},{"name":"text","data":", "},{"name":"text","data":"Malik J"},{"name":"text","data":", "},{"name":"text","data":"et al"},{"name":"text","data":". "},{"name":"text","data":"Depth from combining defocus and correspondence using light-field cameras"},{"name":"text","data":"[C]//"},{"name":"text","data":"Proceedings of the IEEE International Conference on Computer Vision"},{"name":"text","data":". "},{"name":"text","data":"2013"},{"name":"text","data":": "},{"name":"text","data":"673"},{"name":"text","data":"-"},{"name":"text","data":"680"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1109/iccv.2013.89"}],"href":"http://dx.doi.org/10.1109/iccv.2013.89"}}],"title":"Depth from combining defocus and correspondence using light-field cameras"}]},{"id":"R10","label":"10","citation":[{"lang":"en","text":[{"name":"text","data":"Wang T C"},{"name":"text","data":", "},{"name":"text","data":"Efros A A"},{"name":"text","data":", "},{"name":"text","data":"Ramamoorthi R"},{"name":"text","data":". "},{"name":"text","data":"Occlusion-aware depth estimation using light-field cameras"},{"name":"text","data":"[C]//"},{"name":"text","data":"Proceedings of the IEEE international conference on computer vision"},{"name":"text","data":". "},{"name":"text","data":"2015"},{"name":"text","data":": "},{"name":"text","data":"3487"},{"name":"text","data":"-"},{"name":"text","data":"3495"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1109/iccv.2015.398"}],"href":"http://dx.doi.org/10.1109/iccv.2015.398"}}],"title":"Occlusion-aware depth estimation using light-field cameras"}]},{"id":"R11","label":"11","citation":[{"lang":"en","text":[{"name":"text","data":"Williem W"},{"name":"text","data":", "},{"name":"text","data":"Park I K"},{"name":"text","data":". "},{"name":"text","data":"Robust Light Field Depth Estimation for Noisy Scene with Occlusion"},{"name":"text","data":"[C]//"},{"name":"text","data":"Computer Vision & Pattern Recognition"},{"name":"text","data":". "},{"name":"text","data":"IEEE"},{"name":"text","data":", "},{"name":"text","data":"2016"},{"name":"text","data":":"},{"name":"text","data":"4396"},{"name":"text","data":"-"},{"name":"text","data":"4404"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1109/cvpr.2016.476"}],"href":"http://dx.doi.org/10.1109/cvpr.2016.476"}}],"title":"Robust Light Field Depth Estimation for Noisy Scene with Occlusion"}]},{"id":"R12","label":"12","citation":[{"lang":"en","text":[{"name":"text","data":"Luo Y"},{"name":"text","data":", "},{"name":"text","data":"Zhao Y"},{"name":"text","data":", "},{"name":"text","data":"Li J"},{"name":"text","data":", "},{"name":"text","data":"et al"},{"name":"text","data":". "},{"name":"text","data":"Computational imaging without a computer: seeing through random diffusers at the speed of light"},{"name":"text","data":"[J]. "},{"name":"text","data":"eLight"},{"name":"text","data":", "},{"name":"text","data":"2022"},{"name":"text","data":", "},{"name":"text","data":"2"},{"name":"text","data":"("},{"name":"text","data":"1"},{"name":"text","data":"): "},{"name":"text","data":"4"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1186/s43593-022-00012-4"}],"href":"http://dx.doi.org/10.1186/s43593-022-00012-4"}}],"title":"Computational imaging without a computer: seeing through random diffusers at the speed of light"}]},{"id":"R13","label":"13","citation":[{"lang":"en","text":[{"name":"text","data":"Zuo C"},{"name":"text","data":", "},{"name":"text","data":"Qian J"},{"name":"text","data":", "},{"name":"text","data":"Feng S"},{"name":"text","data":", "},{"name":"text","data":"et al"},{"name":"text","data":". "},{"name":"text","data":"Deep learning in optical metrology: a review"},{"name":"text","data":"[J]. "},{"name":"text","data":"Light: Science & Applications"},{"name":"text","data":", "},{"name":"text","data":"2022"},{"name":"text","data":", "},{"name":"text","data":"11"},{"name":"text","data":"("},{"name":"text","data":"1"},{"name":"text","data":"): "},{"name":"text","data":"39"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1038/s41377-022-00714-x"}],"href":"http://dx.doi.org/10.1038/s41377-022-00714-x"}}],"title":"Deep learning in optical metrology: a review"}]},{"id":"R14","label":"14","citation":[{"lang":"en","text":[{"name":"text","data":"Situ G"},{"name":"text","data":". "},{"name":"text","data":"Deep holography"},{"name":"text","data":"[J]. "},{"name":"text","data":"Light: Advanced Manufacturing"},{"name":"text","data":", "},{"name":"text","data":"2022"},{"name":"text","data":", "},{"name":"text","data":"3"},{"name":"text","data":"("},{"name":"text","data":"2"},{"name":"text","data":"): "},{"name":"text","data":"278"},{"name":"text","data":"-"},{"name":"text","data":"300"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.37188/lam.2022.013"}],"href":"http://dx.doi.org/10.37188/lam.2022.013"}}],"title":"Deep holography"}]},{"id":"R15","label":"15","citation":[{"lang":"en","text":[{"name":"text","data":"Heber S"},{"name":"text","data":", "},{"name":"text","data":"Pock T"},{"name":"text","data":". "},{"name":"text","data":"Convolutional networks for shape from light field"},{"name":"text","data":"[C]//"},{"name":"text","data":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition"},{"name":"text","data":". "},{"name":"text","data":"2016"},{"name":"text","data":": "},{"name":"text","data":"3746"},{"name":"text","data":"-"},{"name":"text","data":"3754"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1109/cvpr.2016.407"}],"href":"http://dx.doi.org/10.1109/cvpr.2016.407"}}],"title":"Convolutional networks for shape from light field"}]},{"id":"R16","label":"16","citation":[{"lang":"en","text":[{"name":"text","data":"Heber S"},{"name":"text","data":", "},{"name":"text","data":"Yu W"},{"name":"text","data":", "},{"name":"text","data":"Pock T"},{"name":"text","data":". "},{"name":"text","data":"Neural epi-volume networks for shape from light field"},{"name":"text","data":"[C]//"},{"name":"text","data":"Proceedings of the IEEE international conference on computer vision"},{"name":"text","data":". "},{"name":"text","data":"2017"},{"name":"text","data":": "},{"name":"text","data":"2252"},{"name":"text","data":"-"},{"name":"text","data":"2260"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1109/iccv.2017.247"}],"href":"http://dx.doi.org/10.1109/iccv.2017.247"}}],"title":"Neural epi-volume networks for shape from light field"}]},{"id":"R17","label":"17","citation":[{"lang":"en","text":[{"name":"text","data":"Shin C"},{"name":"text","data":", "},{"name":"text","data":"Jeon H G"},{"name":"text","data":", "},{"name":"text","data":"Yoon Y"},{"name":"text","data":", "},{"name":"text","data":"et al"},{"name":"text","data":". "},{"name":"text","data":"Epinet: A fully-convolutional neural network using epipolar geometry for depth from light field images"},{"name":"text","data":"[C]//"},{"name":"text","data":"Proceedings of the IEEE conference on computer vision and pattern recognition"},{"name":"text","data":". "},{"name":"text","data":"2018"},{"name":"text","data":": "},{"name":"text","data":"4748"},{"name":"text","data":"-"},{"name":"text","data":"4757"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1109/cvpr.2018.00499"}],"href":"http://dx.doi.org/10.1109/cvpr.2018.00499"}}],"title":"Epinet: A fully-convolutional neural network using epipolar geometry for depth from light field images"}]},{"id":"R18","label":"18","citation":[{"lang":"en","text":[{"name":"text","data":"Tsai Y J"},{"name":"text","data":", "},{"name":"text","data":"Liu Y L"},{"name":"text","data":", "},{"name":"text","data":"Ouhyoung M"},{"name":"text","data":", "},{"name":"text","data":"et al"},{"name":"text","data":". "},{"name":"text","data":"Attention-based view selection networks for light-field disparity estimation"},{"name":"text","data":"[C]//"},{"name":"text","data":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"name":"text","data":". "},{"name":"text","data":"2020"},{"name":"text","data":", "},{"name":"text","data":"34"},{"name":"text","data":"("},{"name":"text","data":"07"},{"name":"text","data":"): "},{"name":"text","data":"12095"},{"name":"text","data":"-"},{"name":"text","data":"12103"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1609/aaai.v34i07.6888"}],"href":"http://dx.doi.org/10.1609/aaai.v34i07.6888"}}],"title":"Attention-based view selection networks for light-field disparity estimation"}]},{"id":"R19","label":"19","citation":[{"lang":"en","text":[{"name":"text","data":"Chen J"},{"name":"text","data":", "},{"name":"text","data":"Zhang S"},{"name":"text","data":", "},{"name":"text","data":"Lin Y"},{"name":"text","data":". "},{"name":"text","data":"Attention-based multi-level fusion network for light field depth estimation"},{"name":"text","data":"[C]//"},{"name":"text","data":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"name":"text","data":". "},{"name":"text","data":"2021"},{"name":"text","data":", "},{"name":"text","data":"35"},{"name":"text","data":"("},{"name":"text","data":"2"},{"name":"text","data":"): "},{"name":"text","data":"1009"},{"name":"text","data":"-"},{"name":"text","data":"1017"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1609/aaai.v35i2.16185"}],"href":"http://dx.doi.org/10.1609/aaai.v35i2.16185"}}],"title":"Attention-based multi-level fusion network for light field depth estimation"}]},{"id":"R20","label":"20","citation":[{"lang":"en","text":[{"name":"text","data":"Honauer K"},{"name":"text","data":", "},{"name":"text","data":"Johannsen O"},{"name":"text","data":", "},{"name":"text","data":"Kondermann D"},{"name":"text","data":", "},{"name":"text","data":"et al"},{"name":"text","data":". "},{"name":"text","data":"A dataset and evaluation methodology for depth estimation on 4D light fields"},{"name":"text","data":"[C]//"},{"name":"text","data":"Computer Vision–ACCV 2016: 13th Asian Conference on Computer Vision"},{"name":"text","data":", "},{"name":"text","data":"Taipei, Taiwan, November"},{"name":"text","data":" "},{"name":"text","data":"20"},{"name":"text","data":"-"},{"name":"text","data":"24"},{"name":"text","data":", "},{"name":"text","data":"2016"},{"name":"text","data":", "},{"name":"text","data":"Revised Selected Papers, Part III"},{"name":"text","data":" "},{"name":"text","data":"13"},{"name":"text","data":". Springer International Publishing, "},{"name":"text","data":"2017"},{"name":"text","data":": "},{"name":"text","data":"19"},{"name":"text","data":"-"},{"name":"text","data":"34"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1007/978-3-319-54187-7_2"}],"href":"http://dx.doi.org/10.1007/978-3-319-54187-7_2"}}],"title":"A dataset and evaluation methodology for depth estimation on 4D light fields"}]},{"id":"R21","label":"21","citation":[{"lang":"en","text":[{"name":"text","data":"Kingma D P"},{"name":"text","data":", "},{"name":"text","data":"Ba J"},{"name":"text","data":". "},{"name":"text","data":"Adam: A method for stochastic optimization"},{"name":"text","data":"[J]. "},{"name":"text","data":"arXiv preprint arXiv:"},{"name":"text","data":"1412.6980"},{"name":"text","data":", "},{"name":"text","data":"2014"},{"name":"text","data":"."}],"title":"Adam: A method for stochastic optimization"}]},{"id":"R22","label":"22","citation":[{"lang":"en","text":[{"name":"text","data":"Liu J J"},{"name":"text","data":", "},{"name":"text","data":"Hou Q"},{"name":"text","data":", "},{"name":"text","data":"Cheng M M"},{"name":"text","data":", "},{"name":"text","data":"et al"},{"name":"text","data":". "},{"name":"text","data":"Improving convolutional networks with self-calibrated convolutions"},{"name":"text","data":"[C]//"},{"name":"text","data":"Proceedings of the IEEE/CVF conference on computer vision and pattern recognition"},{"name":"text","data":". "},{"name":"text","data":"2020"},{"name":"text","data":": "},{"name":"text","data":"10096"},{"name":"text","data":"-"},{"name":"text","data":"10105"},{"name":"text","data":". "},{"name":"text","data":" doi: "},{"name":"extlink","data":{"text":[{"name":"text","data":"10.1109/cvpr42600.2020.01011"}],"href":"http://dx.doi.org/10.1109/cvpr42600.2020.01011"}}],"title":"Improving convolutional networks with self-calibrated convolutions"}]}]},"response":[],"contributions":[],"acknowledgements":[],"conflict":[],"supportedby":[],"articlemeta":{"doi":"10.37188/CJLCD.2023-0307","clc":[[{"name":"text","data":"TP394.1"}],[{"name":"text","data":"TH691.9"}]],"dc":[{"name":"text","data":"A"}],"publisherid":"1007-2780(XXXX)XX-0001-10","citeme":[],"fundinggroup":[{"lang":"zh","text":[{"name":"text","data":"国家自然科学基金资助项目(No.61702384,No.61502357)"}]},{"lang":"en","text":[{"name":"text","data":"Supported by National Natural Science Foundation of China(No.61702384,No.61502357)"}]}],"history":{"received":"2023-09-22","revised":"2023-10-29","epub":"2023-11-01","opub":"2023-11-01"}},"appendix":[],"banner":"43464485","type":"research-article","ethics":[],"backSec":[],"supplementary":[],"journalTitle":"液晶与显示","originalSource":[]}