Received: 06 December 2022; Revised: 11 January 2023; Published: 05 October 2023
Current few-shot learning algorithms are prone to overfitting and generalize poorly in cross-domain cases. Inspired by the property that reservoir computing (RC) does not rely on training its internal connections, which helps alleviate overfitting, a few-shot image classification method based on reservoir computing (RCFIC) is proposed. The method consists of a feature extraction module, a feature enhancement module and a classifier module. The feature enhancement module comprises an RC module and an RC-based attention mechanism, which perform channel-level and pixel-level enhancement, respectively, of the features produced by the feature extraction module. Jointly with a cosine classifier, it drives the network to learn feature distributions with high inter-class variance and low intra-class variance. Experimental results indicate that the algorithm achieves classification accuracy at least 1.07% higher than existing methods on the Cifar-FS, FC100 and Mini-ImageNet datasets, and outperforms the second-best method by at least 1.77% in the cross-domain scenario from Mini-ImageNet to CUB-200. Ablation experiments further verify the effectiveness of RCFIC. The proposed method has strong generalization ability, effectively alleviates the overfitting problem in few-shot image classification, and solves the cross-domain problem to a certain extent.
近年来,深度学习已经广泛应用于各行各业,在计算成像、图像分类、目标检测等视觉任务中取得了显著进展。
小样本学习方法通常可以分为两类:基于数据增强的方法和基于元学习的方法。
在解决小样本问题时需要关注两方面的问题:(1)更好地提取特征来指导分类;(2)缓解过拟合,提高模型泛化能力,如进行数据增强等操作。考虑到人脑是一种天然的小样本学习范式,引入类脑知识或许有助于走出小样本学习的困境,再结合过拟合问题,促使本文应用一种类脑模型——储备池计算(Reservoir Computing, RC)。储备池内部连接权重随机生成且无需训练,这一特性有助于缓解过拟合。
针对上述问题,本文提出一种基于储备池计算的小样本图像分类方法(Reservoir Computing Based Network for Few-shot Image Classification,RCFIC),将特征提取网络提取的特征输入特征增强模块(由储备池模块和基于储备池的注意力机制构成)分别进行通道级和像素级增强,然后进行特征融合得到增强特征。同时,在元学习阶段使用余弦相似度分类器,联合特征增强模块促使网络提取的特征分布具有高类间方差、低类内方差的特征,从而更好地指导分类。本文方法在公开常用的小样本图像分类数据集上的实验均达到了具有竞争力的分类精度,表明所提模型和方法具有较强的泛化能力,能够使网络学习更具判别性的特征,缓解过拟合问题,增强模型的性能。
由于小样本学习的任务都基于少量有标签数据(称为新类或目标数据域),而少量数据难以学习到真实的数据模式,容易遇到过拟合问题。因此,一般会引入一个含有丰富标注样本(类别与新类互斥)的辅助数据集(称为基类)以帮助模型学习先验知识,然后再利用这些先验知识以在目标数据域上获得更好的任务表现。
小样本学习通常以元任务的方式进行训练和评估,每个元任务都以N-way K-shot方式采样获得,即每个元任务包含随机采样的 $N$ 个类别,每类取 $K$ 个有标签样本构成支持集 $S$,每类另取 $q$ 个样本构成查询集 $Q$:

$S = \{(x_{i}, y_{i})\}_{i=1}^{N\times K}$ (1)

$Q = \{(x_{j}, y_{j})\}_{j=1}^{N\times q}$ (2)

其中: $x$ 为样本, $y$ 为对应的类别标签。
模型在支持集上学习后在测试集新类中采样大量的元任务来获得这些任务的平均准确率,从而评估模型在小样本学习任务上的分类性能和泛化能力。
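上述 N-way K-shot 元任务的采样流程可以用如下示意代码说明(数据集格式与函数名均为演示用的假设,并非本文原实现):

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, q_query=15, rng=None):
    """从数据集中采样一个 N-way K-shot 元任务(示意实现)。

    dataset: 字典,键为类别,值为该类样本列表。
    返回支持集和查询集,均为 (样本, 类别) 二元组列表。
    """
    rng = rng or random.Random()
    classes = rng.sample(sorted(dataset), n_way)       # 随机选取 N 个类
    support, query = [], []
    for c in classes:
        samples = rng.sample(dataset[c], k_shot + q_query)
        support += [(s, c) for s in samples[:k_shot]]  # 每类 K 个支持样本
        query   += [(s, c) for s in samples[k_shot:]]  # 每类 q 个查询样本
    return support, query

# 示例:10 个类、每类 30 个样本的玩具数据集
toy = {f"class{i}": list(range(30)) for i in range(10)}
s, q = sample_episode(toy, n_way=5, k_shot=1, q_query=15,
                      rng=random.Random(0))
```

评估时重复调用该采样过程得到大量元任务,再对各任务的准确率取平均即可。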
基于储备池计算的小样本学习模型框架如图1所示,整体由特征提取模块、特征增强模块和分类器模块组成。
图1 基于储备池计算的小样本图像分类模型框架
Fig.1 Framework of few-shot image classification model based on reservoir computing
本文使用两阶段训练策略,如图2所示。
图2 基于储备池计算的训练方法流程图
Fig.2 Flowchart of the training method based on RC
第一阶段为模型预训练。将小样本数据集的训练集按照合适的比例划分为新的训练集和验证集,模型在新划分的数据集上以传统图像分类的方式进行训练,分类器使用线性分类器,最后得到预训练模型。
第二阶段为基于模型微调的小样本图像分类阶段。将第一阶段得到的预训练模型作为初始化,把线性分类器替换为余弦分类器,并以元任务的方式在小样本任务上对网络进行微调。
随着卷积网络宽度和深度的增加,网络对图像信息的提取更加充分。但由于数据样本较少带来的过拟合问题,使得在小样本学习任务中网络不能随意加深加宽,因此小样本学习领域常使用ResNet-12和ResNet-18作为特征提取网络。本文也使用这两个小样本学习任务中常用的主干网络作为特征提取模块。
通过特征提取模块 $f_{\theta}$ 对输入图像进行特征提取:

$F = f_{\theta}(x)$ (3)

其中: $x$ 为输入图像, $F$ 为提取得到的特征。
2.5.1 半全连接的储备池内部拓扑结构
储备池的强大性能源于其内部复杂的动力学特性,表现为储备池内部神经元之间的连接方式(即连接矩阵 $W$)的不同。

本文的连接矩阵 $W$ 采用半全连接拓扑结构,其构造方式为:首先生成一个随机初始化的矩阵,然后按式(4)对其元素进行选择,得到半全连接结构:

(4)

式中的下标代表元素在矩阵 $W$ 中的位置。
为了使储备池能够稳定运行,连接矩阵需满足回声状态特性,即对其进行谱半径缩放:

$W_{res} = \alpha\dfrac{W}{\lambda_{\max}(W)}$ (5)

其中: $\lambda_{\max}(W)$ 为 $W$ 的最大特征值的模(谱半径), $\alpha\in(0,1)$ 为缩放系数, $W_{res}$ 为缩放后的储备池连接矩阵。
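式(5)所述的谱半径缩放可以用如下示意代码说明(此处以随机矩阵代替文中的半全连接矩阵,函数名为演示用的假设):

```python
import numpy as np

def scale_spectral_radius(W, alpha=0.9):
    """对储备池连接矩阵按谱半径进行缩放(示意实现)。

    将 W 缩放至谱半径为 alpha(取 alpha<1 以满足回声状态特性)。
    """
    radius = max(abs(np.linalg.eigvals(W)))  # 最大特征值的模,即谱半径
    return alpha * W / radius

rng = np.random.default_rng(0)
W = rng.uniform(-1, 1, (50, 50))             # 随机生成的连接矩阵
W_res = scale_spectral_radius(W, alpha=0.9)  # 缩放后谱半径为 0.9
```

由于特征值随矩阵线性缩放,缩放后矩阵的谱半径恰为 $\alpha$。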
2.5.2 储备池模块
储备池模块主要由半全连接拓扑结构的储备池和残差模块组成,用来提取输入特征的重要通道信息,进行通道级特征增强。在特征输入储备池之前,需要用一个线性层 $W_{l}$ 将特征映射到储备池的输入维度:

$u(t) = W_{l}F(t)$ (6)

其中: $F(t)$ 为 $t$ 时刻输入的特征, $u(t)$ 为映射后的储备池输入。

储备池每个时刻的输出 $y(t)$ 由其内部状态经读出层得到:

$x(t) = \tanh\left(W_{in}u(t) + W_{res}x(t-1)\right)$ (7)

$y(t) = W_{out}x(t)$ (8)

其中, $x(t)$ 为储备池在 $t$ 时刻的内部状态, $W_{in}$ 为输入权重矩阵, $W_{res}$ 为储备池连接矩阵, $W_{out}$ 为输出权重矩阵。
储备池后接一个残差模块,残差模块内含一个批归一化层(Batch Normalization, BN)和前馈层(Feed-Forward, FF)以增加网络信息流通能力,防止网络退化。储备池通道级特征增强模块的输出 $F_{c}$ 为:

$F_{c} = F + \mathrm{FF}\left(\mathrm{BN}(y)\right)$ (9)
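式(7)、(8)描述的状态更新与读出过程可以写成如下示意代码(各权重矩阵随机初始化,维度均为演示用的假设值,并非本文原实现):

```python
import numpy as np

def reservoir_forward(U, W_in, W_res, W_out):
    """按式(7)、(8)依次处理输入序列 U(示意实现)。

    U: (T, d_in) 的输入序列;返回 (T, d_out) 的输出序列。
    """
    n = W_res.shape[0]
    x = np.zeros(n)                            # 初始状态 x(0) = 0
    outputs = []
    for t in range(U.shape[0]):
        # 式(7):x(t) = tanh(W_in·u(t) + W_res·x(t-1))
        x = np.tanh(W_in @ U[t] + W_res @ x)
        # 式(8):y(t) = W_out·x(t)
        outputs.append(W_out @ x)
    return np.stack(outputs)

rng = np.random.default_rng(0)
d_in, n, d_out, T = 8, 32, 8, 5                # 假设的维度
W_in  = rng.uniform(-0.5, 0.5, (n, d_in))
W_res = rng.uniform(-1, 1, (n, n))
W_res = 0.9 * W_res / max(abs(np.linalg.eigvals(W_res)))  # 谱半径缩放
W_out = rng.uniform(-0.5, 0.5, (d_out, n))
Y = reservoir_forward(rng.normal(size=(T, d_in)), W_in, W_res, W_out)
```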
2.5.3 基于储备池的注意力机制模块
在小样本学习领域,注意力机制常被用来整合特征信息。本文提出了一种新颖的基于储备池网络的注意力机制生成方式。该模块通过储备池生成新的特征图 $Q$、$K$ 和 $V$,然后按式(10)计算缩放点积注意力:

$\mathrm{Attn}(Q,K,V) = \mathrm{softmax}\left(\dfrac{QK^{\mathrm{T}}}{\sqrt{d}}\right)V$ (10)
与储备池模块类似,注意力模块同样引入残差连接,得到像素级增强特征 $F_{p}$:

$F_{p} = F + \gamma\cdot\mathrm{Attn}(Q,K,V)$ (11)

其中, $d$ 为特征维度, $\gamma$ 为可学习的标量缩放参数。
增强特征 $F_{e}$ 由通道级增强特征 $F_{c}$ 与像素级增强特征 $F_{p}$ 融合得到:

$F_{e} = F_{c} + F_{p}$ (12)
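式(10)~(12)的注意力计算与特征融合可以用如下示意代码表达(Q、K、V 此处用随机矩阵代替储备池的输出,维度与缩放系数均为演示用的假设):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # 数值稳定
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """式(10):缩放点积注意力(示意实现)。"""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

rng = np.random.default_rng(0)
n, d = 16, 64                       # 假设:16 个空间位置、64 维特征
F = rng.normal(size=(n, d))         # 特征提取模块输出的特征(示意)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))

F_p = F + 0.5 * attention(Q, K, V)  # 式(11):残差连接,0.5 示意可学习缩放参数
F_c = F                             # 通道级增强特征(此处省略储备池部分,直接以原特征示意)
F_e = F_c + F_p                     # 式(12):特征融合(以逐元素相加为例)
```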
第一阶段使用线性分类器:

$p = \mathrm{softmax}(W_{c}F_{e} + b)$ (13)

其中: $W_{c}$ 和 $b$ 分别为线性分类器的权重和偏置, $F_{e}$ 为增强特征。
第二阶段使用余弦分类器:

$p_{i} = \mathrm{softmax}\left(\tau\cdot\dfrac{F_{e}^{\mathrm{T}}w_{i}}{\|F_{e}\|\,\|w_{i}\|}\right)$ (14)

其中, $w_{i}$ 为第 $i$ 类的权重向量, $\tau$ 为可学习的缩放(温度)参数。
余弦分类器中的缩放参数调节softmax输出的峰化程度,与特征增强模块联合,促使网络学习类间方差大、类内方差小的特征分布。
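式(14)的余弦分类器可以写成如下示意代码(类权重随机初始化,τ 取固定值演示,实际为可学习参数):

```python
import numpy as np

def cosine_classifier(feat, weights, tau=10.0):
    """式(14):基于余弦相似度的分类(示意实现)。

    feat: (d,) 特征向量;weights: (C, d) 各类权重;返回 C 维概率分布。
    """
    f = feat / np.linalg.norm(feat)                            # 特征归一化
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)  # 权重归一化
    logits = tau * (w @ f)                                     # τ·cos(feat, w_i)
    e = np.exp(logits - logits.max())                          # softmax(数值稳定)
    return e / e.sum()

rng = np.random.default_rng(0)
probs = cosine_classifier(rng.normal(size=64), rng.normal(size=(5, 64)))
```

由于先对特征和权重做了归一化,分类只取决于方向而非模长,这正是促使类内特征聚集的原因之一。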
本文对所提方法和模型在Cifar-FS、FC100、Mini-ImageNet和CUB-200四个公开数据集上进行了实验验证。
Cifar-FS和FC100均源自Cifar 100数据集。前者共包含100个类,每类有600张32×32像素的图像。
Mini-ImageNet由ImageNet数据集采样得到,共包含100个类,每类600张84×84像素的图像。
CUB-200是细粒度图像数据集,共包含200种鸟类的11 788张84×84像素的图像。
实验配置为RTX 2080Ti显卡、Linux操作系统、PyTorch深度学习框架。实验在小样本任务阶段通过5-way 1-shot和5-way 5-shot方式采样任务,最终准确率是1 500个元任务的平均分类精度。
3.2.1 小样本图像分类
首先在公开常用的小样本数据集上进行了图像分类实验,所提方法和目前先进的小样本学习方法的实验结果对比如表1和表2所示。
表1 在Cifar-FS和FC100数据集上的分类准确率对比(%)

| 方法 | 骨干网络/Attn | Cifar-FS 5-way 1-shot | Cifar-FS 5-way 5-shot | FC100 5-way 1-shot | FC100 5-way 5-shot |
|---|---|---|---|---|---|
| Cp.Nets | ResNet-12/No | 75.40 | 86.80 | 43.80 | 59.70 |
| TPMN | ResNet-12/No | 75.50 | 87.20 | 46.93 | 63.26 |
| RFS-distill | ResNet-12/No | 73.90 | 86.90 | 44.60 | 60.90 |
| MetaOptNet | ResNet-12/No | 72.60 | 84.30 | 41.10 | 55.50 |
| MetaQAD | WRN-28-10/No | 75.83 | 88.79 | - | - |
| Centroid | ResNet-18/No | - | - | 45.83 | 59.74 |
| STANet | ResNet-12/Yes | 74.89 | 88.23 | 46.27 | 62.89 |
| Main | ResNet-12/Yes | 74.36 | 84.13 | 44.54 | 58.09 |
| Cro-Attention | ResNet-12/Yes | 75.33 | 87.94 | 45.78 | 62.78 |
| RCFIC | ResNet-12 | 77.23 | 88.91 | 48.14 | 64.27 |
| RCFIC | ResNet-18 | 79.44 | 89.86 | 50.49 | 66.52 |

注: Attn表示是否使用了注意力机制;*表示复现结果
表2 在Mini-ImageNet数据集上的分类准确率对比(%)

| 方法 | 骨干网络/Attn | 5-way 1-shot | 5-way 5-shot |
|---|---|---|---|
| DMF | ResNet-12/No | 67.76 | 82.71 |
| IEPT | ResNet-12/No | 67.05 | 82.90 |
| CTM | ResNet-18/No | 64.12 | 80.51 |
| S2M2 | ResNet-18/No | 64.06 | 80.58 |
| STANet | ResNet-12/Yes | 58.35 | 71.07 |
| Main | ResNet-12/Yes | 64.27 | 81.24 |
| Cro-Attention | ResNet-12/Yes | 67.19 | 80.64 |
| RCFIC | ResNet-12 | 67.95 | 83.15 |
| RCFIC | ResNet-18 | 69.87 | 84.45 |

注: Attn表示是否使用了注意力机制
在Cifar-FS数据集上,5-way 1-shot和5-way 5-shot设置下的最优精度均是在以ResNet-18为特征提取网络时取得,分别为79.44%和89.86%,分别比次优网络MetaQAD高3.61%和1.07%。
在FC100数据集上,5-way 1-shot和5-way 5-shot设置下的最优精度均是在以ResNet-18为特征提取网络时取得,分别为50.49%和66.52%,分别比次优网络TPMN高3.56%和3.26%。
在Mini-ImageNet数据集上,在5-way 1-shot设置下,所提方法在ResNet-18特征提取网络下的分类准确率达到了69.87%,比次优方法DMF提高了2.11%;5-way 5-shot设置下的最高精度为84.45%,比次优方法IEPT提高了1.55%。
同时,所提方法在3个数据集上的分类精度比其他基于注意力机制的小样本图像分类方法高约2%。
实验结果说明所提方法能够有效对特征进行增强以提高分类准确率,能够有效处理小样本图像分类任务。
3.2.2 领域迁移
现实世界中基类和新类的数据模式差距一般都比较大,使得更加符合真实场景的领域迁移场景成为小样本学习领域的研究重点之一。领域迁移问题要求模型具有良好的泛化能力。为了验证所提方法的泛化性,本文设置了此类领域迁移的场景:实验使用ResNet-12和ResNet-18作为特征提取的骨干网络,先在粗粒度数据集Mini-ImageNet上训练模型,然后再在细粒度数据集CUB-200上测试模型。
实验结果如表3所示。
表3 领域迁移场景下的分类准确率对比(%)

| 方法 | 骨干网络 | 5-way 1-shot | 5-way 5-shot |
|---|---|---|---|
| LFWT | ResNet-10 | 47.47 | 66.98 |
| LRP | ResNet-12 | 46.23 | 66.58 |
| S-Shot | ResNet-18 | 46.68 | 65.56 |
| RCFIC | ResNet-12 | 48.15 | 67.66 |
| RCFIC | ResNet-18 | 49.24 | 69.07 |

注: Mini-ImageNet迁移到CUB-200
实验说明所提方法针对领域迁移问题有良好的表现,模型的泛化能力强。
3.3.1 特征增强模块的影响
所提方法的特征增强模块由储备池模块和基于储备池的注意力机制模块组成。为了探究所提模块的必要性以及对结果产生的影响,以ResNet-18为特征提取网络在Cifar-FS数据集上进行了不使用特征增强模块(No Enhancement,NE)、只使用储备池模块(Only Reservoir,OR)和只使用基于储备池的注意力机制模块(Only Attention,OA)的消融实验。
实验结果如表4所示。
表4 特征增强模块的消融实验结果(%)

| NE | OR | OA | 5-way 1-shot | 5-way 5-shot |
|---|---|---|---|---|
| - | - | - | 73.43 | 84.34 |
| - | √ | - | 76.61 | 87.33 |
| - | - | √ | 78.66 | 87.62 |
| - | √ | √ | 79.44 | 89.86 |
3.3.2 不同注意力机制生成方式的影响
为了说明所提方法相比于传统的线性变换或卷积操作生成注意力机制的优势,在Mini-ImageNet数据集上以ResNet-18为特征提取网络进行了小样本图像分类实验。实验结果如表5所示。
表5 不同注意力机制生成方式的分类准确率(%)

| 线性变换 | 卷积 | 储备池 | 5-way 1-shot | 5-way 5-shot |
|---|---|---|---|---|
| - | - | - | 60.97 | 79.23 |
| √ | - | - | 65.27 | 82.33 |
| - | √ | - | 63.75 | 81.62 |
| - | - | √ | 69.87 | 84.45 |
3.3.3 特征分布可视化
在Cifar-FS数据集上,以ResNet-18为特征提取网络对查询集的特征进行提取(q=30,共5×30张查询图像)。以不同的注意力机制进行增强后,采用t-分布随机邻域嵌入(t-Distributed Stochastic Neighbor Embedding, t-SNE)对特征分布进行降维可视化。
如图3所示,与线性变换和卷积生成的注意力相比,基于储备池的注意力机制增强后的特征类内分布更加紧凑、类间分离更加明显,即具有高类间方差、低类内方差的特性。
图3 不同方式生成注意力机制对特征进行增强后的特征分布
Fig.3 Feature distributions after the enhancement by attention mechanisms generated in different ways
3.3.4 可学习标量参数的影响
可学习标量参数主要用来对相应项进行缩放,主要体现在注意力机制和余弦分类器中。
如表6和图4所示,引入可学习标量参数后,模型的分类精度有明显提升。
表6 可学习标量参数的消融实验结果(%)

| 可学习标量参数 | 5-way 1-shot | 5-way 5-shot |
|---|---|---|
| 不使用 | 77.12 | 87.37 |
| 使用 | 79.44 | 89.86 |
图4 可学习标量参数不同初始值的影响

Fig.4 Effect of different initial values of the learnable scalar parameters
3.3.5 不同储备池内部拓扑结构的影响
储备池内部拓扑结构使其具有丰富的动力学特性来处理复杂的数据。为了直观说明所提拓扑结构的优势,在Mini-ImageNet数据集上以ResNet-18为特征提取网络进行了小样本图像分类实验。
实验结果如表7所示。
表7 不同储备池内部拓扑结构的分类准确率(%)

| 拓扑结构 | 骨干网络 | 5-way 1-shot | 5-way 5-shot |
|---|---|---|---|
| Random | ResNet-18 | 67.16 | 80.57 |
| Delay line | ResNet-18 | 65.33 | 78.95 |
| Cyclic | ResNet-18 | 66.97 | 78.52 |
| Wigner | ResNet-18 | 68.44 | 81.38 |
| RCFIC | ResNet-18 | 69.87 | 84.45 |
本文提出了一种基于储备池计算的小样本图像分类方法,通过储备池模块和基于储备池模块的注意力机制对特征进行通道级和像素级增强,联合余弦分类器使得网络提取的特征分布具有高类间方差、低类内方差的特性。相较于目前流行的小样本图像分类方法,所提方法在标准的小样本图像分类任务和跨域转移场景下的分类精度至少分别高1.07%和1.77%,具有较强的泛化性。本文方法依赖于储备池内部动力学特性来缓解过拟合、增强模型泛化性能,然而其内在机制缺乏可解释性,这也将是下一步的研究重点。
LUO Y, ZHAO Y F, LI J X, et al. Computational imaging without a computer: seeing through random diffusers at the speed of light [J]. eLight, 2022, 2(1): 4. doi: 10.1186/s43593-022-00012-4

ZUO C, QIAN J M, FENG S J, et al. Deep learning in optical metrology: a review [J]. Light: Science & Applications, 2022, 11(1): 39. doi: 10.1038/s41377-022-00714-x

SITU G. Deep holography [J]. Light: Advanced Manufacturing, 2022, 3(2): 8. doi: 10.37188/lam.2022.013

CHEN C F R, FAN Q F, PANDA R. CrossViT: Cross-attention multi-scale vision transformer for image classification [C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 347-356. doi: 10.1109/iccv48922.2021.00041

杜敏敏,司马海峰. A-LinkNet:注意力与空间信息融合的语义分割网络[J]. 液晶与显示,2022,37(9):1199-1208. doi: 10.37188/CJLCD.2022-0046

DU M M, SIMA H F. A-LinkNet: semantic segmentation network based on attention and spatial information fusion [J]. Chinese Journal of Liquid Crystals and Displays, 2022, 37(9): 1199-1208. (in Chinese). doi: 10.37188/CJLCD.2022-0046

WU X W, SAHOO D, HOI S C H. Recent advances in deep learning for object detection [J]. Neurocomputing, 2020, 396: 39-64. doi: 10.1016/j.neucom.2020.01.085

ZHONG X, GU C, YE M, et al. Graph complemented latent representation for few-shot image classification [J]. IEEE Transactions on Multimedia, 2022, 25: 1979-1990. doi: 10.1109/tmm.2022.3141886

FINN C, ABBEEL P, LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks [C]//Proceedings of the 34th International Conference on Machine Learning. Sydney: JMLR.org, 2017: 1126-1135.

LI F F, FERGUS R, PERONA P. One-shot learning of object categories [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(4): 594-611. doi: 10.1109/tpami.2006.79

ROYLE J A, DORAZIO R M, LINK W A. Analysis of multinomial models with unknown index using data augmentation [J]. Journal of Computational and Graphical Statistics, 2007, 16(1): 67-85. doi: 10.1198/106186007x181425

CHEN W Y, LIU Y C, KIRA Z, et al. A closer look at few-shot classification [C]//7th International Conference on Learning Representations. New Orleans: OpenReview.net, 2019.

LI X X, SUN Z, XUE J H, et al. A concise review of recent few-shot meta-learning methods [J]. Neurocomputing, 2021, 456: 463-468. doi: 10.1016/j.neucom.2020.05.114

YAN S P, ZHANG S Y, HE X M. A dual attention network with semantic embedding for few-shot learning [C]//Thirty-Third AAAI Conference on Artificial Intelligence. AAAI Press, 2019: 9079-9086. doi: 10.1609/aaai.v33i01.33019079

QIN Z L, WANG H, MAWULI C B, et al. Multi-instance attention network for few-shot learning [J]. Information Sciences, 2022, 611: 464-475. doi: 10.1016/j.ins.2022.07.013

HOU R B, CHANG H, MA B P, et al. Cross attention network for few-shot classification [C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2019.

GUO Y H, CODELLA N C, KARLINSKY L, et al. A broader study of cross-domain few-shot learning [C]//16th European Conference on Computer Vision. Glasgow: Springer, 2020: 124-141. doi: 10.1007/978-3-030-58583-9_8

JAEGER H. Short term memory in echo state networks [R]. Forschungszentrum Informationstechnik GmbH, 2002.

MAASS W, NATSCHLÄGER T, MARKRAM H. Real-time computing without stable states: a new framework for neural computation based on perturbations [J]. Neural Computation, 2002, 14(11): 2531-2560. doi: 10.1162/089976602760407955

VERZELLI P, ALIPPI C, LIVI L, et al. Input-to-state representation in linear reservoirs dynamics [J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(9): 4598-4609. doi: 10.1109/tnnls.2021.3059389

BERTINETTO L, HENRIQUES J F, TORR P, et al. Meta-learning with differentiable closed-form solvers [C]//International Conference on Learning Representations. New Orleans: ICLR, 2019.

ORESHKIN B N, RODRÍGUEZ P, LACOSTE A. TADAM: Task dependent adaptive metric for improved few-shot learning [C]//Proceedings of the 32nd International Conference on Advances in Neural Information Processing Systems. Montréal: Curran Associates Inc., 2018.

VINYALS O, BLUNDELL C, LILLICRAP T, et al. Matching networks for one shot learning [C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona: Curran Associates Inc., 2016.

CUI Y, ZHOU F, LIN Y Q, et al. Fine-grained categorization and dataset bootstrapping using deep metric learning with humans in the loop [C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 1153-1162. doi: 10.1109/cvpr.2016.130

DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database [C]//2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami: IEEE, 2009. doi: 10.1109/cvpr.2009.5206848

XU W J, XU Y F, WANG H J, et al. Attentional constellation nets for few-shot learning [C]//9th International Conference on Learning Representations. Virtual, Online: OpenReview.net, 2021.

WU J M, ZHANG T Z, ZHANG Y D, et al. Task-aware part mining network for few-shot learning [C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 8413-8422. doi: 10.1109/iccv48922.2021.00832

TIAN Y L, WANG Y, KRISHNAN D, et al. Rethinking few-shot image classification: a good embedding is all you need? [C]//16th European Conference on Computer Vision. Glasgow: Springer, 2020: 266-282. doi: 10.1007/978-3-030-58568-6_16

LIU Y B, LEE J, PARK M, et al. Transductive propagation network for few-shot learning [J/OL]. arXiv, 2018: 1805.10002v1. doi: 10.24963/ijcai.2020/112

ZHANG X T, MENG D B, GOUK H, et al. Shallow Bayesian meta learning for real-world few-shot recognition [C]//Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 631-640. doi: 10.1109/iccv48922.2021.00069

AFRASIYABI A, LALONDE J F, GAGNÉ C. Associative alignment for few-shot image classification [C]//16th European Conference on Computer Vision. Glasgow: Springer, 2020: 18-35. doi: 10.1007/978-3-030-58558-7_2

XU C M, FU Y W, LIU C, et al. Learning dynamic alignment via meta-filter for few-shot learning [C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 5178-5187. doi: 10.1109/cvpr46437.2021.00514

ZHANG M L, ZHANG J H, LU Z W, et al. IEPT: Instance-level and episode-level pretext tasks for few-shot learning [C]//9th International Conference on Learning Representations. Vienna: OpenReview.net, 2021.

LI H Y, EIGEN D, DODGE S, et al. Finding task-relevant features for few-shot learning by category traversal [C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 1-10. doi: 10.1109/cvpr.2019.00009

MANGLA P, SINGH M, SINHA A, et al. Charting the right manifold: Manifold Mixup for few-shot learning [C]//Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision. Snowmass: IEEE, 2020: 2207-2216. doi: 10.1109/wacv45572.2020.9093338

TSENG H Y, LEE H Y, HUANG J B, et al. Cross-domain few-shot classification via learned feature-wise transformation [C]//8th International Conference on Learning Representations. Addis Ababa: OpenReview.net, 2020.

SUN J M, LAPUSCHKIN S, SAMEK W, et al. Explanation-guided training for cross-domain few-shot classification [C]//2020 25th International Conference on Pattern Recognition (ICPR). Milan: IEEE, 2021: 7609-7616. doi: 10.1109/icpr48806.2021.9412941

WANG Y, CHAO W L, WEINBERGER K Q, et al. Revisiting nearest-neighbor classification for few-shot learning [J/OL]. arXiv, 2019: 1911.04623v1.

VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE [J]. Journal of Machine Learning Research, 2008, 9(86): 2579-2605.