A joint algorithm combining improved active learning and self-training

LÜ Jia (吕佳); FU Quhan (傅屈寒)

doi:10.12202/j.0476-0301.2021186

Abstract

Active learning incurs a high manual labeling cost on large data sets, and semi-supervised self-training is degraded by mislabeled points. To address both problems, a joint algorithm that alternates active learning and semi-supervised self-training across training iterations is proposed. Odd-numbered rounds use active learning and even-numbered rounds use self-training, so that each algorithm compensates for the other's weakness. The self-training predictions on unlabeled samples reduce the number of samples that active learning must label, while active learning labels the samples most likely to become noise, which reduces labeling errors during self-training. An improved active learning algorithm based on density peaks clustering and membership degree is also proposed: the initial unlabeled samples are grouped into clusters, and within each cluster some samples are selected for manual labeling according to the difference in membership degree, yielding a balanced labeled set that reflects the overall structure of the data. Simulation experiments show that the joint algorithm outperforms either of the two single algorithms. Compared with common active learning algorithms, the improved active learning algorithm achieves significantly better classification performance, and it is more advantageous when used within the joint algorithm.
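The alternating schedule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid classifier, the least-confidence query rule (used here in place of the paper's density peaks clustering and membership-degree selection), and the names `joint_train`, `oracle`, `rounds`, and `batch` are all assumptions made for illustration.

```python
import math
import random


def centroids(X, y):
    """Mean point per class, computed from the labeled samples."""
    sums = {}
    for x, label in zip(X, y):
        s, n = sums.get(label, ([0.0] * len(x), 0))
        sums[label] = ([a + b for a, b in zip(s, x)], n + 1)
    return {c: [v / n for v in s] for c, (s, n) in sums.items()}


def predict_proba(cents, x):
    """Softmax over negative distances to each class centroid."""
    w = {c: math.exp(-math.dist(x, m)) for c, m in cents.items()}
    total = sum(w.values())
    return {c: v / total for c, v in w.items()}


def joint_train(X_lab, y_lab, X_unl, oracle, rounds=6, batch=5):
    """Alternate active learning (odd rounds) and self-training (even rounds)."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    queries = 0
    for r in range(1, rounds + 1):
        if not pool:
            break
        cents = centroids(X_lab, y_lab)
        scored = []
        for x in pool:
            p = predict_proba(cents, x)
            label = max(p, key=p.get)
            scored.append((p[label], label, x))
        if r % 2 == 1:
            # Odd round: active learning. Ask the oracle (the human
            # annotator) about the least confident samples.
            scored.sort(key=lambda t: t[0])
            new = [(x, oracle(x)) for _, _, x in scored[:batch]]
            queries += len(new)
        else:
            # Even round: self-training. Accept the classifier's own
            # labels for the most confident samples.
            scored.sort(key=lambda t: t[0], reverse=True)
            new = [(x, label) for _, label, x in scored[:batch]]
        for x, label in new:
            X_lab.append(x)
            y_lab.append(label)
            pool.remove(x)
    return centroids(X_lab, y_lab), queries


# Toy data (assumed): two well-separated 2-D blobs, one labeled seed per class.
random.seed(0)
blob0 = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(20)]
blob1 = [(random.gauss(5, 0.5), random.gauss(5, 0.5)) for _ in range(20)]
model, queries = joint_train(
    X_lab=[(0.0, 0.0), (5.0, 5.0)],
    y_lab=[0, 1],
    X_unl=blob0 + blob1,
    oracle=lambda x: 0 if x[0] + x[1] < 5 else 1,  # stands in for a human
)
print("oracle queries:", queries)
```

In this sketch only the odd rounds consume annotation effort, so half of the newly labeled samples come at no labeling cost; the least confident samples, which self-training would be most likely to mislabel, are the ones sent to the oracle.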
