An Applied Study of Gradient Boosting Algorithms Integrating Video Data for Mutual Fund Return Prediction

Abstract: Using roadshow video data from Chinese public mutual funds in 2020, this study constructs a multimodal feature system covering textual semantics, linguistic structure, and vocal behavior, and applies a gradient boosting regression (GBR) model to predict next-day fund returns. Model parameters are tuned through cross-validation and grid search, and the model is compared with support vector regression (SVR), random forest (RF), and Lasso regression. The results show that GBR achieves significantly higher predictive accuracy than these baselines. Interpretability analysis further indicates that linguistic and acoustic features such as the proportion of vague words, speaking rate, and pitch variation are the main contributors to the predictions. These findings support the behavioral finance view that language style and manner of expression carry behavioral signals that can influence investor judgment and market reactions, and that offer forward-looking informational value. The study extends the application of multimodal data modeling to fund analysis and provides quantitative support for fund managers in optimizing video-based disclosures and for investors in interpreting non-financial signals.
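The modeling workflow summarized in the abstract (grid search with cross-validation for GBR, then comparison against SVR, RF, and Lasso) can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic stand-in data. The feature names, parameter grid, fold count, and error metric are hypothetical choices made for illustration and are not taken from the study. The impurity-based feature importance at the end is likewise only one possible interpretability measure, since the abstract does not name the method used.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data: NOT the study's dataset.
# Feature names are hypothetical labels for three of the features named in the abstract.
rng = np.random.default_rng(0)
feature_names = ["vague_word_ratio", "speaking_rate", "pitch_variation"]
X = rng.normal(size=(200, len(feature_names)))
y = (0.5 * X[:, 0] - 0.3 * X[:, 1] ** 2 + 0.2 * X[:, 2]
     + rng.normal(scale=0.1, size=200))

# Shuffled 5-fold split (an assumption; the abstract does not give the CV scheme).
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Grid search over a small, illustrative GBR parameter grid.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=cv,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

# Baselines with default or arbitrary settings; SVR and Lasso are scaled first.
models = {
    "GBR": search.best_estimator_,
    "SVR": make_pipeline(StandardScaler(), SVR()),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "Lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.01)),
}

# Mean cross-validated MSE per model (lower is better).
cv_mse = {
    name: -cross_val_score(model, X, y, cv=cv,
                           scoring="neg_mean_squared_error").mean()
    for name, model in models.items()
}

# Impurity-based importances of the tuned GBR, keyed by feature name.
importances = dict(zip(feature_names, search.best_estimator_.feature_importances_))

for name, mse in cv_mse.items():
    print(f"{name}: CV MSE = {mse:.4f}")
print(importances)
```

In this sketch the GBR score is computed on the same folds used for tuning, so it is somewhat optimistic relative to the untuned baselines. A fairer comparison would tune every model inside a nested cross-validation loop, and return data ordered in time may call for a time-aware split instead of shuffled folds.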

     
