An applied study of gradient boosting algorithms integrating video data for mutual fund return prediction
-
Abstract
Using roadshow video data from Chinese public mutual funds, a multimodal feature system covering textual semantics, linguistic structure, and vocal behavior is constructed, and a gradient boosting regression (GBR) model is trained on these features to predict next-day fund returns. The model is fitted on 2020 data for Chinese mutual funds, with hyperparameters tuned by grid search under cross-validation. In comparisons with support vector regression (SVR), random forest (RF), and Lasso regression, the GBR model achieves significantly higher predictive accuracy. Interpretability analysis further shows that linguistic and acoustic features, such as the proportion of vague expressions, speaking rate, and pitch variation, contribute prominently to predictive performance. These findings indicate that language style and communication patterns carry meaningful behavioral signals that affect investor judgments and market responses, and that these signals have forward-looking informational value. The results extend the use of multimodal data in fund analysis and provide quantitative evidence that can help fund managers optimize video-based disclosures and help investors identify non-financial signals.
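The pipeline summarized above (gradient boosting, grid search under cross-validation, and a feature-level contribution check) can be illustrated with a minimal, standard-library sketch. It is not the study's implementation: the data are synthetic, the three feature names are placeholders echoing the abstract, the weak learners are one-split regression stumps, and the helper names (`fit_stump`, `fit_gbr`, `cv_mse`) are hypothetical. Split counts stand in as a crude proxy for feature contribution.

```python
import random

def fit_stump(X, residuals):
    """Least-squares one-split stump: returns (feature, threshold, left_mean, right_mean)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]

def stump_predict(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm

def fit_gbr(X, y, n_rounds=40, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of 1/2 * squared error.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump_predict(stump, row) for p, row in zip(pred, X)]
    return base, learning_rate, stumps

def predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)

def mse(model, X, y):
    return sum((yi - predict(model, row)) ** 2 for row, yi in zip(X, y)) / len(y)

def cv_mse(X, y, k, **params):
    """Mean held-out MSE over k interleaved folds."""
    total = 0.0
    for f in range(k):
        held = set(range(f, len(y), k))
        train = [i for i in range(len(y)) if i not in held]
        model = fit_gbr([X[i] for i in train], [y[i] for i in train], **params)
        total += mse(model, [X[i] for i in held], [y[i] for i in held])
    return total / k

# Synthetic stand-in data (assumption): three placeholder features, linear signal plus noise.
random.seed(0)
FEATURES = ["vague_ratio", "speaking_rate", "pitch_variation"]
X = [[random.random() for _ in FEATURES] for _ in range(60)]
y = [-0.8 * r[0] + 0.3 * r[1] + random.gauss(0, 0.05) for r in X]

# Grid search: pick the hyperparameters with the lowest cross-validated MSE.
grid = [{"n_rounds": n, "learning_rate": lr} for n in (20, 40) for lr in (0.05, 0.2)]
best_params = min(grid, key=lambda p: cv_mse(X, y, k=3, **p))
model = fit_gbr(X, y, **best_params)

y_mean = sum(y) / len(y)
baseline_mse = sum((yi - y_mean) ** 2 for yi in y) / len(y)  # predict-the-mean baseline
train_mse = mse(model, X, y)

# Crude contribution proxy: how many stumps split on each feature.
split_counts = {name: sum(1 for s in model[2] if s[0] == i) for i, name in enumerate(FEATURES)}
print(best_params, train_mse, baseline_mse, split_counts)
```

In practice a library implementation such as scikit-learn's `GradientBoostingRegressor` combined with `GridSearchCV` would replace the hand-written pieces. For return series, a time-ordered split is usually preferable to interleaved folds so that validation data never precede training data.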