Bimonthly

ISSN 1006-9895

CN 11-1768/O4

Application of Machine Learning in Clustering and Discriminant Analysis of Large-scale Circulation Patterns Favorable for Tropical Cyclogenesis over the Western North Pacific

Affiliation:

1. Key Laboratory of Cloud-Precipitation Physics and Severe Storms, Institute of Atmospheric Physics, Chinese Academy of Sciences; 2. University of Chinese Academy of Sciences; 3. Guizhou Institute of Mountainous Environment and Climate

Fund Project:

National Key Research and Development Program of China; National Natural Science Foundation of China

Abstract:

Based on the International Best Track Archive for Climate Stewardship (IBTrACS) tropical cyclone best-track data and the fifth-generation reanalysis (ERA5) of the European Centre for Medium-Range Weather Forecasts for June to November of 1979-2020, the low-level large-scale circulations preceding tropical cyclone (TC) genesis over the western North Pacific are clustered into five patterns with a self-organizing map (SOM), according to the 850-hPa horizontal wind field centered on the TC genesis location. The five patterns are Monsoon Confluence (MC), Monsoon Gyre (MG), Strong Monsoon Trough (SMT), Weak Monsoon Trough (WMT) and Easterly Wave (EW). TCs in the MC pattern form in the confluence zone south of the subtropical high and account for the largest proportion of cases. Genesis in the MG, SMT and WMT patterns is affected by the cyclonic shear or the confluence zone associated with the monsoon trough. In the EW pattern, which has the fewest cases, TCs develop from amplifying easterly waves. Building on this classification of the historical data, three machine learning methods are compared to select a suitable one for automatic identification of TC circulation patterns: support vector machine (SVM), k-nearest neighbors (KNN) and random forest (RF). SVM achieves an accuracy of 0.965, with recall and precision both above 0.94 for each of the five patterns, and is insensitive to class imbalance. A sensitivity analysis on sample size further shows that SVM can adequately learn the circulation characteristics of each pattern from a limited number of samples, and its identification performance is clearly better than that of KNN and RF.
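The abstract reports overall accuracy together with per-pattern recall and precision. As a minimal sketch of why the per-pattern scores matter when the classes are imbalanced, the following Python computes all three from label lists. The function name `per_class_scores` and the toy labels are illustrative assumptions, not the authors' code or data.

```python
from collections import Counter

# The five circulation patterns named in the abstract.
PATTERNS = ["MC", "MG", "SMT", "WMT", "EW"]


def per_class_scores(y_true, y_pred, labels=PATTERNS):
    """Return (accuracy, {label: (precision, recall)}) for two label lists.

    Hypothetical helper for illustration only.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Count every (true label, predicted label) pair once.
    pairs = Counter(zip(y_true, y_pred))
    correct = sum(n for (t, p), n in pairs.items() if t == p)
    accuracy = correct / len(y_true)

    scores = {}
    for label in labels:
        tp = pairs[(label, label)]
        predicted = sum(n for (t, p), n in pairs.items() if p == label)
        actual = sum(n for (t, p), n in pairs.items() if t == label)
        # Define the score as 0.0 when the denominator is empty.
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        scores[label] = (precision, recall)
    return accuracy, scores


# Made-up toy labels (not from the paper): a classifier that always
# predicts the majority pattern MC and never recognizes the rare EW cases.
y_true = ["MC"] * 8 + ["EW"] * 2
y_pred = ["MC"] * 10
accuracy, scores = per_class_scores(y_true, y_pred)
print(accuracy, scores["MC"], scores["EW"])
```

In this toy case the overall accuracy looks acceptable while the recall for the rare pattern is zero, which is why per-pattern recall and precision are the more informative check for a minority class such as EW.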

History
  • Received: 2022-05-08
  • Last revised: 2022-08-15
  • Accepted: 2022-08-17