Assessing Meteorological Drivers of PM2.5 in a Tropical Coastal Industrial Zone: A Comparative Study of Linear and Interpretable Machine Learning Models in Perai, Penang
DOI: https://doi.org/10.53797/ajvah.v7i1.1.2026
Keywords: PM2.5 Prediction, XGBoost, Multiple Linear Regression, SHAP, Aerosol Hygroscopic Growth, Tropical Micro-climate
Abstract
Fine particulate matter (PM2.5) prediction in tropical coastal industrial zones is complicated by continuous industrial emissions, localized precipitation, sea-breeze circulation, and humidity-related measurement effects. This study developed a comparative and interpretable framework for daily PM2.5 estimation in the Perai Heavy Industrial Zone, Penang, using 97 concurrent observations from 2025-2026. Four meteorological predictors (temperature, wind speed, pressure, and precipitation) were evaluated with Multiple Linear Regression (MLR), Support Vector Regression (SVR), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost). MLR achieved the strongest baseline predictive performance, indicating that a simple linear structure can remain competitive when sample size is limited and temporal leakage is controlled. XGBoost was retained for interpretation because it captured non-linear local interactions more effectively than RF, the other ensemble alternative. SHapley Additive exPlanations (SHAP) identified temperature and precipitation as the dominant drivers. The positive precipitation-PM2.5 relationship suggests that hygroscopic aerosol growth and optical sensor response may partly offset the expected wet-scavenging effect in this setting. The findings show that localized, interpretable modelling can support air-quality warning, sensor calibration, and meteorology-sensitive emission management in tropical industrial regions.
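The SHAP attribution mentioned in the abstract rests on Shapley values, which can be illustrated with a minimal sketch. The example below computes exact Shapley values for the four meteorological predictors by enumerating every feature coalition and replacing absent features with a single baseline value. Everything in it is an assumption made for illustration: `toy_model`, its coefficients, the `baseline` and `sample` values, and the function names are hypothetical and are not the fitted XGBoost model, the data, or the SHAP library routine used in the study.

```python
from itertools import combinations
from math import factorial

FEATURES = ["temperature", "wind_speed", "pressure", "precipitation"]


def toy_model(x):
    """Hypothetical non-linear PM2.5 response (illustrative coefficients only)."""
    return (
        20.0
        + 0.8 * x["temperature"]
        - 1.5 * x["wind_speed"]
        + 0.05 * (x["pressure"] - 1010.0)
        + 0.3 * x["precipitation"]
        + 0.02 * x["temperature"] * x["precipitation"]  # interaction term
    )


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            # Shapley weight for a coalition of size k that excludes f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = {
                    g: x[g] if (g in subset or g == f) else baseline[g]
                    for g in FEATURES
                }
                without_f = {
                    g: x[g] if g in subset else baseline[g] for g in FEATURES
                }
                # Marginal contribution of f when added to this coalition
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


# Hypothetical reference day and a warmer, wetter, calmer day to explain
baseline = {"temperature": 28.0, "wind_speed": 2.0,
            "pressure": 1010.0, "precipitation": 5.0}
sample = {"temperature": 31.0, "wind_speed": 1.0,
          "pressure": 1008.0, "precipitation": 20.0}

phi = shapley_values(toy_model, sample, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>13}: {value:+.3f}")
```

The contributions sum to the difference between the prediction for `sample` and the prediction for `baseline`, which is the additivity property that makes SHAP rankings such as "temperature and precipitation dominate" interpretable. Exhaustive enumeration is only practical for a handful of features; tree-specific SHAP algorithms reach the same kind of attribution far more efficiently.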
License
Copyright (c) 2026 Hongzhi Lu, Hongxue Lu

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.