Assessing Meteorological Drivers of PM2.5 in a Tropical Coastal Industrial Zone: A Comparative Study of Linear and Interpretable Machine Learning Models in Perai, Penang

Authors

  • Hongzhi Lu, School of Industrial Technology, Universiti Sains Malaysia, Gelugor, 11800, Malaysia
  • Hongxue Lu, University of Malaya, Kuala Lumpur, 50603, Malaysia

DOI:

https://doi.org/10.53797/ajvah.v7i1.1.2026

Keywords:

PM2.5 Prediction, XGBoost, Multiple Linear Regression, SHAP, Aerosol Hygroscopic Growth, Tropical Micro-climate

Abstract

Fine particulate matter (PM2.5) prediction in tropical coastal industrial zones is complicated by continuous industrial emissions, localized precipitation, sea-breeze circulation, and humidity-related measurement effects. This study developed a comparative and interpretable framework for daily PM2.5 estimation in the Perai Heavy Industrial Zone, Penang, using 97 concurrent observations from 2025-2026. Four meteorological predictors (temperature, wind speed, pressure, and precipitation) were evaluated with Multiple Linear Regression (MLR), Support Vector Regression (SVR), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost). MLR achieved the strongest baseline predictive performance, indicating that a simple linear structure can remain competitive when the sample size is limited and temporal leakage is controlled. XGBoost was retained for interpretation because it captured non-linear local interactions more effectively than RF, the other ensemble model. SHapley Additive exPlanations (SHAP) identified temperature and precipitation as the dominant drivers. The positive precipitation-PM2.5 relationship suggests that hygroscopic aerosol growth and optical sensor response may partly offset the expected wet-scavenging effect in this setting. The findings show that localized, interpretable modelling can support air-quality warnings, sensor calibration, and meteorology-sensitive emission management in tropical industrial regions.
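
The abstract notes that temporal leakage was controlled but does not say how. As a minimal sketch of one common way to do this, the following Python splits 97 date-ordered observations into expanding-window folds, so that every test block lies strictly after its training block. The function name `expanding_window_splits` and the choice of four folds are hypothetical and are not taken from the study.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n_samples: int, n_folds: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for time-ordered data.

    Rows are assumed to be sorted by date. Each test block lies strictly
    after its training block, so no future observation leaks into training.
    """
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    # Equal-sized test blocks taken from the end of the series.
    test_size = n_samples // (n_folds + 1)
    for k in range(n_folds):
        test_start = n_samples - (n_folds - k) * test_size
        test_end = test_start + test_size
        # Training set: every observation before the test block.
        yield list(range(test_start)), list(range(test_start, test_end))


# 97 matches the sample size stated in the abstract; 4 folds is an assumption.
splits = list(expanding_window_splits(n_samples=97, n_folds=4))
for train_idx, test_idx in splits:
    print(f"train rows 0-{train_idx[-1]}, test rows {test_idx[0]}-{test_idx[-1]}")
```

Each candidate model (MLR, SVR, RF, XGBoost) would then be fitted on the training rows of a fold and scored on its test rows. With so few observations the early folds train on very little data, which is one reason a low-variance linear model can remain competitive in this kind of comparison.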

Published

2026-05-27

How to Cite

Lu, H., & Lu, H. (2026). Assessing Meteorological Drivers of PM2.5 in a Tropical Coastal Industrial Zone: A Comparative Study of Linear and Interpretable Machine Learning Models in Perai, Penang. Asian Journal of Vocational Education And Humanities, 7(1), 1-8. https://doi.org/10.53797/ajvah.v7i1.1.2026