
Peer Reviewed

Original Research

VGG19 Demonstrates the Highest Accuracy Rate in a Nine-Class Wound Classification Task Among Various Deep Learning Networks: A Pilot Study

January 2024
ISSN: 1044-7946
Wounds. 2024;36(1):8-14. doi:10.25270/wnds/23066
© 2024 HMP Global. All Rights Reserved.
Any views and opinions expressed are those of the author(s) and/or participants and do not necessarily reflect the views, policy, or position of Wounds or HMP Global, their employees, and affiliates.

Abstract

Background. Current literature suggests relatively low accuracy of multi-class wound classification tasks using deep learning networks. Solutions are needed to address the increasing diagnostic burden of wounds on wound care professionals and to aid non-wound care professionals in wound management. Objective. To develop a reliable, accurate 9-class classification system to aid wound care professionals and, eventually, perhaps patients and non-wound care professionals in managing wounds. Methods. A total of 8173 training data images and 904 test data images were classified into 9 categories: operation wound, laceration, abrasion, skin defect, infected wound, necrosis, diabetic foot ulcer, chronic ulcer, and wound dehiscence. Six deep learning networks, based on VGG16, VGG19, EfficientNet-B0, EfficientNet-B5, RepVGG-A0, and RepVGG-B0, were established, trained, and tested on the same images. For each network, the accuracy rate, defined as the sum of true positive and true negative values divided by the total number of cases, was analyzed. Results. The overall accuracy varied from 74.0% to 82.4%. Of all the networks, VGG19 achieved the highest accuracy, at 82.4%. This result is comparable to those reported in previous studies. Conclusion. These findings indicate the potential for VGG19 to serve as the basis for a more comprehensive and detailed AI-based wound diagnostic system. Eventually, such systems also may aid patients and non-wound care professionals in diagnosing and treating wounds.

Abbreviations

Abbreviation: AI, artificial intelligence.

Introduction

A wound is a condition in which the normal skin structure is disrupted to any degree. Wounds are frequently encountered not only by physicians but also by the general population in everyday life. Some people seek the help of medical professionals, while others rely on self-treatment, which can result in unfavorable outcomes.

From this perspective, wound evaluation is of utmost importance in establishing treatment and predicting outcomes. Traditionally, wound evaluation has been performed by medical experts, specifically plastic surgeons who have received extensive training in the wound healing process and wound management. In Korea, particularly in tertiary care centers, nearly all wounds are subject to plastic surgery consultation. The need for wound assessment is growing because of the increasing number of patients with diseases such as diabetes and pressure injuries.1

An automated AI-based classification system is needed not only to provide accurate diagnoses but also to offer plausible preliminary assessments, because patients, and even physicians who are not wound care professionals, can misdiagnose a wound, leading to detrimental or irreversible clinical outcomes. Prompt and accurate wound assessment can benefit both patients and physicians. Furthermore, because the increasing incidence of wounds imposes a significant diagnostic burden on wound care specialists, a wound classification system can assist these professionals by enabling patient self-care (for simple wounds) or by helping non-wound care professionals decide whether patient referral to a specialist is necessary.

Given this context, there is a growing need for an automated and precise AI-based wound classification system that can be practically used by medical professionals as well as by primary care physicians and patients. Although there are numerous commercially accessible wound evaluation products, the majority of them have not undergone peer review or certification processes.2,3 Most of these products specialize in a single type of wound, such as diabetic foot ulcer, burn wound, pressure injury, or infected wound.2 

Deep learning technology has increased in popularity and has significantly affected people’s daily lives. In particular, convolutional neural networks have been shown to be effective for disease diagnosis, and their application in plastic surgery is gradually increasing. Several researchers have investigated the use of convolutional neural networks in areas such as cephalometry, interpretation of patient motivations, and facial recognition.4-7 Although several studies report on AI-based wound classification, they tend to show relatively low accuracy in multi-class classification owing to the complexity of wound assessment. Results for up to a 6-class classification task have been reported; however, 6-class classification is insufficient to accurately reflect actual clinical situations.8 The criterion standard for wound classification involves a comprehensive history and physical examination, including a focused inspection of the wound by an experienced physician, which requires considerable time and effort.

The purpose of this study was to propose an unprecedented 9-category wound evaluation system capable of effectively managing real clinical situations. By utilizing a deep learning network (ie, VGG19), this system classifies digital images of wounds with verified accuracy. Not only does this alleviate the diagnostic burden on physicians, but it also provides a reliable reference for the general population and non-wound care professionals. The authors hypothesized that a 9-class wound classification system could be developed with accuracy superior to that of previously reported systems.

Methods

Definition and data collection
From January 2003 through August 2021, photographs of patients with various types of wounds were collected from different settings, including the outpatient clinic, emergency department, wards, other departments, and the operating room. The inclusion criteria were as follows: photographs of patients with any type of wound taken with consistent camera settings, that is, an International Standards Organization (ISO) value of 200, an aperture of f/8.0, a shutter speed of 1/160 second, and a focal length of 55 mm.

The exclusion criteria were as follows: photographs containing any identifiable features of an individual, images with low resolution (<300 dots per inch), out-of-focus photographs, photographs not encompassing the entire wound area, photographs with more than 2 wound sites, and photographs with areas of the wound obscured by an object.

The wounds were divided into 2 categories, acute and chronic, and then further subdivided as follows: operation wound, laceration, abrasion, and skin defect (acute); and infected wound, necrosis, diabetic foot ulcer, chronic ulcer, and wound dehiscence (chronic). As a result of this categorization, the study authors developed a 9-class classification system for wound assessment (Figure 1).

Figure 1
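
For clarity, this taxonomy amounts to a simple label mapping. The sketch below is a hypothetical Python encoding of the 9 classes and their acute/chronic grouping; the class names follow the article, but the identifiers are illustrative and are not taken from the study's code.

```python
# Hypothetical encoding of the 9-class scheme described above;
# the study did not publish code, so these names are illustrative only.
ACUTE_CLASSES = ["operation wound", "laceration", "abrasion", "skin defect"]
CHRONIC_CLASSES = [
    "infected wound", "necrosis", "diabetic foot ulcer",
    "chronic ulcer", "wound dehiscence",
]
CLASSES = ACUTE_CLASSES + CHRONIC_CLASSES           # 9 classes total
CLASS_TO_INDEX = {name: i for i, name in enumerate(CLASSES)}

def is_acute(label: int) -> bool:
    """Return True if an integer label belongs to the acute group."""
    return label < len(ACUTE_CLASSES)
```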

Establishment of deep neural network and workflow
A total of 8173 training data images and 904 test data images taken at the Korea University Ansan Hospital, Ansan, Korea, were included (Table 1). The training and testing process was conducted by a specialized team from the Department of Computer Science and Engineering at Hanyang University in Ansan. Several types of deep learning networks were constructed, including VGG16, VGG19, EfficientNet-B0, EfficientNet-B5, RepVGG-A0, and RepVGG-B0.9-11 Each network has its own architecture, varying in depth and width. VGGs consist of 3 × 3 convolution filters and 16 to 19 weight layers.9 EfficientNets differ from VGGs in terms of convolution filter and layer size, as well as in having a more complex architecture.10 RepVGGs have a structure similar to that of VGGs, but with the addition of manipulations such as structural re-parameterization techniques.11 An example of the machine learning process based on the VGG19 structure is shown in Figure 2. All the networks were trained and tested using the same process. The overall workflow is illustrated in Figure 3.

Table 1

Figure 2

Figure 3
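
Although the study's training code is not published, a pretrained network such as VGG19 is commonly adapted to a task like this by replacing its final classification layer. The following minimal PyTorch/torchvision sketch shows one such setup; the function name and input size are assumptions, not the authors' implementation.

```python
# Minimal sketch (assuming PyTorch/torchvision): adapting an
# ImageNet-pretrained VGG19 to a 9-class wound classification task.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 9  # the article's 9 wound classes

def build_vgg19(num_classes: int = NUM_CLASSES) -> nn.Module:
    """VGG19 with its final fully connected layer resized to 9 outputs."""
    model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
    in_features = model.classifier[-1].in_features  # 4096 for VGG19
    model.classifier[-1] = nn.Linear(in_features, num_classes)
    return model

model = build_vgg19()
logits = model(torch.randn(1, 3, 224, 224))  # one dummy 224x224 RGB image
assert logits.shape == (1, NUM_CLASSES)
```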

A preliminary training phase was carried out before the actual training began, with the goal of setting up the wound classes. The authors of the current study provided feedback and confirmed the 9-class classification task until the networks demonstrated accuracies of at least 70%. Each wound class then was labeled by 2 plastic surgeons (H.J.Y., J.H.C.), and the labeled images were used to train the proposed neural networks (Table 1). No patient data or covariates other than the wound photographs were presented to the physicians. In most instances, both plastic surgeons agreed on the classification; in cases of disagreement, the 2 physicians were instructed to discuss their differing views and select the classification they deemed most likely. The authors then supplied the team with the training images along with the corresponding labeling data. The networks were trained, after which the 904 unlabeled test images were introduced. The networks analyzed each image, reported the probability and ranking of each class, and selected the class with the highest probability as the final result.
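
The inference step described above (per-class probabilities, a ranking, and the top-probability class as the final answer) can be sketched as follows. This mirrors the described workflow rather than the study's actual code.

```python
# Illustrative inference step: softmax probabilities, per-class ranking,
# and the highest-probability class as the final prediction.
import torch
import torch.nn.functional as F

@torch.no_grad()
def classify(model, images, class_names):
    """Return (top-1 class, top-3 classes) for each image in the batch."""
    model.eval()
    probs = F.softmax(model(images), dim=1)          # per-class probabilities
    ranked = probs.argsort(dim=1, descending=True)   # class indices, best first
    return [
        (class_names[row[0]], [class_names[i] for i in row[:3]])
        for row in ranked.tolist()
    ]
```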

The networks then were analyzed for accuracy. Accuracy and top 3 accuracy were defined using a confusion matrix that compared the predicted class with the actual class.12 Accuracy was defined as the ratio of the number of correct predictions to the total number of predictions. Top 3 accuracy was defined as follows: if any 1 of the 3 most probable classes predicted by the model is correct, the prediction is counted as a success, and the number of successes is divided by the total number of predictions. The primary end point of the study was whether the accuracy of the established multi-class wound classification system surpassed the highest accuracy reported in previous literature (82.48%, in a 6-class classification task8).
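
Under these definitions, top 1 and top 3 accuracy can be computed directly from the model outputs. The sketch below is an illustrative implementation, not the study's evaluation code.

```python
# Top-k accuracy as defined above: a "success" is counted whenever the
# true class appears among the k most probable predicted classes, and
# successes are divided by the total number of predictions.
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int) -> float:
    """logits: (N, C) model outputs; targets: (N,) true class indices."""
    topk = logits.topk(k, dim=1).indices               # (N, k) best-k classes
    hits = (topk == targets.unsqueeze(1)).any(dim=1)   # target found in top k?
    return hits.float().mean().item()

# usage: top1 = topk_accuracy(logits, targets, 1)
#        top3 = topk_accuracy(logits, targets, 3)
```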
 

Results

Regarding overall top 1 accuracy and top 3 accuracy, respectively, VGG16 demonstrated 81.1% and 94.0%, VGG19 demonstrated 82.4% and 95.8%, EfficientNet-B0 demonstrated 78.2% and 92.3%, EfficientNet-B5 demonstrated 80.6% and 92.5%, RepVGG-A0 demonstrated 77.5% and 91.0%, and RepVGG-B0 demonstrated 74.0% and 88.2% (Table 2).

The range of accuracy for each wound class was as follows: 83.3% to 91.2% for operation wound, 73.5% to 87.8% for laceration, 80.3% to 85.5% for abrasion, 39.0% to 84.0% for skin defect, 57.1% to 70.5% for infected wound, 71.8% to 88.1% for necrosis, 73.3% to 85.3% for diabetic foot ulcer, 69.2% to 77.5% for chronic ulcer, and 39.2% to 74.0% for wound dehiscence (Table 3). For the most accurate network, VGG19, the average accuracy for acute wounds was 87.9% and the average accuracy for chronic wounds was 76.4% (Figure 4).

Table 2

Table 3

Figure 4

Discussion

Among the tested networks, VGG19 showed the highest accuracy. The reason for this result is uncertain, but it may be that the characteristics of the wound evaluation task were well matched with the main features of the VGG-type networks, which have a simpler structure, smaller filter sizes, and deep convolutional layers compared with the other networks.9

The networks classified acute wounds more accurately than chronic wounds. This was expected, because classification of chronic wounds is challenging, even for experienced surgeons. Chronic wounds have similar and overlapping presentations, making the decision difficult for the networks, as reported previously.8 In the current study, it is possible that the data volume was still insufficient for a clear distinction between the classes.

Although overall accuracy was high, some categories, such as infected wound and wound dehiscence, showed markedly low accuracy rates. Two or more presentations can be observed in a single wound; for instance, infection can occur simultaneously in any type of wound. Wound dehiscence is naturally difficult to differentiate from lacerations and other chronic wounds because of their clinical similarity. For example, if there were sutures along the margins of a necrotic wound, the network labeled it “operation wound” with the highest probability and “necrosis” with the second highest probability (Figure 5). This explains why the accuracy rate was markedly higher when the top 3 accuracy criterion was applied.

Figure 5

Previous studies largely focused on binary classification, distinguishing wounds as normal or abnormal.13-16 Some authors have attempted 3-class classification, which yielded mixed results; however, the majority of studies were limited to classification of certain types of wounds, such as burns or diabetic foot ulcers.17-20 Studies that include more than 3 classes are uncommon. One study evaluated a 6-class wound classification task that aimed to classify diabetic, venous, arterial plus venous, pressure, and surgical wounds, as well as background and normal skin, and reported accuracies ranging from 67.52% to 82.48%8 (Table 4).

Table 4

The authors of the current study believe that 6-class classification is not sufficient for practical situations. In the current study, almost all clinical diagnoses of wounds were included, and the 9-class classification system was the first such attempt, to the authors’ knowledge. Although the accuracy rate in this study did not surpass 82.48% as reported in a previous study,8 the current study arguably presents better results given that the classifiers had to classify 3 additional categories. 

The classification system selected for the current study may be subject to debate. It should be noted that the classification was determined solely for the authors’ convenience in data interpretation. The authors acknowledge that the categorization was largely empirical and not entirely robust. The class number was refined through multiple preliminary studies to reduce the classifiers’ error rates. Furthermore, no patient information other than the photographs was included in this study. The focus was on examining accuracy based on photographs alone, and the inclusion of patient covariates could have altered the accuracy rate. Although this study is a pilot project, the promising results demonstrated the potential of a deep learning model to aid in wound management decisions for the general population and non-wound care professionals, which encouraged the authors to proceed with publication.

Based on the results of the current study, future plans include reinforcing the wound data to boost accuracy; creating applications for use by general practitioners (to improve referral) and by patients (to improve self-care); providing decision aids regarding emergent situations and the timing of seeking a wound care professional; recommending dressing methods and materials; establishing adequate wound management guidelines based on automated classifications; and developing AI-based estimation of time to wound healing (Figure 6).

Figure 6

Limitations

This study has limitations. The wounds were photographed at a single institution under the same settings (eg, lighting, distance from the lens to the subject). It is possible that this system might demonstrate lower accuracy for wound photographs taken in different settings at different institutions. The selection of the network demonstrating the highest accuracy has not yet been standardized, especially for multi-class classification. Because the networks exhibited different accuracy rates in each class, relying on a single network in clinical practice is not recommended. The low accuracy rates in some categories indicate the need for improvement; creating a dependable network depends heavily on achieving an even accuracy rate across classes. The data volume was insufficient for the 9-class classification system. Additionally, due to limited resources, it was not possible to conduct a more extensive study comparing the performance of humans and deep learning networks. Further research involving a larger data set from multiple hospitals is necessary. Comparing the performance of this AI classification system with that of human experts, and evaluating the effectiveness and accuracy of each, would also be beneficial.

Conclusion

The AI-based 9-class wound classification system had an acceptable rate of predictability and reliability for general practice and patients; this rate was similar to the highest accuracy rate noted in previous reports on multi-class wound classifications. Although this is a pilot study and additional evidence is required, this research indicates the potential of an AI-based wound classification system to assist physicians and patients in general clinical practice and real-world wound classification, thereby improving diagnostic quality and treatment outcomes. 

Acknowledgments

Authors: Jun Won Lee, MD1; Hi-Jin You, MD, PhD2,3; Ji-Hwan Cha, MD2; Tae-Yul Lee, MD2,3; and Deok-Woo Kim, MD, PhD2,3

Acknowledgments: Ji Hoon Kim, Kyung Ri Park, and Young Shik Moon from the Department of Computer Science and Engineering at Hanyang University in Ansan, Korea, participated in the establishment of the program. 

Affiliations: 1Department of Plastic and Reconstructive Surgery, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul, Korea; 2Department of Plastic and Reconstructive Surgery, Korea University College of Medicine, Seoul, Korea; 3Institute of Advanced Regeneration and Reconstruction

Disclosure: This work was supported by a grant from the Korea University (K2110211). The authors disclose no conflicts of interest.

Correspondence: Hi-Jin You, MD, PhD; Department of Plastic and Reconstructive Surgery, Korea University Ansan Hospital, Korea University College of Medicine, 123 Jeokgeum-ro, Danwon-gu, Ansan 15355, Korea; hijinyou@gmail.com

Manuscript Accepted: December 15, 2023
 

How Do I Cite This?

Lee JW, You HJ, Cha JH, Lee TY, Kim DW. VGG19 demonstrates the highest accuracy rate in a nine-class wound classification task among various deep learning networks: a pilot study. Wounds. 2024;36(1):8-14. doi:10.25270/wnds/23066

References

1. Sen CK. Human wound and its burden: updated 2020 compendium of estimates. Adv Wound Care (New Rochelle). 2021;10(5):281-292. doi:10.1089/wound.2021.0026

2. Jones O, Murphy SH, Durrani AJ. Regulation and validation of smartphone applications in plastic surgery: it’s the wild west out there. Surgeon. 2021;19(6):e412-e422. doi:10.1016/j.surge.2020.12.005

3. Chan KS, Lo ZJ. Wound assessment, imaging and monitoring systems in diabetic foot ulcers: a systematic review. Int Wound J. 2020;17(6):1909-1923. doi:10.1111/iwj.13481

4. Kunz F, Stellzig-Eisenhauer A, Zeman F, Boldt J. Artificial intelligence in orthodontics: evaluation of a fully automated cephalometric analysis using a customized convolutional neural network. J Orofac Orthop. 2020;81(1):52-68. doi:10.1007/s00056-019-00203-8

5. Zuo KJ, Saun TJ, Forrest CR. Facial recognition technology: a primer for plastic surgeons. Plast Reconstr Surg. 2019;143(6):1298e-1306e. doi:10.1097/PRS.0000000000005673

6. Levites HA, Thomas AB, Levites JB, Zenn MR. The use of emotional artificial intelligence in plastic surgery. Plast Reconstr Surg. 2019;144(2):499-504. doi:10.1097/PRS.0000000000005873

7. Jokhio MS, Mahoto NA, Jokhio S, Jokhio MS. Detecting tweet-based sentiment polarity of plastic surgery treatment. Mehran Univ Res J Eng Technol. 2015;34(4):403-412.

8. Anisuzzaman DM, Patel Y, Rostami B, Niezgoda J, Gopalakrishnan S, Yu Z. Multi-modal wound classification using wound image and location by deep neural network. Sci Rep. 2022;12(1):20057. doi:10.1038/s41598-022-21813-0

9. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint. 2014. doi:10.48550/arXiv.1409.1556

10. Tan M, Le QV. EfficientNet: rethinking model scaling for convolutional neural networks. arXiv preprint. 2019. doi:10.48550/arXiv.1905.11946

11. Ding X, Zhang X, Ma N, Han J, Ding G, Sun J. RepVGG: making VGG-style ConvNets great again. Paper presented at: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021. doi:10.1109/CVPR46437.2021.01352

12. Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett. 2006;27(8):861-874. doi:10.1016/j.patrec.2005.10.010

13. Liu TJ, Christian M, Chu YC, et al. A pressure ulcers assessment system for diagnosis and decision making using convolutional neural networks. J Formos Med Assoc. 2022;121(11):2227-2236. doi:10.1016/j.jfma.2022.04.010

14. Yogapriya J, Chandran V, Sumithra MG, Elakkiya B, Ebenezer AS, Dhas CSG. Automated detection of infection in diabetic foot ulcer images using convolutional neural network. J Healthc Eng. 2022;2022:2349849. doi:10.1155/2022/2349849

15. Hüsers J, Hafer G, Heggemann J, et al. Automatic classification of diabetic foot ulcer images – a transfer-learning approach to detect wound maceration. Stud Health Technol Inform. 2022;289:301-304. doi:10.3233/SHTI210919

16. Hopkins BS, Mazmudar A, Driscoll C, et al. Using artificial intelligence (AI) to predict postoperative surgical site infection: a retrospective cohort of 4046 posterior spinal fusions. Clin Neurol Neurosurg. 2020;192:105718. doi:10.1016/j.clineuro.2020.105718

17. Rostami B, Anisuzzaman DM, Wang C, Gopalakrishnan S, Niezgoda J, Yu Z. Multiclass wound image classification using an ensemble deep CNN-based classifier. Comput Biol Med. 2021;134:104536. doi:10.1016/j.compbiomed.2021.104536

18. Anisuzzaman DM, Patel Y, Niezgoda J, Gopalakrishnan S, Yu Z. Wound severity classification using deep neural network. arXiv preprint. 2022. doi:10.48550/arXiv.2204.07942

19. Watanabe R, Shima K, Horiuchi T, Shimizu T, Mukaeda T, Shimatani K. A system for wound evaluation support using depth and image sensors. Annu Int Conf IEEE Eng Med Biol Soc. 2021;2021:3709-3712. doi:10.1109/EMBC46164.2021.9629922

20. Wang Y, Ke Z, He Z, et al. Real-time burn depth assessment using artificial networks: a large-scale, multicentre study. Burns. 2020;46(8):1829-1838. doi:10.1016/j.burns.2020.07.010
