Abstract:
Chest diseases, particularly Pneumonia and Tuberculosis, remain among the world's leading causes of
morbidity and mortality, posing an ongoing healthcare challenge, especially in resource-limited settings.
Early detection of these diseases is therefore crucial to saving lives. Chest X-ray (CXR) images
are the most common tool for chest disease diagnosis because acquisition is painless and fast and the
equipment is widely available. However, interpretation of these images is often hindered by low image
resolution, overlapping features, and a shortage of experienced radiologists. These limitations emphasize the
need for automated diagnostic tools to support clinicians’ decision-making.
The primary objective of this thesis is to develop deep learning-based systems for accurate detection of
Pneumonia and Tuberculosis using CXR images. The research is structured around three contributions
designed in a hierarchical manner. Each successive approach addresses specific limitations encountered in
the preceding one. First, a hybrid CNN-XGBoost model is introduced to detect Pneumonia and to distinguish
between viral and bacterial Pneumonia. The model showed promising results in binary classification but
reduced performance in multi-class classification, due to its inability to capture long-range dependencies
and complex patterns.
To address this limitation, an ensemble model combining ResNet-50 and ViT-b16 (a Vision Transformer-based
model) was developed, first for Tuberculosis detection and then for multi-class classification of
normal, Tuberculosis, and Pneumonia CXR images. The ensemble model leverages the strengths of both the
Convolutional Neural Network and the Vision Transformer, showing high performance in both binary
and multi-class classification. Despite the strong performance of Vision Transformers in
analyzing CXR images, their high memory consumption, caused by the quadratic complexity of self-attention,
hinders the training process. Vision Mamba, a recently developed deep learning architecture, addresses this
issue by reducing computational overhead while maintaining high accuracy. Based on this
concept, a fine-tuned Vision Mamba model was designed for efficient Tuberculosis detection using CXR
images. The results demonstrate that the Vision Mamba-based model significantly reduced
memory consumption while achieving high accuracy.
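
The quadratic complexity mentioned above can be made concrete with a small back-of-the-envelope sketch. It assumes the standard ViT-B/16 configuration (16x16 patches, 12 attention heads, one class token) and 32-bit floats, and it counts only the attention maps of a single layer for a single image. The function name and the chosen image sizes are illustrative; the figures it prints are not measurements from this thesis.

```python
def attention_matrix_bytes(image_size, patch_size=16, heads=12, bytes_per_value=4):
    """Bytes needed to store one layer's attention maps for a single image.

    Assumes a square input and a ViT-style tokenization: one token per
    patch plus one class token. Defaults follow ViT-B/16 with float32.
    """
    tokens = (image_size // patch_size) ** 2 + 1  # patches plus class token
    # Each head stores a tokens x tokens attention matrix.
    return heads * tokens * tokens * bytes_per_value


# Doubling the image side roughly quadruples the token count, so the
# attention maps grow by roughly a factor of sixteen.
for size in (224, 448, 896):
    mib = attention_matrix_bytes(size) / 2**20
    print(f"{size}x{size}: {mib:.1f} MiB per layer per image")
```

Actual training memory also depends on batch size, network depth, stored activations, and optimizer state, so this is only a rough illustration of why sequence length dominates the cost for Transformer-based models and why architectures that scale linearly with the number of tokens are attractive.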