Contribution to historical documents classification

Boudraa, Merouane

dc.contributor.author	Boudraa, Merouane
dc.date.accessioned	2025-10-05T12:31:08Z
dc.date.available	2025-10-05T12:31:08Z
dc.date.issued	2025-05-15
dc.identifier.uri	http://localhost:8080/jspui/handle/123456789/13276
dc.description.abstract	Historical documents play a crucial role in understanding the past. However, studying them is complex, time-consuming, and resource-intensive, and it requires significant expertise. With the rise of digital humanities and the availability of digitized documents, automating tasks on these documents has become possible. One of the key challenges in this domain is the classification of historical documents. Early studies relied on traditional machine learning techniques that required manual feature extraction, and these methods often fell short in accuracy. The advent of deep learning has opened new avenues for improving classification results, and various deep learning architectures have been explored in this field. Despite this potential, further exploration and more efficient techniques are still needed, which motivated our study on historical document classification. Our research explores the topic from both historical and computational perspectives. We began by examining the complexities of historical document classification and analyzing existing studies. This analysis allowed us to identify the key challenges and guided the direction of our contribution. Our approach begins with preprocessing: denoising via the Non-Local Means algorithm and binarization using the Canny edge detector. We then detect features by applying the Harris corner detector to identify keypoints within the manuscript. These keypoints are clustered using the k-means algorithm, which allows us to extract meaningful patches of predefined dimensions. This process not only isolates significant visual features but also supports systematic data augmentation that increases the depth and diversity of our dataset.
In the final stage, we use vision transformers, a powerful deep learning architecture, fine-tuned to our specific classification tasks. By integrating automatically extracted handcrafted features, the model becomes a robust and effective classification system for historical document analysis. As an additional refinement, we apply a majority vote over image patches, designed to improve system accuracy. These findings have been validated and published in leading journals and conference proceedings, providing a strong foundation for future research in historical document analysis.	en_US
dc.language.iso	en	en_US
dc.publisher	Université Echahid Cheikh Larbi-Tebessi -Tébessa	en_US
dc.subject	Historical Documents, Machine Learning, Deep Learning, Classification, Dating, Writer Identification, Script Classification, Convolutional Neural Networks, Vision Transformers, Feature Extraction, Computer Vision, Pattern Recognition	en_US
dc.title	Contribution to historical documents classification	en_US
dc.type	Thesis	en_US
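
The patch-based steps named in the abstract (clustering keypoints with k-means, cutting fixed-size patches, and taking a majority vote over patch predictions) can be illustrated with a minimal sketch. This is not the thesis's implementation: it uses only the Python standard library, assumes the keypoints have already been produced by a corner detector such as Harris, and assumes the per-patch labels come from some classifier. The function names (`kmeans`, `patch_box`, `majority_vote`), the evenly spaced seeding, and the patch size are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[Point]:
    """Plain Lloyd's k-means on 2-D points; returns k cluster centers."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: deterministic seeding with evenly spaced input points.
    step = len(points) / k
    centers = [points[int(i * step)] for i in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        groups: List[List[Point]] = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(
                range(k),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2,
            )
            groups[nearest].append((x, y))
        # Update step: move each center to the mean of its group.
        # An empty group keeps its previous center.
        new_centers = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def patch_box(center: Point, size: int, width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a size x size patch near `center`,
    shifted as needed so that it stays inside a width x height image."""
    half = size // 2
    left = min(max(int(round(center[0])) - half, 0), max(width - size, 0))
    top = min(max(int(round(center[1])) - half, 0), max(height - size, 0))
    return left, top, left + size, top + size


def majority_vote(labels: Sequence[str]) -> str:
    """Return the most frequent patch label; ties go to the label seen first."""
    if not labels:
        raise ValueError("at least one patch label is required")
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical keypoints forming two groups on a 256 x 256 page image.
    keypoints = [(10, 10), (12, 11), (11, 9), (200, 150), (198, 152), (202, 149)]
    centers = kmeans(keypoints, k=2)
    boxes = [patch_box(c, size=64, width=256, height=256) for c in centers]
    print(boxes)
    # Hypothetical per-patch predictions for one document.
    print(majority_vote(["writer_A", "writer_B", "writer_A"]))
```

In this sketch each cluster center yields one patch, and the document-level label is whichever class most patches agree on. A real pipeline would replace the hard-coded keypoints and labels with detector and model outputs.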