Hybrid Question-Answering System: A FAISS and BM25 Approach for Extracting Information from Technical Document
Abstract
In this study, a hybrid question-answering system was developed to speed up access to information in corporate technical documents and to generate appropriate responses to user queries. The system combines dense vector-based retrieval (FAISS) with sparse text-based retrieval (BM25) and integrates both with the XLM-RoBERTa Large model. Evaluations on a dataset of 23 technical documents indicated that the system responds effectively to both semantic and keyword-based queries. The approach enables fast and accurate access to information in technical documents and improves the efficiency of corporate knowledge management processes.
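To make the idea of combining sparse and dense retrieval concrete, the following is a minimal sketch, not the system described in the paper. The abstract does not say how the two score lists are merged, so the min-max normalization and the weighted sum with `alpha` are assumptions. The plain-Python cosine similarity stands in for a FAISS index, the three-dimensional vectors are hypothetical stand-ins for real embeddings, and all function names and sample documents are illustrative.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every tokenized document against the query with Okapi BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def cosine_scores(query_vec, doc_vecs):
    """Cosine similarity between the query vector and each document vector."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))
    q_norm = norm(query_vec)
    return [
        sum(q * d for q, d in zip(query_vec, vec)) / (q_norm * norm(vec))
        for vec in doc_vecs
    ]


def min_max(scores):
    """Rescale scores to [0, 1] so sparse and dense scores are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(sparse, dense, alpha=0.5):
    """Assumed fusion rule: weighted sum of min-max normalized scores."""
    fused = [
        alpha * s + (1 - alpha) * d
        for s, d in zip(min_max(sparse), min_max(dense))
    ]
    order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return order, fused


# Hypothetical toy corpus, already tokenized by whitespace.
docs = [
    "reset the pump controller after a pressure fault".split(),
    "install the firmware update using the service tool".split(),
    "pressure sensor calibration procedure for the pump".split(),
]
query = "pump pressure fault".split()

# Hypothetical embeddings; a real system would obtain these from a model.
doc_vecs = [[0.9, 0.1, 0.3], [0.1, 0.9, 0.2], [0.7, 0.2, 0.6]]
query_vec = [0.8, 0.1, 0.4]

sparse = bm25_scores(query, docs)
dense = cosine_scores(query_vec, doc_vecs)
ranking, fused = hybrid_rank(sparse, dense, alpha=0.5)
print(ranking)  # [0, 2, 1]
```

In this sketch `alpha` controls the balance between the two signals: values near 1 favor exact keyword matches, while values near 0 favor semantic similarity.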
Hakdağlı, Ö. (2024). Hybrid Question-Answering System: A FAISS and BM25 Approach for Extracting Information from Technical Document. *Orclever Proceedings of Research and Development*, 5(1), 226-237. https://doi.org/10.56038/oprd.v5i1.535