Analysis Name Entity Disambiguation Using Mining Evidence Method

Adelya Astari, Moch. Arif Bijaksana, Arie Ardiyanti Suryani

Abstract


Hadith is the second source of guidance and Islamic teaching after the Qur'an, and Sahih al-Bukhari is one of the most authentic (sahih) hadith collections. Each hadith in Sahih Bukhari has a chain of narrators, a hadith number, and its own content. The hadith tradition also has a discipline that studies the lives of hadith narrators, called the Science of Rijalul Hadith. In Sahih Bukhari, different narrators can share the same name, which makes these names ambiguous. Lay readers therefore find it difficult to tell whether two identical names refer to the same person. To address this problem, a name disambiguation system is built. It resolves the ambiguity by checking each narrator's name, hadith number, chain of narrators, content topic, circle, country, and the Companions of the Prophet, who are identified from the last three names before the Prophet in the chain of narrators. The system uses the Mining Evidence method together with several other approaches, namely document label association, word label association, context similarity, cosine similarity, and word2vec, to compute similarity values between name entities. Once the similarity values are obtained, the data are grouped using a clustering algorithm. The system is expected to perform well, as measured by precision, recall, and accuracy derived from a confusion matrix.
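The abstract names the pipeline steps but not their details. The Python sketch below is not the authors' implementation; it only illustrates, under stated assumptions, how some of those steps fit together. Each mention of an ambiguous name is turned into a bag of words from its topic, its country, and the last three narrators before the Prophet. Mentions are then compared with cosine similarity (one of the several measures the abstract lists; document and word label association and word2vec are omitted here). They are grouped by a simple single-link threshold rule, which is a hypothetical stand-in for the unspecified clustering algorithm. Finally, the grouping is scored with pairwise precision, recall, and accuracy from a confusion matrix. The field names `topic`, `country`, `chain`, the `PROPHET` marker, the 0.5 threshold, and the sample data are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

PROPHET = "Rasulullah"  # hypothetical marker for the Prophet at the end of each chain


def companion_names(chain):
    """Return the last three narrators before the Prophet.

    Assumes `chain` is ordered towards the Prophet, who is the final element.
    """
    if chain and chain[-1] == PROPHET:
        chain = chain[:-1]
    return chain[-3:]


def features(mention):
    """Bag of words built from a mention's topic, country and companion names."""
    tokens = mention["topic"].lower().split()
    tokens.append("country:" + mention["country"].lower())
    tokens += ["companion:" + n.lower() for n in companion_names(mention["chain"])]
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(mentions, threshold=0.5):
    """Single-link grouping: mentions with similarity >= threshold share a label."""
    parent = list(range(len(mentions)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    vecs = [features(m) for m in mentions]
    for i, j in combinations(range(len(mentions)), 2):
        if cosine(vecs[i], vecs[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(mentions))]


def pairwise_scores(predicted, gold):
    """Precision, recall and accuracy over all mention pairs (same person or not)."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_pred = predicted[i] == predicted[j]
        same_gold = gold[i] == gold[j]
        if same_pred and same_gold:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_gold:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Three invented mentions of one ambiguous narrator name; n1..n6 are placeholders.
SAMPLE_MENTIONS = [
    {"topic": "prayer ablution", "country": "Madinah", "chain": ["n1", "n2", "n3", PROPHET]},
    {"topic": "prayer times", "country": "Madinah", "chain": ["n1", "n2", "n3", PROPHET]},
    {"topic": "trade contracts", "country": "Kufa", "chain": ["n4", "n5", "n6", PROPHET]},
]

if __name__ == "__main__":
    labels = cluster(SAMPLE_MENTIONS)
    print(labels)  # [1, 1, 2]
    print(pairwise_scores(labels, [0, 0, 1]))  # (1.0, 1.0, 1.0)
```

Scoring pairs rather than individual mentions sidesteps the fact that cluster labels are arbitrary: only "same person" versus "different person" decisions are compared with the gold standard.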

Keywords


Disambiguation, Entity Name, Mining Evidence, Sahih Bukhari, Similarity



DOI: https://doi.org/10.31294/p.v22i2.8196

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

ISSN 2579-3500

Published by LPPM Universitas Bina Sarana Informatika

Jl. Kramat Raya No.98, Kwitang, Kec. Senen, Kota Jakarta Pusat, DKI Jakarta 10450
Telephone: 021-21231170, ext. 704 / 705