Determining Animal Kinship Based on IGF2 Protein Structure Using the K-Means and N-Gram Methods

Ruth Ema Febrita, Maghfirotul Amaniyah

Abstract


In biology, there are various ways to determine how closely two individuals are related, such as observing similarities in physical morphology and building a dendrogram, or constructing a phylogenetic tree to trace kinship based on the evolutionary history of an organism. However, these approaches are difficult to apply when the animal whose kinship is to be determined is no longer alive, because its physical characteristics are then hard to observe. This study aims to provide a different approach to determining animal kinship by clustering IGF2 protein structures with the K-Means method. Because the protein sequences vary in length, the n-gram technique is used to break each sequence into subsequences of equal length, which simplifies the grouping. K-Means clustering gave its best result with seven clusters, with an average silhouette coefficient of 0.331, a purity index of 0.735, and a precision of 0.823, which indicates that the clustering process is fairly effective.
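The pipeline described in the abstract (n-gram decomposition, K-Means grouping, purity evaluation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not state the n-gram length, the feature weighting, the distance measure, or the initialisation the authors used, and the sequences below are short made-up strings, not real IGF2 data. The function names (`ngrams`, `vectorize`, `kmeans`, `purity`) are hypothetical.

```python
from collections import Counter


def ngrams(seq, n):
    """Split a sequence into overlapping subsequences of length n."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def vectorize(seqs, n):
    """Map each sequence to a relative-frequency vector over a shared n-gram vocabulary."""
    counts = [Counter(ngrams(s, n)) for s in seqs]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # guard against sequences shorter than n
        vectors.append([c[g] / total for g in vocab])
    return vectors, vocab


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with a deterministic farthest-point initialisation (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def purity(pred, truth):
    """Fraction of items that belong to the majority true class of their cluster."""
    majority_total = 0
    for cluster in set(pred):
        members = [t for p, t in zip(pred, truth) if p == cluster]
        majority_total += Counter(members).most_common(1)[0][1]
    return majority_total / len(truth)


# Toy sequences of equal and unequal n-gram content (made up, not IGF2).
toy_seqs = ["MGIPMGKSML", "MGIPMGKSLL", "MGIPVGKSML",
            "AYRPSETLCG", "AYRPSETLCA", "AYGPSETLCG"]
toy_truth = ["a", "a", "a", "b", "b", "b"]

toy_vectors, toy_vocab = vectorize(toy_seqs, n=2)
toy_labels = kmeans(toy_vectors, k=2)
print(toy_labels, purity(toy_labels, toy_truth))
```

Normalising the n-gram counts by sequence length is one way to make sequences of different lengths comparable in the same vector space, which is the role the abstract assigns to the n-gram step. The silhouette coefficient and precision reported in the abstract are not computed in this sketch.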



Keywords


kinship analysis, k-means, n-gram

DOI: https://doi.org/10.31294/inf.v9i2.13808


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Published by the Department of Research and Public Service (LPPM), Universitas Bina Sarana Informatika, with support from Relawan Jurnal Indonesia

Jl. Kramat Raya No.98, Kwitang, Kec. Senen, Kota Jakarta Pusat, DKI Jakarta 10450