Vol 8, No 5 (2017): Electrical, Electronics and Computer Engineering

A Dependency Annotation Scheme to Extract Syntactic Features in Indonesian Sentences

Budi Irmawati, Hiroyuki Shindo, Yuji Matsumoto

 

Abstract: In languages with fixed word orders, syntactic information is useful for solving natural language processing (NLP) problems. In languages such as Indonesian, however, which have a relatively free word order, the usefulness of syntactic information has yet to be determined. In this study, a dependency annotation scheme for extracting syntactic features from a sentence is proposed. The scheme adapts the Stanford typed dependency (SD) annotation scheme to cope with Indonesian phenomena such as ellipses, clitics, and non-verb clauses. The adapted scheme is then extended because it cannot avoid certain ambiguities in assigning heads and relations. The accuracy of the two annotation schemes is compared, and the usefulness of the extended scheme is assessed by using syntactic features extracted from dependency-annotated sentences in a preposition error correction task. The experimental results indicate that the extended annotation scheme improved the accuracy of a dependency parser, and the error correction task demonstrates that training data with syntactic features yield better correction than training data without such features, thus giving a positive answer to the research question.
Keywords: Dependency annotation; Dependency relation; Error correction; Indonesian language; Syntactic information
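
As a rough illustration of what extracting syntactic features from a dependency-annotated sentence can look like, the following Python sketch collects the head and the dependents of a preposition. It is a minimal sketch and not the authors' implementation: the `Token` class, the `preposition_features` function, the feature names, the example sentence, and the SD-style labels (`nsubj`, `prep`, `pobj`) are all assumptions made for illustration and may differ from the adapted and extended schemes described in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One token of a dependency-annotated sentence (hypothetical layout)."""
    idx: int     # 1-based position in the sentence
    form: str    # surface word
    head: int    # position of the head token; 0 marks the root
    deprel: str  # relation label between this token and its head


def preposition_features(sentence, prep_idx):
    """Collect features from the tokens linked to the preposition at prep_idx.

    The preposition's own form is left out on purpose: in an error
    correction setting it would be the label to predict, not a feature.
    """
    by_idx = {tok.idx: tok for tok in sentence}
    prep = by_idx[prep_idx]
    head = by_idx.get(prep.head)  # None when the preposition is the root
    children = [tok for tok in sentence if tok.head == prep_idx]
    return {
        "head_form": head.form if head else "ROOT",
        "relation": prep.deprel,
        "child_forms": [tok.form for tok in children],
        "child_relations": [tok.deprel for tok in children],
    }


# "Saya tinggal di Jakarta" ("I live in Jakarta"), annotated by hand
# with illustrative SD-style labels (an assumption, not gold data).
SENTENCE = [
    Token(1, "Saya", 2, "nsubj"),
    Token(2, "tinggal", 0, "root"),
    Token(3, "di", 2, "prep"),
    Token(4, "Jakarta", 3, "pobj"),
]

features = preposition_features(SENTENCE, 3)
print(features)
```

Features of this kind could be passed to any classifier that chooses among candidate prepositions. Because they come from the dependency tree rather than from neighbouring positions, they stay the same when the linear order of the words changes.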

