International Standard "Language Resources Management-Word segmentation of written texts" and Korean

 2019.12.3.

The adoption of "Language Resources Management-Word segmentation of written texts" as an international standard, one of the important issues in language information processing, was proposed as an important work item by Subcommittee SC 4 of Technical Committee ISO/TC 37 of ISO (the International Organization for Standardization) in August 2007.

The purpose of "Language Resources Management-Word segmentation of written texts", as set by Technical Committee ISO/TC 37, is to ensure that word segmentation in different languages is carried out in a unified, standard way for language information processing.

The main task of Subcommittee SC 4 of Technical Committee ISO/TC 37 in this work was to adopt International Standards for word segmentation in Korean, Chinese and Japanese.

Kim Il Sung University took part six times, from August 2007 to 2010, in the work of adopting International Standards such as ISO 24614-1:2010 "Language Resources Management-Word segmentation of written texts" (Part 1: Basic concepts and general principles) and ISO 24614-2:2011 "Language Resources Management-Word segmentation of written texts" (Part 2: Word segmentation in Chinese, Japanese and Korean), thus making an important contribution to the worldwide application of these International Standards since 2011.

Fig 1. Cover of Standards

The university's researchers succeeded in having the International Standards adopted in accordance with the features of Korean, an agglutinative language that differs from inflectional and isolating languages. As a result, the standard can serve as an example that covers features of different languages of the world, and it provides a basis for ensuring not only unification and compatibility in word segmentation but also speed and accuracy in language information processing worldwide.

In the future, this standard will be used to keep word segmentation of written texts consistent in applied fields such as natural language processing, information retrieval, search tools, question answering, machine translation, speech synthesis, document correction, speech recognition, character recognition, e-libraries, semantic networks, e-business and e-learning, in Chinese, Japanese, Korean, Thai, Vietnamese, Mongolian and Tibetan.

In particular, through the adoption of this standard, Korean has been widely introduced to the world as one of the best languages, with a rich vocabulary and precise grammar, and has come to be firmly regarded as one of the typical languages of the world.

Fig 2. Dr., Assoc. Prof. Pae Kwang Hui

This standard was adopted by four famous professors in Asia, including Pae Kwang Hui (Doctor, Associate Professor), an instructor in the Korean linguistics department, Faculty of Korean Language and Literature, Kim Il Sung University, and Sun Maosong (Professor, Doctor), dean of the Faculty of Computer, Qinghua University.