Clause Alignment for Bilingual Hong Kong Legal Texts with Available Lexical Resources*
Chunyu Kit Jonathan J. Webster King Kui Sin Haihua Pan Heng Li
Department of Chinese, Translation & Linguistics, City University of HK, Tat Chee Ave., Kowloon, Hong Kong
Abstract: This paper reports on our recent work in clause alignment for English-Chinese legal texts using available bilingual lexical resources, for the purpose of acquiring examples at various linguistic levels for EBMT. It formulates similarity measures for candidate clause pairs in terms of matched lexical items and presents the implementation of a clause alignment algorithm using these measures. Experimental results show that this approach achieves an alignment accuracy of 94.6%, confirming that lexical information gives a reliable indication of correct alignment. The significance of this lexical-based approach lies in both its simplicity and its effectiveness.
Keywords: Clause alignment, lexical-based text alignment, example-based machine translation, similarity measure
Text alignment is a critical task in current MT technology. It serves various purposes, e.g., constructing statistical translation models (Brown et al. 1990, Brown et al. 1993) and acquiring examples for EBMT (Kay 1980, Nagao 1984). Broadly, text alignment approaches fall into two categories. Statistical approaches rely on non-lexical information, such as sentence length, sentence position, co-occurrence frequency and the ratio of sentence lengths in the two languages, as illustrated in previous research (Gale and Church 1991, Church 1993, Dagan et al. 1993, Kay and Roscheisen 1993). The appeal of these resource-poor approaches lies in the sharp contrast between the meagre resources they require and the rich results they yield. Resource-rich approaches, such as lexical-based text alignment, instead rely on existing lexical resources such as large-scale bilingual dictionaries and glossaries. As more and more bilingual lexical resources become available, it is worth investing further research effort in exploring the effectiveness of lexical-based approaches.
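To make the contrast between the two families concrete, here is a minimal Python sketch of one non-lexical cue and one lexical cue for scoring a candidate English-Chinese clause pair. It is an illustration under stated assumptions, not the similarity measure formulated in this paper: the character-length ratio is a simplified stand-in for length-based methods such as Gale and Church (1991), which use a probabilistic model, and the lexical score, the toy glossary `TOY_DICT` and the helper names `length_ratio_score`, `lexical_score` and `best_match` are all hypothetical. Chinese clauses are matched by substring search, which sidesteps word segmentation.

```python
import re
from typing import Dict, List

# Hypothetical toy English-Chinese glossary; a real system would draw on a
# large-scale bilingual legal dictionary or glossary.
TOY_DICT: Dict[str, List[str]] = {
    "court": ["法院"],
    "appeal": ["上訴"],
    "judge": ["法官"],
    "order": ["命令"],
}


def length_ratio_score(en_clause: str, zh_clause: str, expected_ratio: float) -> float:
    """Non-lexical cue (simplified, hypothetical): closeness of the observed
    Chinese/English character-length ratio to an assumed expected ratio.
    Returns a value in [0, 1]; expected_ratio must be positive."""
    observed = len(zh_clause) / max(len(en_clause), 1)
    return min(observed, expected_ratio) / max(observed, expected_ratio)


def lexical_score(en_clause: str, zh_clause: str, dictionary: Dict[str, List[str]]) -> float:
    """Lexical cue (hypothetical): fraction of dictionary-covered English words
    for which at least one listed translation occurs in the Chinese clause."""
    covered = [w for w in re.findall(r"[a-z]+", en_clause.lower()) if w in dictionary]
    if not covered:
        return 0.0  # no lexical evidence either way
    matched = sum(1 for w in covered if any(t in zh_clause for t in dictionary[w]))
    return matched / len(covered)


def best_match(en_clause: str, zh_candidates: List[str], dictionary: Dict[str, List[str]]) -> int:
    """Index of the candidate Chinese clause with the highest lexical score
    (the earliest candidate wins on ties)."""
    scores = [lexical_score(en_clause, zh, dictionary) for zh in zh_candidates]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    en = "The court may order an appeal"
    candidates = ["法院可命令上訴", "法官須作出裁決"]
    print([lexical_score(en, zh, TOY_DICT) for zh in candidates])  # [1.0, 0.0]
    print(best_match(en, candidates, TOY_DICT))  # 0
```

The lexical cue returns 0.0 whenever the dictionary covers none of the English words, so its usefulness depends directly on dictionary coverage; this is one reason the growing availability of bilingual lexical resources matters for lexical-based approaches.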
* The work presented here was carried out as part of the CERG project “EBMT for HK Legal Texts”, funded by HK UGC under grant #9040482, with Jonathan J. Webster as the principal investigator and Chunyu Kit, Caesar S. Lun, Haihua Pan, King Kui Sin and Vincent Wong as co-investigators. Yan Wu worked on the project as a research associate. The authors wish to thank all team members whose contributions made this research possible.

In this paper we report on our recent work in clause alignment using available lexical resources. It is part of the example acquisition phase of an ongoing EBMT project, which aims to acquire examples at various linguistic levels, including clause, phrase and word. By the term “example” we mean