In "Ark Review," we ask various people to look at Ichitaro Ark from different viewpoints. This time, Yoshi Mikami, who works in the Software Marketing Division of IBM Japan, Ltd., gives us his personal view of Ichitaro Ark's multilingual processing capabilities.






Yoshi MIKAMI, Software Marketing Div., IBM Japan, Ltd.

It has already been ten years since I became one of the co-authors and co-editors of the book "The PC University of Foreign Studies" in Japanese (Tokyo: Gijutsu Hyoron Sha, 1991), which is now a classic in the multilingual processing field. Since then, people inside and outside of IBM have come to me almost every week with requests such as: "My customer has decided to move their manufacturing facility to a foreign country, and asked me how to process the language of that country. Help!" I became more familiar with multilingual processing as I tried to answer them through my own home page, "The Languages of the World," through "The Multilingual Web Guide" (O'Reilly Japan, 1997), which I published with my co-authors, and through the "Multilingual Computing Users Group" (MULCO) mailing list with my friends.
On the left: in front of the remains of a Hindu temple at Prambanan, near Yogyakarta, Indonesia


Introduction

In December, 1999, Ichitaro Ark, a word processing program written in 100% Pure Java, became available from Justsystem Corporation, a major software company in Japan, at the list price of 8,900 yen plus tax. This is a report of my experience using this program in the Japanese installation mode under the Japanese version of Windows, translated from the Japanese original. Ark can be installed in the English mode on the English version of Windows, but that is outside of Justsystem's official technical support.

One of Ichitaro Ark's major features, as advertised by Justsystem, is its multilingual capability, which allows a mixture of multiple languages in one document. How well does this work? I was interested in the answer to this question from the beginning of the Ark project, since the word processing functions that Ark planned to offer seemed very similar to the HTML functions which I use in creating my own home page with an editor. However, I did not participate in Ark's preview program last summer, because I felt that it exceeded my technical ability in Java.

I usually carry with me a notebook PC (IBM ThinkPad 570 with 300MHz processor and 96MB memory), running Windows 98 Japanese version. It includes the Japanese input method, French and Russian keyboard resources, and Simplified and Traditional Chinese input methods. By pressing the Alt-Shift keys, I can switch among these six languages, including English. Here, I would like to share with you my experience of using Ichitaro Ark for processing these multiple languages.

I understand that the word "Ark" comes from Noah's Ark, which saved human beings from the disaster of the flood, as written in Genesis in the Bible. I therefore attach a sample Ichitaro Ark file that contains translations of the very beginning of Genesis into these multiple languages (see the figure on the right, below). You can confirm these foreign language characters in the Bibles that you find in the foreign bookstores in the major cities of Japan. I myself found these characters at the well-known bookstore near the south exit of JR Shinjuku Station. See also the Internationalization of Ichitaro Ark in the fourth installment of the "From the Ark Developers" series, written in Japanese.

Once you have installed and started Ichitaro Ark, let's first write Japanese and English texts. Select Format and the Property of Document on the menu bar, and key in "The Bible in Various Languages" as the title of the document. This title will later appear at the top of the browser window, because converting the Ark document to HTML will create <TITLE>The Bible in Various Languages</TITLE> in the <HEAD> area. In the same Property of Document panel, select Default language, put a check mark on the background color, and choose Yellow. Yellow seems to give a good background to the default characters in black.


French Language Processing

In my college days, I took French Conversation up to the advanced level from Prof. Yoshio Fukui, who later became well known for his NHK Radio lessons. I had heard that French people would not speak English even if they knew it. During my stays in the U.S. for graduate school and IBM work, I visited Paris and spoke my naïve French to the hotel receptionist, only to be answered in his fluent British English. Since then I have done my business around the world entirely in English, and it is only recently that I started to re-learn French.

French uses the same 26-letter alphabet as English. It differs from English in using accent marks on the vowels, namely the acute accent ('), grave accent (`), circumflex accent (^), and the tréma (¨), a diaeresis mark which looks like the umlaut in German, as well as the cedilla (¸) on C/c. In France, these accent marks are normally omitted on the capital letters, while in Quebec, Canada, they must absolutely be put on the capital letters, as I understand.

In Windows 95/98, click on Control Panel, Keyboard, Language, French (Canada), Left Alt-Shift (or Ctrl-Shift) and OK, to get the Canadian French keyboard resource from the Windows CD-ROM. French (France) is not selected because the keys on the Canadian French keyboard are laid out in the QWERTY order, as on the Japanese JIS and U.S. ASCII keyboards, while those on the French (France) keyboard are arranged in the unique AZERTY order, with the A and Q keys and the Z and W keys exchanged, the M key appearing immediately to the right of the L key, etc., which is quite inconvenient for a normal Japanese user.

After starting Ichitaro Ark, get it ready for French text input: click on Tools, Language & Font and Settings in the menu bar, select Add, French (Canada) and OK, and then choose Arial in the Default font field. In my judgement, the Arial font provides a good balance with the other fonts that are used in Japanese and other languages. Then, press the Alt-Shift keys until you get "Fr" on the submerging task bar at the bottom of the Windows desktop. Click on the text area and select Document Format, Font & Language, Settings, French (Canada) and Default (Arial), and now start typing French characters. The E/e characters with acute accents (É/é) are found on the third key to the right of the M key, and the other accented characters can be entered by first pressing the accent marks found on the keys to the right of the L and P keys and then pressing the vowel or C/c key. Ichitaro Ark's word-wrap function seems to work OK, moving the whole word to the next line when the end of the line is reached.

I want to talk about one detailed point regarding the traditional French punctuation method, which says that the comma and period are placed immediately after the word preceding them, and that the colon (deux-points), semicolon (point-virgule), question mark (point d'interrogation) and exclamation mark (point d'exclamation) are placed one space after the word preceding them. Actually, I recently found in the Penguin Classics series by Penguin Books Ltd that the books published in Britain before the 1950s abided by this punctuation rule. I, therefore, call this rule "the Traditional Anglo-French Punctuation Method," as opposed to the rule that we are familiar with, "the Modern American Punctuation Method." Under the former rule, a word followed by a space and a colon, for example, must be word-wrapped together to the next line. In HTML, this is achieved by inserting &nbsp; (non-breaking space) in place of the ordinary space. A French word processing program normally offers this as an optional feature, which Ichitaro Ark may want to include in the future.
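The &nbsp; substitution described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not something Ichitaro Ark or any browser does: the helper name french_spacing_to_html is hypothetical, the sample sentence is typed with the traditional spacing, and the input is assumed to be plain text that needs no other HTML escaping.

```python
import re

# Hypothetical helper (not part of Ichitaro Ark or of any HTML tool):
# it replaces the ordinary space before ":", ";", "?" and "!" with the
# HTML entity &nbsp;, so the mark cannot be wrapped onto a line by itself.
_SPACED_MARK = re.compile(r" ([:;?!])")


def french_spacing_to_html(text: str) -> str:
    """Return text with a non-breaking space entity before : ; ? and !"""
    return _SPACED_MARK.sub(r"&nbsp;\1", text)


# Sample sentence typed with the traditional spacing (an assumption for
# illustration only).
sample = "Dieu dit : Que la lumière soit ! Et la lumière fut."
print(french_spacing_to_html(sample))
```

The comma and period are left untouched, since they already follow the preceding word without a space.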

In my Windows 98, I do not use the German keyboard resource, but I know that the Y and Z keys are exchanged, that the umlaut U key is immediately to the right of the P key, that the umlaut O and A keys are found to the right of the L key in this order, and that the ess-zett key (ß, which looks like the Greek beta) is immediately to the right of the Zero (0) key. Selecting the German (Swiss) keyboard would not help much, as the French (Canada) keyboard did in the French case, because the Y and Z keys are still exchanged and no ess-zett key is found there, due to the Swiss German rule of using "ss" instead of ess-zett. Our last resort to get ess-zett is to press Alt-0223 on the numeric keypad (or after pressing the NmLk key on your notebook PC). As some of us will recall, the Alt-nnn keys (nnn being three digits from the ASCII table or any other code table) of the MS DOS/PC DOS age changed to the four-digit Alt-nnnn keys in our Windows period.


Russian Language Processing

In our student days, many of us learned Russian because the Soviet Union seemed to be competing with, and sometimes exceeding, the U.S. in the science and technology fields. Some of us even tried to study abroad in the Soviet Union, if not at the University of Moscow, then at Patrice Lumumba Peoples' Friendship University. Russia is in any case our big, influential neighbor, and I believe that having a basic understanding of the Russian language is very important.

The Russian alphabet, starting with ah, beh, veh, gheh and deh, is included in our JIS double byte character set (DBCS), along with the Greek alphabet, starting with alpha, beta, gamma, delta and epsilon (a, b, g, d, e in the double width). You can get these full-size characters under the Japanese input method by keying in "ro-shi-a" (Russia) and pushing the Conversion key several times. (Key in "gi-ri-sha" (Greece) to get the Greek alphabet.) The characters used in Russian, Bulgarian and other Slavonic languages are called "Cyrillic," in commemoration of the brothers St. Cyril and St. Methodius of Thessaloniki, Greece, then part of the East Roman Empire, who in the 9th century translated the Bible into a Slavonic language for the first time and started Christian missionary work.
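To see what "double byte" means in practice, here is a minimal Python sketch. It assumes that Shift_JIS stands in for the JIS double byte character set as used on Windows, and that cp1251 and cp1253 are the single-byte Windows code pages for Cyrillic and Greek; the helper byte_length is a hypothetical name chosen for illustration.

```python
# Compare how the first Russian and Greek capital letters are stored in
# Shift_JIS (assumed here as the Windows encoding of the JIS character set)
# and in the single-byte Windows code pages cp1251 (Cyrillic) and cp1253 (Greek).
def byte_length(char: str, encoding: str) -> int:
    """Number of bytes needed to store one character in the given encoding."""
    return len(char.encode(encoding))


cyrillic_a = "\u0410"   # CYRILLIC CAPITAL LETTER A ("ah")
greek_alpha = "\u0391"  # GREEK CAPITAL LETTER ALPHA

for name, char, single_byte in [
    ("Cyrillic A", cyrillic_a, "cp1251"),
    ("Greek Alpha", greek_alpha, "cp1253"),
]:
    print(
        name,
        "| shift_jis:", byte_length(char, "shift_jis"), "bytes",
        "|", single_byte + ":", byte_length(char, single_byte), "byte",
    )
```

The same letter therefore exists both as a full-size JIS character and as a half-size character of the Russian keyboard resource, which is worth keeping in mind when a browser shows Russian text in the wrong width.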

In order to get the Russian fonts from the Windows 95/98 CD-ROM, click on Control Panel, Add/Remove Programs, Multilanguage Support, Cyrillic Languages Support and OK. And then in order to get the Russian keyboard resource, select Control Panel, Keyboard, Russia and OK. In the QWERTY key positions of the JIS keyboard are found the Russian characters: yi kratkoye (short yi, that is, "i"), tseh ("ts"), uh ("u"), kah ("k"), yeh ("e") and en ("n"), in that order.

Before being able to process Russian using Ichitaro Ark, two things must be done first: (1) Select on the menu bar Display, Panel Display Settings and Text Display, put a check mark on Text Anti-alias, and select OK; and (2) Select on the menu bar Tools, Language & Font Settings, Russian and OK, and choose Arial for Default Font. Now, you can start typing Russian characters, after depressing Alt-Shift keys until "Ru" is shown on the submerging task bar at the bottom of the Windows desktop, clicking on the text area, and selecting Text Format, Fonts & Languages, Settings, Russian and Default font (Arial).

If you are planning to convert the Ichitaro Ark document to an HTML page later, I found that selecting Default and then converting to HTML will show the Russian characters properly in Internet Explorer, but improperly, as full-size JIS characters, in Netscape Navigator. Selecting Arial for Serif instead will show the proper characters in the latter browser as well. The Russian "yo" characters (two dots over E/e) may be difficult to get because they are assigned to the Half-Size/Full-Size character key, to the left of the One (1) key, on the JIS keyboard. In such a case, use Alt-0168 (capital letter) and Alt-0184 (small letter), using the method that we learned for typing the German ess-zett.
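The Alt-0nnn numbers mentioned for ess-zett and yo can be checked against the Windows code tables with a short Python sketch. I am assuming here that the four-digit code is looked up in the Western code page (cp1252) for German and in the Cyrillic code page (cp1251) when the Russian keyboard resource is active; alt_code_char is a hypothetical helper, not a Windows function.

```python
# Hypothetical helper: look up the character that a four-digit Alt code
# (Alt-0nnn) stands for in a given single-byte Windows code page.
def alt_code_char(code: int, code_page: str) -> str:
    if not 0 <= code <= 255:
        raise ValueError("Alt-0nnn codes address a single byte (0-255)")
    return bytes([code]).decode(code_page)


# Assumption: cp1252 applies to the German case, cp1251 to the Russian case.
print("Alt-0223 in cp1252:", alt_code_char(223, "cp1252"))  # German ess-zett
print("Alt-0168 in cp1251:", alt_code_char(168, "cp1251"))  # Russian capital yo
print("Alt-0184 in cp1251:", alt_code_char(184, "cp1251"))  # Russian small yo
```

The same number gives a different character in a different code table, which is why the keyboard language has to be switched first.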


Chinese Language Processing

I learned Chinese when I was stationed ten years ago in Taipei, working as software technology adviser to IBM's joint venture with III (Institute for Information Industry), a government agency. At that time, I used taxis a lot. The Japanese would normally write Chinese characters on paper to communicate with the taxi driver there, but I quickly learned how to say "You-tsuan" (turn right), "Tsuo-tsuan" (turn left), "Yi-chi-tsuo" (go straight ahead), "Kuo-ma-lu, ting cha!" (cross the road and stop the car!), etc. Taxi Chinese, thus, is the basis of my Chinese. After this experience in Taiwan, I took advanced Chinese at the Department of Literature, Shanghai University, for four weeks during the sabbatical that IBM allowed its long-term senior employees.

China and Taiwan are multi-ethnic, multilingual societies. The Chinese majority speak different dialects, but there is one written Chinese language, based on the Beijing dialect. This written language, however, comes in two flavors: Simplified Chinese, used in mainland China, and Traditional Chinese, used in Taiwan and Hong Kong. The most common encodings used are GB-2312 (GB being short for Guojia Biaozhun, i.e., National Standard) for Simplified Chinese and Big5 (five major applications having been planned by Taipei Computer Association at one time) for Traditional Chinese.
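As a small illustration of why the encoding matters, the following Python sketch encodes the same two characters in GB-2312, Big5 and UTF-8. The helper encoded_bytes and the sample word are my own choices for illustration; the word was picked because it is written identically in Simplified and Traditional Chinese.

```python
# Hypothetical helper: encode one piece of text in the two Chinese
# encodings named in the text, plus UTF-8 for comparison.
def encoded_bytes(text: str) -> dict:
    return {enc: text.encode(enc) for enc in ("gb2312", "big5", "utf-8")}


# "Zhong-wen" (the Chinese language), written the same way in both flavors.
sample = "\u4e2d\u6587"

for enc, data in encoded_bytes(sample).items():
    # Show the raw bytes in hexadecimal and confirm that decoding with the
    # same encoding gives the original characters back.
    print(f"{enc:7} {data.hex()} -> {data.decode(enc)}")
```

The identical characters are stored as different bytes in each encoding, so a file or an e-mail has to be read with the encoding in which it was written.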

For your Windows-based computer, you can download the Simplified and Traditional Chinese input methods, as Global IME 5.01 for Simplified Chinese and the Language Pack, and Global IME 5.01 for Traditional Chinese and the Language Pack, respectively, from Microsoft's Japanese page:

http://www.microsoft.com/windows/ie_intl/ja/ime.htm
or English page:
http://www.microsoft.com/windows/ie/features/ime.asp
The file names are scmono.exe (4.1MB) and tcmondo.exe (3.8MB), respectively. (Likewise, Global IME 5.01 for Korean can be downloaded and installed, but Global IME 5.01 for Japanese should never be installed on your computer with Windows 95/98 Japanese version because a disaster awaits you...)

Global IMEs can be used to input Chinese characters in application software that complies with the Microsoft Input Method Manager, such as Outlook 98, Outlook Express 4.0, Word 2000, etc. The Simplified Chinese characters can be created by first pressing the Alt-Shift keys to display the floating task bar, showing Zhong (middle) on the left, Pin (common) on the right, and the moon and period marks in the middle, and then typing the romanized Chinese. This method is called the Pin-yin input method, and this romanization is the one that beginning students of Chinese in Japan and elsewhere are very familiar with.

The Traditional Chinese characters can be processed by one of two methods: Zhu-yin and Cang-jie. Pressing the Alt-Shift keys further will show a floating task bar with Ban (half) or Zhuan (whole) on the left and the Bo phonetic character or "A" on the right, which is for the phonetic input method called "Zhu-yin," or "Bo-Po-Mo-Fo" from the first four letters of the phonetic alphabet. The "Bo-Po-Mo-Fo" phonetic alphabet is taught from kindergarten and elementary school in Taiwan. On the keyboard, whose layout you can see by right-clicking on the floating task bar, the characters B, P, M and F are assigned to the 1, Q, A and Z keys, respectively, going down from the upper left corner. These phonetic characters look hard to remember to a typical Japanese, but they are used when you go to Taiwan to learn Chinese, and I myself learned them during my stay in Taiwan.

The other Traditional Chinese input method is for professionals, and can be activated by pressing the Alt-Shift keys still further, until you get a floating task bar with Ban (half) or Zhuan (whole) on the left and Cang (warehouse) or "A" on the right. On the keyboard, whose layout you can see by right-clicking on the floating task bar, 26 Chinese characters such as the sun, moon, fire, water, tree, gold, earth, bamboo, twenty, etc., are assigned to the keytops. Cang-jie (or Ts'ang-chay, in the Wade-Giles romanization mainly used in Taiwan) is the name of the hero in the Chinese legend, who is said to have invented Chinese characters during the reign of the third emperor, the Yellow Emperor, by observing the birds' footprints.

You can now start Ichitaro Ark, select Chinese/China and Chinese/Taiwan, and specify the MS Hei or MS Song font, in preparation for Chinese input. Unfortunately, neither font shows up beautifully, although the former looks better than the latter, in my judgement. You cannot input Chinese characters directly in Ichitaro Ark, but you can input them indirectly in Outlook Express, for example, by selecting New Mail, choosing Format, Encoding and Unicode (UTF-8), and keying in Simplified Chinese or Traditional Chinese in the e-mail text area, after pressing the Alt-Shift keys to get the Chinese IME. These Chinese characters in Outlook Express can then be cut and pasted into Ichitaro Ark's multilingual document. This cut and paste process sometimes does not work properly, especially for some Simplified Chinese characters. In that case, Outlook Express can be changed to GB-2312 encoding, and the Outlook Express message can be saved to a text file, which can then be imported into Ichitaro Ark.

As you have seen, Ichitaro Ark does not allow direct Chinese character input from Global IME. In Japan, there are several software packages, such as Kodensha's Chinese Writer, that allow direct input to a Japanese application program, and these also work fine with Ichitaro Ark, as reported by others. These packages include the Pin-yin input method for Traditional Chinese (an input method already available in Windows 98 Traditional Chinese version) and other functions useful for Japanese users. In addition, by the time you see this review, Windows 2000 will have been shipped, which should provide better multilingual capabilities.

Ichitaro Ark does not handle languages written from right to left, such as Arabic and Hebrew, as mentioned in Justsystem's announcement. Asian languages such as the Indian languages and Thai cannot be handled, either. There are reports that Vietnamese can be processed using VietKey 4.08, which is linked from Vietnam Net's download page (the document being defined as an English document), so more adventurous users may want to try it and other font/input method packages. Plug-ins, such as the ones that allow display of Chinese and Korean menus, are being prepared. So, let's exchange our experiences in Justsystem's Ark Q&A forum or in the MULCO mailing list that was mentioned earlier.


Word Processing Functions, HTML Conversion and Printer Support

Ichitaro Ark includes only the essential word processing functions. This is great, and please do not make a word processing program more complex. You can insert images, and use multiple languages in ordered or unordered lists and in tables. You can convert multilingual Ichitaro Ark documents to HTML format by selecting on the menu bar File, Save with New Name, HTML 4.0 File and Unicode (utf-8) Character Set. A good affinity with HTML is one of Ichitaro Ark's strong points, although you might think it should come naturally because the Ark documents are based on Extensible HTML (XHTML). There are many people, I am sure, who are dissatisfied with the HTML functions of the other word processing programs on the market.
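For readers who have not looked inside such a file, the Python sketch below builds a tiny multilingual HTML 4.0 page with the utf-8 character set declared in the <HEAD> area. It is my own minimal illustration of the idea, not the markup that Ichitaro Ark actually generates; the function build_html, the LANG attributes and the three sample verses are assumptions chosen for the example.

```python
from html import escape


# Hypothetical generator: NOT Ichitaro Ark's converter, only a sketch of a
# multilingual HTML 4.0 page that declares the utf-8 character set.
def build_html(title, paragraphs):
    """paragraphs is a list of (language code, text) pairs."""
    lines = [
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">',
        "<HTML>",
        "<HEAD>",
        '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">',
        f"<TITLE>{escape(title)}</TITLE>",
        "</HEAD>",
        "<BODY>",
    ]
    for lang, text in paragraphs:
        # escape() protects characters such as < and & in the text itself.
        lines.append(f'<P LANG="{escape(lang)}">{escape(text)}</P>')
    lines += ["</BODY>", "</HTML>"]
    return "\n".join(lines) + "\n"


page = build_html(
    "The Bible in Various Languages",
    [
        ("en", "In the beginning God created the heaven and the earth."),
        ("fr", "Au commencement, Dieu créa les cieux et la terre."),
        ("ru", "В начале сотворил Бог небо и землю."),
    ],
)
data = page.encode("utf-8")  # the bytes that would be written to the .html file
print(page)
```

Because every language shares the one utf-8 character set, no switching of encodings is needed between the English, French and Russian paragraphs.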

For printing documents in Ichitaro Ark, I found that medium-priced printers and up are recommended. Some low-priced printers try to print outline fonts with the printer fonts built into them, so as to print faster, which in my case sometimes caused trouble in printing Russian fonts. It is recommended to use the latest printer drivers with Ichitaro Ark, such as those from the manufacturers' home pages, rather than the drivers that came with the printers.


Conclusion

My impression after using Ichitaro Ark for a short period of time is that a fine job has been done, even considering the current status of Java 2 and XML. As JDK 1.2 or 1.3 becomes available for all operating systems, being able to use the same Java application software on all of them will be a great convenience. Java technology may become the Noah's Ark that saves human beings from the modern disaster of porting application programs to different operating environments.

From the multilingual viewpoint, I regret that we have made multilingual processing so complex. Recently, more users have started to use multilingual functions, as Mac OS 9, shipped in September, 1999, included all the Language Kits that had been sold separately, and as Windows 2000 was shipped in February, 2000. The use of Unicode in JDK is only a beginning, so we must deepen our knowledge of multilingual capabilities by using them.

In Japan, the large cities used to be the more multilingual and multi-cultural places, because American and European business people lived there. At present, we are at an interesting time in Japan when the regional cities have become multilingual and multi-cultural, because those who work in the plants and factories there are from Asia and South America, and the restaurants and food stores there are full of multi-cultural items. In this environment, Ichitaro Ark provides us with an excellent educational tool for learning multilingual processing on computers.





Yoshihiko Mikami Feb. 24, 2000


(C)2000 Justsystem Corporation