Zoom Search Engine - International language support

Zoom has improved support for international languages, with the following features:
Unicode support

The indexer is now fully Unicode enabled and supports indexing and string matching on any Unicode text. However, Windows 98 and earlier versions have only partial Unicode support, so you may run into issues if you attempt to index with Unicode on these platforms.

Note: The JavaScript version may require IE 5.5 or above for some languages.

Tips for some common languages
European languages (French, German, Danish, Swedish, etc.)

European websites commonly use the standard English character encoding, CP1252 (also known as "windows-1252"). A few websites also use "utf-8", which matches "windows-1252" only for plain ASCII characters; accented characters are encoded differently. To make Zoom work with "windows-1252", ensure the following:
Russian (Cyrillic)

Russian websites commonly use one of three encodings: KOI8, CP1251 (also known as "windows-1251"), and Unicode UTF-8. If your website uses KOI8 or windows-1251, ensure that:
If your website uses UTF-8, ensure that:
Asian languages (Japanese, Chinese, etc.)

Many East Asian languages have no word-delimiting character (a character that marks the end and start of a word, such as a space in most Latin-based languages), so Zoom can only provide limited support for these languages.

Japanese

If your website is encoded in UTF-8, Zoom will successfully index your site and will be able to perform searches. However, search performance and accuracy are limited, as Zoom will only split words by:
This means that an entire sentence may be indexed as a single "word". However, if you enable "Substring match for all searches" on the "Languages" tab of the Configuration window, then search terms that appear within a sentence will match correctly.

Zoom does not currently support indexing Shift-JIS pages. You will have to convert your website to UTF-8 if you wish to use it with Zoom.

Chinese

Zoom now supports indexing Chinese pages in Big5, GB2312 or UTF-8 encoding. However, search performance and accuracy are limited, as Zoom can only split "words" based on formatting.

If you are indexing a Chinese website, you should enable the "Substring match for all searches" option on the "Languages" tab of the Configuration window. This allows matching of words which have not been split correctly. You should also enable the "Support single-case languages" option on the same tab, since Chinese has no distinction between upper and lower case.

Notes for using GB2312 with the JavaScript option: Because of issues with browser support for GB2312 in JavaScript (JS), it may be necessary to create a custom search form that encodes the search query so that it can be decoded correctly in JS. If you are using GB2312 and the JS search option, you may need to use the following search form HTML to submit your query correctly:

<form method="get" action="search.html" onsubmit="window.location='search.html?zoom_query='+escape(this.zoom_query.value); return false;">
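As a rough illustration of why the "Substring match for all searches" option matters for Japanese and Chinese text, the sketch below splits a sentence only on whitespace and punctuation and then compares exact matching with substring matching. The function names and the delimiter set are hypothetical and chosen for this example only; this is not Zoom's actual indexing code.

```javascript
// Hypothetical illustration only; this is not Zoom's indexing code.
// Split text on whitespace and a few common CJK/Latin punctuation marks.
function splitIntoWords(text) {
  return text.split(/[\s、。，．,.!?！？]+/).filter(Boolean);
}

// Exact match: the query must equal one of the indexed "words".
function exactMatch(words, query) {
  return words.includes(query);
}

// Substring match: the query may appear anywhere inside an indexed "word".
function substringMatch(words, query) {
  return words.some((word) => word.includes(query));
}

// Two Japanese sentences with no spaces between the words.
const indexed = splitIntoWords("私は検索エンジンを使います。東京に住んでいます。");

console.log(indexed.length);                  // 2 (each sentence became one "word")
console.log(exactMatch(indexed, "検索"));      // false
console.log(substringMatch(indexed, "検索"));  // true
```

Because each sentence ends up as one indexed "word", an exact comparison against the query "検索" fails, while a substring comparison finds it inside the first sentence.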
Croatian

Croatian websites use a number of different encodings, including windows-1250 and UTF-8. Zoom will successfully index and search Croatian sites; however, there are a few known issues with some Croatian diacritic characters.

First, if you are using an encoding/charset other than UTF-8 (such as windows-1250) AND you are using the JavaScript search option, then some searches may fail if they contain certain diacritic characters (e.g. HTML entities such as "š" and "č"). This is not an issue if you are using UTF-8 encoding on your website (and Zoom is configured accordingly). Nor does it apply if you are using one of the other search platforms available, namely PHP, ASP or CGI.

Another known issue is that the "Jump to highlighting" feature may fail to work for words containing these diacritic characters.

Arabic

Arabic websites use a number of different encodings, including windows-1255, windows-1256 and UTF-8. Zoom will successfully index and search Arabic sites; however, there are a few known issues with some Arabic diacritic characters.

First, there is an option to "Strip Arabic diacritic marks from words", which helps when searching documents that contain Arabic diacritic marks that most users do not type in their queries. You can find this option on the "Languages" tab of the Configuration window, and we recommend enabling it for Arabic searches.

Another known issue is that the "Jump to highlighting" feature may fail to work for words containing these diacritic characters.

Setting the locale on your webserver

Whether or not you need to change the locale setting depends on your web server environment, so if you are uncertain, ask your web host for more information on using foreign languages on their installed platform.

For PHP users

The following is an example of setting the Russian locale on a Windows-based PHP server:

if (setlocale(LC_ALL, "rus") === false) { // for Russian
    // setlocale() returns false when the locale could not be set; handle the failure here
}
To modify the "search.php" file permanently, click on "Templates"->"Modify search script source code" in the Zoom Indexer window. Alternatively, save a customized copy of the script somewhere and specify the path to this customized script on the "Advanced" tab of the Configuration window.

Locale names can be found here. Windows servers use different locale names, and their list can be found here. More information on setlocale(...) is available on php.net.

For ASP/IIS users

On some IIS servers, the locale and regional settings of the server may conflict with the execution of the ASP script. This causes some characters to appear incorrectly on your search page, even though the correct charset is specified in your search template and the corresponding encoding is selected on the "Languages" tab of the Configuration window. In such cases, you may need to add the following preprocessing directive to the ASP script. Note that this line must be added as the very first line of the "search.asp" file:

<%@ CODEPAGE=1252%>
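To give a feel for what "characters appear incorrectly" can look like when the encoding assumed by the server does not match the encoding of the data, here is a minimal Node.js sketch. It is only an analogy: the exact conflict on a given IIS server may differ, and Node's "latin1" decoding is used here as a stand-in for codepage 1252 (the two agree for the bytes in this example).

```javascript
// Node.js sketch (assumption: run with Node, which provides Buffer).
// Encode "é" (U+00E9) as UTF-8, then decode the same bytes two ways.
const utf8Bytes = Buffer.from("\u00e9", "utf8"); // two bytes: 0xC3 0xA9

// Wrong: treat every byte as one single-byte character.
const wrong = utf8Bytes.toString("latin1");

// Right: decode with the encoding the bytes were written in.
const right = utf8Bytes.toString("utf8");

console.log(utf8Bytes.length); // 2
console.log(wrong);            // "Ã©"
console.log(right);            // "é"
```

The garbled two-character result is typical of a page whose declared charset and actual byte encoding disagree, which is the kind of symptom the codepage directive is meant to address.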
To modify the "search.asp" file permanently, click on "Templates"->"Modify search script source code" in the Zoom Indexer window. Alternatively, save a customized copy of the script somewhere and specify the path to this customized script on the "Advanced" tab of the Configuration window.

Note: If you are using your own ASP search page, and you are embedding "search.asp" as described in this FAQ, then you will need to put the above line at the top of your custom ASP search page and NOT in "search.asp" (since "search.asp" will only be executed AFTER your page, and this line needs to be the first line of ASP executed).

Known Issues