Problems with codepages (character encodings) are ten a penny, so here is an article to read before you cry for help.
- If you see funny characters like ? / 伀 / ⴀ / 一 instead of the € / é / å / á / ó / ń / í / ł / ź / ć / ą / ę / ś / Д / И / Б / ا / ط / غ that you expect, then you have codepage issues.
- If a document system suddenly stops working where it was perfectly fine before, then you may have introduced a new character into a sensitive part of the processing, e.g. an existing trigger label gets a new language and therefore no longer matches its trigger value. This is also, of course, a codepage issue.
- If you introduce the € Euro currency into your system and it is not coming out on the printer – then you may have codepage issues.
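To see where such characters come from, here is a small Python sketch. It has nothing to do with StreamServe itself: the helper name `misread` is made up for this illustration, and it simply writes text in one codepage and reads the bytes back in another.

```python
# Illustration only: how a codepage mismatch produces "funny" characters.
# misread() is a hypothetical helper, not part of any StreamServe tool.

def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one codepage, then decode the same bytes in another."""
    return text.encode(written_as).decode(read_as, errors="replace")

# UTF-16 little-endian data read as big-endian turns plain ASCII into exotic glyphs.
print(misread("O-N", "utf-16-le", "utf-16-be"))   # 伀ⴀ一

# UTF-8 data read as Windows-1252 splits one accented letter into two characters.
print(misread("é", "utf-8", "cp1252"))            # Ã©

# Cyrillic has no place in Latin-1, so a lossy conversion leaves a question mark.
print("Д".encode("latin-1", errors="replace"))    # b'?'
```

The first line reproduces three of the funny characters listed above, which suggests that this particular symptom is a UTF-16 byte order mix-up rather than a missing font.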
These are likely to happen when:
- Someone adds a new trading partner who uses another language to your application.
- Someone adds a new language to the current system.
- Someone cuts and pastes text in one language from a web site into a database field in your application.
- Some administrator changes fonts / printers / database settings / enables extended language settings somewhere along the line.
Well – do not worry – there is a simple process to get it working again. But before that we should learn a little something about codepages.
Good things to know are:
- A file / data stream can only have one codepage at any one time.
- Printers and fonts also need to support your codepages.
- StreamServe cannot guess which codepage you are sending into it.
- Your StreamServe server has a default codepage (the one installed with the operating system) that is used if no other is specified.
- Your StreamServe project will use the default codepage of the server (see previous point) if no other is specified (say, on the input analyzer or in a filter).
- PDF format is very forgiving and many printing issues are not found when developing with PDF.
- UE.exe is a simple UTF editor provided by StreamServe that includes all supported codepages. It is available in all versions of StreamServe (Windows only).
- Lookup tables & SLS files in StreamServe should be created in the UTF-8 codepage.
- Finally - There are many codepages out there with different names and standards and levels of quality. Read a bit more about it on the internet if you want to.
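As a rough illustration of why the codepage must be known and cannot be guessed, the Python sketch below prints the bytes used for the € sign in a few common codepages. The selection of codepages and the helper name `hex_in` are my own choices for the example.

```python
# Illustration only: one character, different bytes in different codepages.

def hex_in(codepage: str, text: str = "€") -> str:
    """Return the bytes of text in the given codepage as spaced hex."""
    return text.encode(codepage).hex(" ")

print(hex_in("cp1252"))        # 80
print(hex_in("iso-8859-15"))   # a4
print(hex_in("utf-8"))         # e2 82 ac
print(hex_in("utf-16-le"))     # ac 20

# ISO-8859-1 (Latin-1) predates the euro and has no slot for it.
try:
    "€".encode("iso-8859-1")
except UnicodeEncodeError:
    print("iso-8859-1 cannot represent the euro sign")

# The same byte means different things depending on the codepage.
print(bytes([0xA4]).decode("iso-8859-15"))   # €
print(bytes([0xA4]).decode("iso-8859-1"))    # ¤
```

A receiver that sees the byte A4 has no way of telling whether the sender meant € or ¤ unless somebody tells it the codepage.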
So, on to the process for getting it working again:
- Try to obtain (from an administrator) the name of the codepage that your application produces its data streams in.
- Send your output to a file and view it – does it look OK here? If not then you need to go back to your system administrator and reconfigure your application.
- If it does look OK then you should confirm the codepage: take a look at the hex values of the specific characters of interest and match them against codepage lookup tables to be sure. You can do this with a text editor that can show and select codepages, preferably one with a hex view. A combination of ue.exe (the StreamServe UTF Editor) and UltraEdit (my favourite, though there are many others out there) can help you along here. The codepage tables are best found on the internet.
- So now we need to check whether anything happens to the file when it is sent to StreamServe, whether by StreamServe’s logical printer (port monitor, *.dsi files), file transfer, HTTP submit and so on.
- If you are using StreamServe’s logical printer then you can halt the current service and send your file. Go to the resulting file delivery path, grab the *.dsi file and move it out of the way so that you can restart your service if necessary.
- If you have a normal file transfer method then you can just grab the resulting delivered file.
- If you have a data stream delivery then you should dump the input into a file with a little help from a “dump filter” on the input connector. You can read more about that in this ARTICLE.
- Once you have your input file delivered it is time to check that the codepage is still the same and that your special interest characters are still there. If they are not there as before then you have a new codepage (or your editor is showing it to you in another codepage). You will have to scroll through different codepages in order to identify the new codepage. This can happen when setting up the logical printer with additional settings.
- If you still have issues then don’t worry – it is always possible to use a file filter to “convert” the specific characters that are causing trouble. (Someone please write a nice file filter post here...)
- You can always reposition your dump filter after any other operations in your file filter chain in order to check input.
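If you have no hex editor to hand, the checking steps above can be roughed out in a few lines of Python. This is only a sketch under my own assumptions: the candidate codepages are examples to be replaced with your own suspects, the function names are invented for this post, and the sample bytes stand in for a file you would normally load with `Path(...).read_bytes()`.

```python
# Sketch only: a poor man's hex view for the "characters of interest".
CANDIDATES = ("cp1252", "iso-8859-15", "cp850", "utf-8")  # assumption: adjust to taste

def non_ascii_runs(data: bytes, limit: int = 10):
    """Return up to `limit` (offset, bytes) pairs for runs of bytes >= 0x80."""
    runs, start = [], None
    for i, b in enumerate(data):
        if b >= 0x80:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, data[start:i]))
            start = None
            if len(runs) == limit:
                return runs
    if start is not None and len(runs) < limit:
        runs.append((start, data[start:]))
    return runs

def report(data: bytes):
    """Show each run as offset, hex, and its reading in every candidate codepage."""
    lines = []
    for offset, run in non_ascii_runs(data):
        guesses = ", ".join(
            f"{cp}={run.decode(cp, errors='replace')!r}" for cp in CANDIDATES
        )
        lines.append(f"{offset:08x}  {run.hex(' ')}  {guesses}")
    return lines

def first_difference(before: bytes, after: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(before, after)):
        if x != y:
            return i
    if len(before) != len(after):
        return min(len(before), len(after))
    return None

# Stand-in for a real file, e.g. Path("output.txt").read_bytes() (hypothetical name).
sample = "Café 10€".encode("cp1252")
for line in report(sample):
    print(line)
```

Run `report` on the file your application produced and again on the file that arrived at StreamServe (the grabbed *.dsi file or the dump filter output). If `first_difference` on the two returns `None`, nothing touched the bytes in transit and the problem lies elsewhere.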
Well there you go. I hope it helps.
Miscellaneous Tips / links:
- Do not use grab files for development.
- WSIN files are not shown in their true codepage, so do not try to validate your language there.
- An informative site about codepages