    Troubleshooting Codepages (Character Sets)

Monday 26 April, 2010
Vyv Lomax, Administrator

Problems with codepages (character encodings) are ten a penny, so here is an article to read before you cry for help.

Introduction

  1. If you see funny characters like ? / 伀 / ⴀ / 一 instead of the € / é / å / á / ó / ń / í / ł / ź / ć / ą / ę / ś / Д / И / Б / ا / ط / غ that you expect, then you have codepage issues.
  2. If a document system suddenly stops working where it was perfectly fine before, then you may have introduced a new character into a sensitive part of the processing. For example, an existing trigger label gets text in a new language and therefore no longer matches its trigger value. This is also, of course, a codepage issue.
  3. If you introduce the € (Euro) currency symbol into your system and it does not come out on the printer, then you may have codepage issues.
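
To make these symptoms concrete, here is a minimal sketch in Python (used purely for illustration; it is not part of StreamServe) that reproduces them by pushing text through the wrong codepage. The sample strings are assumptions chosen for the demonstration. The funny characters 伀 / ⴀ / 一 from point 1 are what plain "O", "-" and "N" look like when little-endian UCS-2 data is read as big-endian.

```python
# Illustrative only: reproduce typical codepage symptoms with Python's codecs.

def code_points(text):
    """Show each character as a U+XXXX code point so the result is unambiguous."""
    return [f"U+{ord(ch):04X}" for ch in text]

# 1. UCS-2 / UTF-16 read with the wrong byte order:
#    "O-N" is stored as 4F 00 2D 00 4E 00 and comes back as three unrelated characters.
wrong_order = "O-N".encode("utf-16-le").decode("utf-16-be")

# 2. UTF-8 bytes displayed as ISO-8859-1: one accented letter (e acute) becomes two characters.
mojibake = "\u00e9".encode("utf-8").decode("iso-8859-1")

# 3. A character the target codepage lacks: ISO-8859-1 has no euro sign,
#    so a lenient conversion substitutes a question mark.
lost_euro = "\u20ac".encode("iso-8859-1", errors="replace")

print(code_points(wrong_order))  # ['U+4F00', 'U+2D00', 'U+4E00']
print(code_points(mojibake))     # ['U+00C3', 'U+00A9']
print(lost_euro)                 # b'?'
```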

These are likely to happen when:

  1. Someone adds a new trading partner who uses another language to your application.
  2. Someone adds a new language to the current system.
  3. Someone cuts and pastes text in one language from a web site into a database field in your application.
  4. An administrator changes fonts, printers or database settings, or enables extended language settings, somewhere along the line.

Well – do not worry – there is a simple process to get it working again. But before that we should learn a little something about codepages.

Good things to know are:

  1. A file / data stream can only have one codepage at any one time.
  2. Printers and fonts also need to support your codepages.
  3. StreamServe cannot guess which codepage you are sending into it.
  4. Your StreamServe server has a default codepage (the one installed with the operating system) that is used if no other is specified.
  5. Your StreamServe project will use the default codepage of the server (see previous point) if no other is specified (say, on the input analyzer or in a filter).
  6. PDF format is very forgiving and many printing issues are not found when developing with PDF.
  7. UE.exe is a simple UTF editor provided by StreamServe that includes all supported codepages. It is available in all versions of StreamServe (Windows only).
  8. Lookup tables and SLS files in StreamServe should be created in the UTF-8 codepage.
  9. Finally, there are many codepages out there, with different names, standards and levels of quality. Read a bit more about them on the internet if you want to.
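
As a rough illustration of points 1, 3 and 4, the Python sketch below shows why nobody can guess a codepage from the bytes alone: the same byte is a different character in each codepage, and the same character is a different byte. The helper name `decode_with` is made up for this example, and the last line reports the default codepage as Python sees it on the machine where it runs, which is only a hint at what a server installed there would default to.

```python
import locale

def decode_with(raw, codepages):
    """Decode the same bytes with each codepage; undecodable bytes become U+FFFD."""
    return {cp: raw.decode(cp, errors="replace") for cp in codepages}

# One byte, three readings: 0xA4 is the euro sign only in ISO-8859-15.
views = decode_with(b"\xa4", ["iso-8859-1", "iso-8859-15", "cp1252"])
for codepage, text in views.items():
    print(f"{codepage:12} 0xA4 -> U+{ord(text):04X}")

# One character, three byte sequences: the euro sign (U+20AC).
for codepage in ["cp1252", "iso-8859-15", "utf-8"]:
    print(f"{codepage:12} U+20AC -> {chr(0x20AC).encode(codepage).hex(' ').upper()}")

# The operating system default that applies when nothing else is specified.
print("Default codepage here:", locale.getpreferredencoding(False))
```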

So, on to the process for getting it working again:

  1. Try to obtain (from an administrator) the name of the codepage that your application produces its data streams in.
  2. Send your output to a file and view it. Does it look OK here? If not, then you need to go back to your system administrator and reconfigure your application.
  3. If it does look OK, then you should confirm the codepage: look at the hex values of the specific characters of interest and match them against codepage lookup tables to be sure. You can do this with a text editor that can show and select codepages and, preferably, has a hex view. A combination of ue.exe (StreamServe UTF Editor) and UltraEdit (my favourite, though there are many others out there) can help you along here. The codepage tables are best found on the internet.
  4. Now we need to check whether anything happens to the file when it is sent to StreamServe, whether by StreamServe’s logical printer (port monitor, *.dsi files), file transfer, HTTP submit and so on.
  5. If you are using StreamServe’s logical printer, then you can halt the current service and send your file. Go to the resulting file delivery path, grab the *.dsi file and move it out of the way so that you can restart your service if necessary.
  6. If you use a normal file transfer method, then you can just grab the resulting delivered file.
  7. If you have a data stream delivery, then you should dump the input into a file with a little help from a “dump filter” on the input connector. You can read more about that on this ARTICLE.
  8. Once you have your delivered input file, it is time to check that the codepage is still the same and that your characters of special interest are still there. If they are not there as before, then you have a new codepage (or your editor is showing the file to you in another codepage). You will have to scroll through different codepages in order to identify the new one. This can happen when the logical printer is set up with additional settings.
  9. If you still have issues then don’t worry: it is always possible to use a file filter to “convert” the specific characters that are causing trouble. (Someone please write a nice file filter post here...)
  10. You can always reposition your dump filter after any other operations in your file filter chain in order to check the input at that point.
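
If you do not have a hex-capable editor to hand, steps 3 and 8 can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the candidate list and the sample bytes (which stand in for a captured or dumped input file) are all made up for the example, and nothing here is a StreamServe tool.

```python
def hex_view(raw, width=16):
    """Return a simple hex dump: offset followed by the bytes of each row."""
    rows = []
    for offset in range(0, len(raw), width):
        chunk = raw[offset:offset + width]
        rows.append(f"{offset:08X}  " + " ".join(f"{b:02X}" for b in chunk))
    return "\n".join(rows)

def matching_codepages(raw, expected_chars, candidates):
    """List the candidates in which raw decodes cleanly and shows every expected character."""
    matches = []
    for codepage in candidates:
        try:
            text = raw.decode(codepage)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this codepage
        if all(ch in text for ch in expected_chars):
            matches.append(codepage)
    return matches

# Assumed shortlist; extend it with whatever your systems might produce.
CANDIDATES = ["utf-8", "utf-16-le", "utf-16-be", "cp1252",
              "iso-8859-1", "iso-8859-15", "cp1250", "iso-8859-2"]

# Stand-in for a captured file: "Pris: 10 <euro>, <a-ring>r" written in Windows-1252.
sample = "Pris: 10 \u20ac, \u00e5r".encode("cp1252")

print(hex_view(sample))
print(matching_codepages(sample, "\u20ac", CANDIDATES))        # euro sign only
print(matching_codepages(sample, "\u20ac\u00e5", CANDIDATES))  # euro sign and a-ring
```

Checking for the euro sign alone leaves more than one candidate, because Windows-1252 and Windows-1250 both place it at byte 80. Adding a second character of interest narrows the list, which is why it pays to check several characters rather than one.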

Well there you go. I hope it helps. 

Miscellaneous Tips / links:

  1. Do not use grab files for development.
  2. WSIN files are not shown in their true codepage, so do not try to validate your language there.
  3. An informative site about codepages

Comments (5)

  • If you are having issues with Movex and different languages, please try to figure out whether it is sending UCS2 (Little or Big Endian). By the time the data reaches your Logical Printer it may be exactly the same, or it may suddenly be in the codepage of your server. This depends on whether you have set up the port monitor / logical printer correctly.

    Monday 26 April, 2010 by Vyv Lomax
  • Another interesting thing is that the StreamServer uses the UCS2 (double-byte) encoding internally for all data. This means that all input has to be converted from its native format to UCS2 in the input pipeline and then converted from this common format to an output codepage again in the output pipeline. These conversions are done by the codepage filters you create in Design Center.

    XML has a mechanism in place to declare what codepage is used for a particular instance document. The XML declaration has an encoding attribute that tells parsers how to interpret the document. Ex:

    <?xml version="1.0 codepage="ISO-8859-1?>

    The XML declaration is optional; a document that omits it must be in UTF-8 (or in UTF-16 with a byte order mark). The StreamServe XMLIN Agent uses this declaration to insert the appropriate codepage filter in the input pipeline. This is done automatically, so you don't need to create a codepage filter in DC when you use the XMLIN agent; this agent type ignores it.

     

    Monday 26 April, 2010 by Stefan Cohen
  • I messed up the XML declaration in the previous example. This should be it:

    <?xml version="1.0" encoding="iso-8859-1"?>

    I must remember to not post late at night ;)

    Tuesday 27 April, 2010 by Stefan Cohen
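
The convert-in, convert-out idea and the encoding declaration discussed in the comments above can be sketched in Python. This is a hedged illustration only: it is not how the XMLIN agent is implemented, the function names are invented for the example, Python's Unicode `str` merely stands in for the internal UCS2 format, and the simple pattern below only handles ASCII-compatible input (not UTF-16 documents).

```python
import re

# Matches an encoding="..." pseudo-attribute inside a leading XML declaration.
XML_DECL_ENCODING = re.compile(
    rb"""^<\?xml[^>]*\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)

def declared_encoding(raw, default="utf-8"):
    """Return the encoding named in the XML declaration, or the default if none is declared."""
    match = XML_DECL_ENCODING.match(raw)
    return match.group(1).decode("ascii") if match else default

def to_internal(raw):
    """Input side: convert the native bytes to one common Unicode representation."""
    return raw.decode(declared_encoding(raw))

def to_output(text, output_codepage):
    """Output side: convert the common representation to the requested output codepage."""
    return text.encode(output_codepage)

doc = b'<?xml version="1.0" encoding="iso-8859-1"?><name>Andr\xe9</name>'

text = to_internal(doc)        # the E9 byte is now the character e acute
out = to_output(text, "utf-8")  # the same character, now the bytes C3 A9

print(declared_encoding(doc))
print(out)
# Note: the output still carries the old declaration; a real conversion
# would also have to update or remove it.
```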
  • http://streamshare.streamserve.com/Forum/Topic/?topicID=1109

    Is a post with a related problem & solution.

    Wednesday 28 April, 2010 by Vyv Lomax
  • A Knowledge Center article on Asian Codepages.

    Sunday 25 November, 2012 by Vyv Lomax

   

