
    sls files and codepage - what's really going on?

  • Gaute Amundsen

    I have been struggling with a big sls file today, and it just seems so horribly hit and miss.
    Does anybody know *exactly* what's going on?

    In roughly this context:
    http://www.joelonsoftware.com/articles/Unicode.html

    Working backwards.

    Is STRS doing any detection or guessing of codepages, BOMs, etc.?
    Or does it just assume that an .sls file is UTF-8 (x-endian) with CRLF line terminators?
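    To see what a given .sls file actually contains, a small script can look for a byte-order mark and count the line terminators. This is a minimal Python sketch; the function names are hypothetical, and counting raw bytes only makes sense for ASCII-compatible encodings such as UTF-8 or ISO-8859-1 (not UTF-16).

```python
import codecs
from typing import Dict, Optional

# Known byte-order marks, longest first so a UTF-32 LE BOM (FF FE 00 00)
# is not mistaken for a UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def count_line_endings(data: bytes) -> Dict[str, int]:
    """Count CRLF, bare LF and bare CR terminators in ASCII-compatible bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,
        "CR": data.count(b"\r") - crlf,
    }
```

    Read the file in binary mode (open(path, "rb")) and pass the raw bytes to both functions, so that nothing is decoded or normalised before you look at it.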

    What does Utf8Edit actually do when you change codepage and/or "reorder text"?
    It seems to me that changing the codepage only changes how text is displayed and how new text is entered, and will not change the file in any way by itself?
    "Reordering" could mean changing the byte order... who knows?

    Utf8Edit knows when to display "?" or the little [] box.
    It seems to me it should be possible to write a little utility that can at least tell you IF, and maybe where, there are problem characters in your file.
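    A minimal sketch of such a utility in Python, assuming the file is supposed to be UTF-8: it reports the line, the byte column and the offending bytes for every sequence that does not decode. The name find_invalid_utf8 is hypothetical. It only finds bytes that are not valid UTF-8; it cannot spot text that is valid UTF-8 but already garbled.

```python
from typing import List, Tuple


def find_invalid_utf8(data: bytes) -> List[Tuple[int, int, bytes]]:
    """Return (line, column, bad_bytes) for every run that is not valid UTF-8.

    Lines are 1-based; the column is the 1-based byte offset within the line.
    """
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the file decodes cleanly
        except UnicodeDecodeError as err:
            start = pos + err.start
            end = pos + err.end
            line_start = data.rfind(b"\n", 0, start) + 1
            line_no = data.count(b"\n", 0, start) + 1
            problems.append((line_no, start - line_start + 1, data[start:end]))
            pos = end  # skip the bad bytes and keep scanning
    return problems
```

    For example, a line saved as ISO-8859-1 inside an otherwise UTF-8 file typically shows up as a single bad byte per letter, such as b'\xe5' for å.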

    Wednesday 19 March, 2014
  • Christofer Lindqvist (StreamServe Employee)

    Hi,

    As far as I know...

    StreamServe uses UTF-8 internally, so all tables that are used should be in the UTF-8 codepage, and their first line should be a header saying //Codepage UTF8 (or something similar).

    The SLS file should be specified according to the predefined format.

    UTF8Edit is only a text editor that lets you change the codepage and save the file in different codepages. You can use it to verify that a file is in a certain codepage, etc. It will not convert your file from one codepage to another.

    Regards,
    Christofer

    Friday 28 March, 2014
  • Gaute Amundsen

    As far as I understand it, the first line should contain a //!Codepage header that tells STRS how to decode the file into UTF-8. Strictly speaking, writing "//!Codepage iso-8859-1" should be perfectly fine as long as the file actually IS iso-8859-1. Beyond that, I believe STRS may balk at UTF-8 files with correct Codepage headers if they have a BOM, or if there is even a single line separator other than CRLF, although I have not tested this extensively.
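    The header-driven decoding described above could work roughly like this. It is only a sketch of the idea, not how STRS is implemented: the header pattern, the name decode_with_header, and the latin-1 fallback when there is no header are all assumptions (later in this thread the default turns out not to be UTF-8, but it is not known what it is).

```python
import re
from typing import Tuple

# Hypothetical header pattern, matching e.g. "//!CodePage UTF8!" or
# "//!Codepage iso-8859-1". The real STRS grammar may differ.
HEADER_RE = re.compile(rb"^//!codepage\s+([A-Za-z0-9_.-]+)!?", re.IGNORECASE)


def decode_with_header(data: bytes, default: str = "latin-1") -> Tuple[str, str]:
    """Decode data using the codepage named on its first line.

    `default` is a placeholder guess used when there is no header; it is an
    assumption, not documented STRS behaviour. Raises LookupError for an
    unknown codepage name and UnicodeDecodeError if the bytes do not match it.
    """
    first_line = data.split(b"\n", 1)[0]
    match = HEADER_RE.match(first_line)
    encoding = match.group(1).decode("ascii") if match else default
    # Python's codec registry already accepts names such as "UTF8" and "iso-8859-1".
    return encoding, data.decode(encoding)
```

    The point of the sketch is that the header only helps if it is honest: declaring iso-8859-1 on a file that is really UTF-8 decodes without error but produces the wrong characters.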

    When you say "save the file in different codepages" and "It will not convert your file", aren't those contradictory statements?

    I just did some tests using the unix "file" and "diff" utilities:
    Opening a UTF-8 file and changing the codepage to iso-8859-1 does precisely nothing, even when I add a line to trigger a proper save (the added line shows up in the diff, of course).

    Neither selecting "reorder text" nor the presence or absence of a Codepage header makes any difference.

    Changing to some other encodings, however, such as "Adobe Standard Encoding" or "Unified Hangeul KSX1001", changes the file right away.
    In other words, it's inconsistent.

    I wish I could say "it depends", but then I would have to know what it depends on.

    Regarding input, the codepage setting of the editor does make a difference, as I suspected.
    Saving a file after changing utf8->iso88591 makes no difference, but adding øæå and saving makes "file" detect it as ISO-8859.
    Adding the exact same letters while the editor is in UTF-8 mode keeps the file detected as UTF-8.
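    The difference comes down to which bytes the editor writes for the new letters. A quick check with the Python standard library shows that øæå take two bytes each in UTF-8 but one byte each in ISO-8859-1, and that the ISO-8859-1 bytes are not valid UTF-8, which is consistent with "file" falling back to ISO-8859. It also shows that plain ASCII text is byte-identical in both encodings, so relabelling an ASCII-only file cannot change a single bit of it.

```python
text = "øæå"

utf8_bytes = text.encode("utf-8")          # two bytes per letter
latin1_bytes = text.encode("iso-8859-1")   # one byte per letter

print(utf8_bytes.hex(" "))    # c3 b8 c3 a6 c3 a5
print(latin1_bytes.hex(" "))  # f8 e6 e5

# The ISO-8859-1 bytes cannot be read back as UTF-8.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)  # invalid start byte

# Pure ASCII is encoded identically by both codepages.
ascii_same = "abc".encode("utf-8") == "abc".encode("iso-8859-1")
print(ascii_same)  # True
```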

    Wednesday 09 April, 2014
  • Christofer Lindqvist (StreamServe Employee)

    Not really contradictory, I think :)

    "Save the file in different codepages" means that it will take whatever you have, set a new codepage, and save it. The saved file can be unreadable after that. It only saves the file you have in a new codepage.

    Converting would mean that whatever codepage you choose for your file, the content would be converted to that codepage and saved in the correct format for it. So UTF8Edit doesn't convert the file you have; it only lets you change the codepage of the file. The file might become unreadable after changing the codepage.
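    The distinction being drawn here can be illustrated in Python: "relabelling" keeps the bytes and just interprets them with another codepage, while "converting" (transcoding) decodes with the real codepage and re-encodes with the new one. This is a generic illustration of the two operations, not a description of what UTF8Edit does internally.

```python
original = "blå".encode("utf-8")  # b'bl\xc3\xa5'

# Relabelling: the bytes stay the same, only their interpretation changes.
relabelled = original.decode("iso-8859-1")
print(relabelled)  # blÃ¥  (mojibake; the file itself is untouched)

# Converting: decode with the true codepage, then encode with the new one.
converted = original.decode("utf-8").encode("iso-8859-1")
print(converted)  # b'bl\xe5'
```

    Only the second operation changes the bytes on disk in a way that keeps the text readable.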

     

    Wednesday 09 April, 2014
  • Gaute Amundsen

    You write as if "codepage" is an external attribute of a file, like permissions or modtime.
    I think you are misunderstanding something. http://en.wikipedia.org/wiki/Code_page

    As I wrote above, when you change the codepage in Utf8Edit, most often NOTHING is changed.
    Not a single bit.

    Wednesday 09 April, 2014
  • Gaute Amundsen

    About BOMs, EOLs and such.

    I just did some further tests, and here are the results, for the record.

    Editing and saving a working .sls file in UltraEdit seems to cause no problems,
    even when adding letters like æøå.

    If you change UE's config to write a "utf8 bom", things break.
    The log shows the same number of "loaded items", but the rendering is broken.

    I guess this is because STRS does not understand the BOM, reads it as garbage, and then misses the "//!CodePage UTF8!" declaration. The rendering breaks the same way as if the codepage declaration were simply removed.
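    If that guess is right, the effect is easy to reproduce: a UTF-8 BOM is the three bytes EF BB BF, so a reader that does not strip it sees them in front of "//!". The Python sketch below only illustrates this mechanism; it says nothing definite about how STRS actually parses the header. Python's "utf-8-sig" codec, for comparison, removes the BOM.

```python
import codecs

header = b"//!CodePage UTF8!\r\n"
with_bom = codecs.BOM_UTF8 + header  # what "write UTF-8 BOM" puts at the start

# A reader that does not strip the BOM sees three extra bytes before "//!".
print(with_bom[:3].hex(" "))              # ef bb bf
print(with_bom.startswith(b"//!"))        # False
print(with_bom.decode("iso-8859-1")[:6])  # ï»¿//!

# The "utf-8-sig" codec strips a leading BOM, so the header is found again.
print(with_bom.decode("utf-8-sig").startswith("//!"))  # True
```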

    This highlights, BTW, that even though the STRS internal format may be UTF-8, the default input format cannot be.

    Line endings.
    Surprisingly, STRS seems completely happy with CRLF, LF, or even a mix. It all works :)
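    For comparison, a reader that splits on any terminator, such as Python's str.splitlines(), treats CRLF, LF and a mix identically. Whether STRS does exactly this is unknown; the snippet only shows how little it takes to be this tolerant.

```python
# One file mixing CRLF and LF terminators.
mixed = b"//!CodePage UTF8!\r\nfirst\nsecond\r\nthird"

# str.splitlines() recognises \r\n, \n and \r (among others) as line breaks.
lines = mixed.decode("utf-8").splitlines()
print(lines)  # ['//!CodePage UTF8!', 'first', 'second', 'third']
```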

    Wednesday 09 April, 2014