Working with Unicode Mapping Files

Contents of Article


Introduction

How the Unicode Mapping File is defined

How the Unicode Mapping File works

A sample Unicode Mapping File application

Future plans for Unicode Mapping Files


Introduction


If you have used one of the "Raster" fonts supplied with SPFLite, you may have noticed several "Ansi" characters that were not really Ansi at all.  For instance, in location hex 8D you will see a less-than-or-equal sign (≤).  That is not Ansi, but it is the same as the Unicode character U+2264.


As you may know, SPFLite does not directly operate with Unicode data, but only with Ansi.  Internally, all data that SPFLite deals with is Ansi data.  So how is it that a less-than-or-equal sign can appear in this position?


The reason is that the supplied Raster fonts were custom-designed for SPFLite, and the extra characters you see occupy "unofficial" (unassigned) positions.  SPFLite does not know or care what a byte of hex 8D means.  It simply displays it, and allows the font definition to control what visually appears on your screen.


All that is fine, if the only thing you do is display your data.  But what if you want to print it?  Previously, SPFLite would simply send the Ansi data to your printer.  When Windows receives this data, it internally converts the data as needed to Unicode.  For example, in the Microsoft 1252 code page used for Ansi, the Euro symbol appears at location hex 80, while its Unicode value is U+20AC.  If you have a Euro symbol in the data you print, the hex 80 value will eventually get converted to hex 20AC so that your printer will recognize it, and the right character is printed.  Windows can do this because it "knows" what Ansi characters mean in terms of their Unicode definitions, which are standardized and well-known.


The case where a character of hex 8D is used is more complicated, because Windows has no mapping for hex 8D.  There isn't one because we made it up.  It is a "private definition" that exists only within SPFLite.  We added it because people often want to use characters like ≤, ≠ and ≥ in their data, but plain Ansi doesn't provide them.  In the past, SPFLite would have transmitted a hex 8D to your printer - but the printer would not know what to do with it.  It might print it as a blank, or as nothing at all (causing the line it was on to appear shifted over), or as some 'garbage' character instead.
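The difference between the two cases can be seen outside SPFLite with a short Python sketch.  It uses Python's built-in cp1252 codec as a stand-in for the Windows conversion.  That is an assumption made for illustration only: Windows' own conversion routines may treat unassigned bytes differently, and this is not how SPFLite itself is implemented.  The function name ansi_to_unicode is hypothetical.

```python
def ansi_to_unicode(byte_value):
    """Return the Unicode character that code page 1252 assigns to a byte,
    or None when the code page leaves that position unassigned."""
    try:
        return bytes([byte_value]).decode("cp1252")
    except UnicodeDecodeError:
        return None


euro = ansi_to_unicode(0x80)
print(f"hex 80 -> U+{ord(euro):04X}")       # the Euro symbol, U+20AC
print("hex 8D ->", ansi_to_unicode(0x8D))   # None: no standard mapping
```

The Euro symbol converts cleanly because its meaning is standardized, while hex 8D has no standard meaning to fall back on.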


The problem with printing hex 8D as a less-than-or-equal sign is not that this character is "unknown" or "illegal".  It's just that it is not known as 8D.  Somehow, we need to be able to convey to the printer that when an 8D is encountered, we really want to print U+2264.


By providing a Unicode Mapping File, these unusual characters can be printed just like any regular data.


How the Unicode Mapping File is defined


A Unicode Mapping File is an ordinary text file that is located in the same SPFLite directory where your SPFLite.INI file exists.  You will have to prepare and save this file into that directory yourself.  A sample file appears below.


The mapping file must have one of the following names:


SPFLite.Unicode

SPFLite.Print.Unicode


For now, you can just use the short name, SPFLite.Unicode.  See Future plans for Unicode Mapping Files for more information.


If neither of these files exists in the SPFLite directory, SPFLite will handle printing the same as previously, with no character mapping performed.  It is not an error if one of these files does not exist, and no message will appear.


A Unicode Mapping File contains a [Unicode] header line followed by mapping lines, as in the sample file later in this article.



A mapping line has the following format:


Xnn=Unnnn


The left side with Xnn defines the Ansi character you wish to map.  Let's say we want to map the less-than-or-equal character.  So, the Xnn would appear as X8D.  The right side with Unnnn defines what Unicode character you want to print.  The Unicode value for this character is U2264.


The + plus sign, used to describe Unicode in formal publications, is not used here.  Just say U2264, not U+2264.


You must specify exactly two hex digits for the Xnn side, and exactly four hex digits for the Unnnn side.


On a mapping line, blank characters are ignored.  If you wish, you can place an optional comment on a mapping line, by following the Unnnn part with a ; semicolon and your comment.  Here is an example of the less-than-or-equal mapping line, with a comment:


X8D=U2264; less than or equal
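The format rules above can be sketched as a small Python parser.  This is only an illustration of the rules as described, not SPFLite's actual code.  The function name parse_mapping_line is hypothetical, and accepting lower-case hex letters is an assumption, since the article does not say whether case matters.

```python
import re

# Assumed grammar: X + two hex digits, '=', U + four hex digits.
# Accepting lower-case letters is an assumption, not stated in the article.
_MAPPING = re.compile(r"X([0-9A-Fa-f]{2})=U([0-9A-Fa-f]{4})")


def parse_mapping_line(line):
    """Return (ansi_byte, unicode_code_point) for one mapping line.

    Raises ValueError if the line does not match the Xnn=Unnnn format.
    """
    body = line.split(";", 1)[0]        # drop the optional comment
    body = "".join(body.split())        # blanks are ignored
    match = _MAPPING.fullmatch(body)
    if match is None:
        raise ValueError(f"bad mapping line: {line!r}")
    return int(match.group(1), 16), int(match.group(2), 16)


print(parse_mapping_line("X8D=U2264; less than or equal"))  # (141, 8804)
```

A line written with the formal plus sign, such as X8D=U+2264, is rejected by this sketch, as is one with only three digits on the Unicode side.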


If the same Xnn value appears more than once in the same Unicode Mapping File, the final one will override any prior entries.  You can use this fact to store alternative mappings in your file.  You would place the mappings you really want last in your file, and prior ones would essentially be treated as comments.  That way you can keep the unused, alternative values in your file without deleting them.
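The "last entry wins" rule behaves like repeated assignment into a lookup table.  The Python sketch below illustrates the idea.  The function name build_table is hypothetical, and the two values for hex AF are taken from the comment in the sample file later in this article.

```python
def build_table(pairs):
    """Collect (ansi_byte, code_point) pairs; later pairs replace earlier ones."""
    table = {}
    for ansi_byte, code_point in pairs:
        table[ansi_byte] = code_point
    return table


# Two entries for hex AF: the alternative first, the preferred one last.
pairs = [(0xAF, 0x00AF), (0x8D, 0x2264), (0xAF, 0x207B)]
table = build_table(pairs)
print(f"XAF prints as U{table[0xAF]:04X}")  # XAF prints as U207B
```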


If any format errors are found in your Unicode Mapping File, an error message is displayed, and SPFLite will not use your mapping file, but will print your data file the same as previously, without any character mapping performed.
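The all-or-nothing behavior can be sketched in Python as follows.  This is a guess at the general shape, not SPFLite's implementation.  The function name load_table and the message text are hypothetical, and skipping blank lines and the [Unicode] header is an assumption based on the sample file later in this article.

```python
import re

_MAPPING = re.compile(r"X([0-9A-Fa-f]{2})=U([0-9A-Fa-f]{4})")


def load_table(text):
    """Return {ansi_byte: code_point}, or None if any line is malformed."""
    table = {}
    for line in text.splitlines():
        # Drop the optional comment, then remove all blanks.
        body = "".join(line.split(";", 1)[0].split())
        if not body or body.lower() == "[unicode]":
            continue        # assumed: empty lines and the header are skipped
        match = _MAPPING.fullmatch(body)
        if match is None:
            print(f"Unicode Mapping File error: {line!r}")
            return None     # reject the whole file; printing stays unmapped
        table[int(match.group(1), 16)] = int(match.group(2), 16)
    return table
```

A caller that receives None would simply print as before, with no character mapping, which matches the fallback described above.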


How the Unicode Mapping File works


When you request SPFLite to send data to the printer, it looks for a Unicode mapping file in the SPFLite directory, where the SPFLite.INI file is located.  It will first look for a file called SPFLite.Print.Unicode, and will use it if one is there; otherwise it will look for a file called SPFLite.Unicode.
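The search order can be expressed as a short Python sketch.  The two file names come from this article.  The function name find_mapping_file and its spflite_dir parameter are hypothetical, standing in for the directory that holds SPFLite.INI.

```python
from pathlib import Path

# File names are taken from the article; the more specific name is tried first.
CANDIDATES = ("SPFLite.Print.Unicode", "SPFLite.Unicode")


def find_mapping_file(spflite_dir):
    """Return the Path of the mapping file to use, or None if there is none."""
    for name in CANDIDATES:
        candidate = Path(spflite_dir) / name
        if candidate.is_file():
            return candidate
    return None
```

Returning None corresponds to the case where neither file exists and printing proceeds with no character mapping.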


When a Unicode Mapping File is found, the entries are validated and stored into memory.  From that point on, any print requests will be transmitted to your printer directly in Unicode, rather than allowing Windows to convert your data to Unicode itself.


The trick here is that by defining your own Unicode Mapping File, you decide which specific Unicode value is sent for any given Ansi character.


It is not necessary to map every Ansi character to Unicode.  Only map the characters that you really need to change from the usual definitions.  For instance, the digit 5 is hex 35 in Ansi and U+0035 in Unicode.  There is no reason to map such a character - unless you're doing something very unusual.  So, even though a Unicode Mapping File could theoretically have 256 unique mappings, you are not likely to ever actually specify that many.
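The idea of overriding only the characters you list, and leaving everything else to the usual conversion, can be sketched in Python.  The function name to_print_text is hypothetical.  Using the cp1252 codec for unmapped bytes is an assumption that stands in for the normal Windows conversion, and it does not describe SPFLite's internals.

```python
def to_print_text(data, table):
    """Convert Ansi bytes to a Unicode string, letting the table override
    the normal conversion for any byte it lists."""
    chars = []
    for byte_value in data:
        if byte_value in table:
            chars.append(chr(table[byte_value]))
        else:
            # Unmapped bytes fall back to code page 1252; positions the
            # code page leaves unassigned become U+FFFD in this sketch.
            chars.append(bytes([byte_value]).decode("cp1252", errors="replace"))
    return "".join(chars)


table = {0x8D: 0x2264}                          # X8D=U2264
print(ascii(to_print_text(b"5 \x8d 7", table)))  # '5 \u2264 7'
```

The digit 5 and the blanks pass through unchanged with no entry in the table, while the single mapped byte becomes U+2264.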


While these unusual characters will likely only appear on your screen in the fonts provided with SPFLite (the Raster screen fonts), you can use any TrueType font to print with, provided it has the necessary characters.  Because real TrueType data is being sent to your printer, you are not limited to a printer font like RasterTTF or our new RasterTN TrueType font.  For instance, a Microsoft fixed font called Consolas may be used to print these characters.  You can also use Courier New, although you will need to confirm that this font has all the characters you might need.  There may be many other font choices available to you.


You can use the Windows utility charmap to see what characters exist in a given font, or you could use a program like Microsoft Word, and select Insert > Symbol to see what is available.  These tools will also show you the Unicode hex value you will need for your Unicode Mapping File.
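If Python is available, its standard unicodedata module offers another way to double-check that a hex value names the character you intend before you put it in a mapping file.  This is a small sketch, with a hypothetical function name describe.  It reports only the official Unicode name and says nothing about whether a particular font contains the character, so charmap is still needed for that.

```python
import unicodedata


def describe(code_point_hex):
    """Return the official Unicode name for a four-digit hex value such as '2264'."""
    return unicodedata.name(chr(int(code_point_hex, 16)), "<unnamed>")


for value in ("2264", "2260", "2265"):
    print(f"U{value}  {describe(value)}")
```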


A sample Unicode Mapping File application


Here is a display of our revised Raster15 TN screen font.  It is similar to the prior Raster15 font, but the extra characters you see are designed to make all of the symbols on an IBM 1403 printer with a TN print chain available.  The particular placement of characters is dictated by the available free positions, and some values, like CR, LF and ESC, don't work well as printable data.


You may also notice that the added characters don't include superscript 1, 2 and 3, because they already existed in Ansi.  You now have the full set of superscripts from 0 to 9, just not in adjacent locations.


These new screen fonts, and the new RasterTN True Type font, will be made available on the SPFLite web site about the same time that the Unicode Mapping File support is released.  Check the web site for more information.




You will notice that the "unusual" characters are in the Ansi range of hex 00 to hex 1F, and a few in the range of hex 80 to 9F.


Here is a sample SPFLite.Unicode mapping file that will allow you to print this data.


[Unicode]


X00=U0020;  null as space

X01=U2502;  box vertical

X02=U2534;  box T up

X03=U252C;  box T down

X04=U2191;  arrow up

X05=U2193;  arrow down

X06=U2192;  arrow right

X07=U2190;  arrow left, note that ASCII 07=BEL

X08=U0020;  control BS as space

X09=U0020;  control HT as space

X0A=U0020;  control LF as space

X0B=U0020;  control VT as space

X0C=U0020;  control FF as space

X0D=U0020;  control CR as space

X0E=U251C;  box T right

X0F=U2524;  box T left

X10=U2070;  superscript 0

X11=U207A;  superscript +

X12=U207D;  superscript (

X13=U207E;  superscript )

X14=U2074;  superscript 4

X15=U2075;  superscript 5

X16=U2076;  superscript 6

X17=U2077;  superscript 7

X18=U2078;  superscript 8

X19=U2079;  superscript 9

X1A=U0020;  control EOF as space

X1B=U0020;  control ESC as space

X1C=U2514;  box LL

X1D=U2518;  box LR

X1E=U250C;  box UL

X1F=U2510;  box UR

X7F=U0020;  control DEL as space

X97=U2500;  box horizontal

X81=U253C;  box intersection

X90=U25AA;  black square

X8D=U2264;  LE

X8F=U2260;  NE

X9D=U2265;  GE

XA0=U0020;  non-breaking space

XAF=U207B;  superscript - 00AF may be OK too


Notice that if you had data with embedded CR and LF characters (say, if your file was RECFM F and EOL NONE), that might be valid as screen data while editing, but your printer may object.  In addition, our Raster screen fonts have special CR and LF symbols as single characters, but most printer fonts do not have these defined.  To keep such data from causing problems for your printer, you should map these control codes to space (U0020) so that they won't interfere with the printer's normal operation.


If you have a regular text file, rather than one with EOL NONE, SPFLite takes care of the line termination for you.  Don't worry - even though you might have CR and LF mapped to U0020 as shown above, this won't change how a regular text file is printed.  You'd only get CR and LF changed to blanks if they were data values in your file (in an EOL NONE file), but not when they are part of ordinary text file lines.


Future plans for Unicode Mapping Files


In theory, based on the way we have designed the naming conventions for Unicode Mapping Files, it is conceivable that some day we could add support for mapping of screen fonts, in addition to printer fonts.  If this were done, you could have one common mapping file for both, named SPFLite.Unicode, or if the mappings were different, you could have an SPFLite.Print.Unicode and an SPFLite.Display.Unicode.  That would make it possible to use any fixed font as a display font (rather than just our special Raster screen fonts), and set aside certain special characters as alternative Ansi codes.


In theory, we could also tailor the Unicode mapping of files differently, depending on their file type, by adding some kind of enhanced PROFILE support.  If that were done, the default prefix of SPFLite on a .Unicode file would be changed to something else.


Such changes would be more involved than the basic Unicode mapping support we have presently added for printers.  We are not making any promises as to if or when support for these features might be added.  Any enhancements that are made will be announced on the SPFLite forum if and when they become available.

