Overview of supported mapping items

Mapping strings support the following types of mapping items:


Item Index


Simple column number

Zero column number

Remaining column number  –  +

Forward column range

Backward column range

End-relative column references – *

References to result string

Implied Column References

Inserted text

Inserted hex text  –  'nn'X

Inserting literal quotes  –  SQ / DQ / AQ

Inserted text ranges  –  [x–y]

String sets and string pairs  –  'A=B' /  'A'='B'  /  'A':'B'

Deleting text  –  D / DC / RA / RL / RR / M

Repeating text – RP

Case modification codes  –   <   >   <>   ><

Case alteration commands  –  UC / LC / SC / TC / IC

Adjusting the default CASE processing  –  CC / CT

Alignment items – L / R / C

Trimming items – T / TL / TR

Padding items – P / PL / PR

Zero-suppression items – Z

Character-replacement items – RC

Character-conversion items – AX / XA / EX / XE

Numeric-conversion items – DD / DX / XD / XX

Sequence  items – SD / SX

Exchange items – X

Character-move items – M



Note that for modifying commands like case alteration, alignment, trimming, padding and zero-suppression items, these operations normally work on the complete contents of the result string.  If you wish to perform one of these actions on a specific part of the result string, that must first be done using a column-reference item on that command.


For instance, to copy the first 4 columns of the source string and then change that to upper case, you would use M"1-4 UC".  But if you wanted to copy the entire source string to the result string (however long it may be),  and then make only the first 4 columns of the result string upper case, you would use M"1+ UC1-4".


To illustrate, if your source string contained "abcdwxyz", a mapping string of M"1-4 UC" would give a result string of "ABCD", while a mapping string of M"1+ UC1-4" would give a result string of "ABCDwxyz".  In the first case, you copied only four columns (1-4) and upper-cased all of them, while in the second case, you copied all of the columns (1-8 in this case) but only upper-cased four of them.


You can specify a remaining column-number item of 1+ to select the entire source string.  Using 1+ allows the full source string to be selected without needing to know its size ahead of time.  For example, to perform a zero-suppression on a numeric string that had comma digit-separators, leaving the last 3 positions, you might use a mapping string of M"1+ Z','3".


However, most commands operate in either auto-reference or in auto-copy mode, so that you often will not need to explicitly reference or copy the source string.  If the result string is empty (as it always is at the beginning of your commands in the mapping string), the Z command above can be shortened to M"Z','3".  When you do this, the Z command detects that the result string is initially empty, and first copies the source string into the result string before proceeding.



Simple column number.  This is a single decimal number.  Leading zeros are allowed and ignored.  This selects one character position from the source string.



Zero column number.  A column number value of zero has a special meaning.  When specified as 0, it refers to all character positions that were not otherwise specified in the mapping string, concatenated together left-to-right as a single value.  When specified as \0, it refers to all character positions that were not otherwise specified in the mapping string, concatenated together right-to-left in reverse order as a single value.  A \ backslash prefix is not allowed on simple column numbers other than \0.  

For example, if your data  was a date formatted as 12/31/2015, and you extracted all of the digits using a mapping string of M"1-2 4-5 7-10", that would produce a result string value of "12312015", but columns 3 and 6 (where the slashes are) would not have been referenced.  If you needed to “gather” all the data you had not referenced up to that point, you would use a column of 0 to get all of them.  The mapping string to do that would be M"1-2 4-5 7-10 0" and the result string value would be "12312015//".  The 0 notation is thus a shortcut so you don't have to write all the places you didn't use.  It basically means, “everything else”.


In order to determined “all character positions that were not otherwise specified in the mapping string”, all column references in the mapping string to the left of where the 0 or the \0 appears are used, as of that point.  If you follow the 0 or the \0 with additional column references, the 0 or the \0  will not have a complete list of columns used (because the processing does not look to the right of 0 or \0 to keep track of referenced columns).  This means that, in most cases, if you do this you will not get very useful results.  For all but very specialized applications, you will want to make the 0 or the \0 appear as the last column reference in your mapping string, if you use it at all.



Remaining column number   +.  This is a single decimal number followed by a + plus sign.  Since 0 is not a valid column number, an item of 0+ is illegal.  A remaining column number refers to that column number and all columns to the right of it, up to and including the last position of the data string value.  







Forward column range.  This is two decimal numbers separated by a dash character, such as 1-4.  There can be no spaces or delimiters between these two numbers other than the dash.  The item is in the format lowhigh, and character values are taken from the particular found string in forward order from left to right.  Neither column numbers can be 0.



Backward column range.  This is two decimal numbers separated by a dash character, such as 4-1.  There can be no spaces or delimiters between these two numbers other than the dash.  The item is in the format highlow, and character values are taken from the particular found string in reverse order from right to left.  Note.  Since a backward column range is already in reversed order, it is unnecessary and invalid to combine a \ backslash prefix with a backward column range item.  That is, use 20-10 rather than \10-20.  Neither column numbers can be 0.



End-relative column references   *.  Regular column references assume a character enumeration of 1, 2, 3 … going from left to right.  End-relative column references enumerate characters going from right to left as … 3, 2, 1, by putting an * asterisk after a column number or column range.  See the full discussion of end-relative column references, and the related topic of mixed-mode end-relative column references for more information.



References to the result string.  When a mapping command requires a column-reference operand, you may use . dot notation.  When this is done, instead of referencing any character locations in the source data string, the current, full contents of the result string are referenced instead.  If the . dot symbol precedes a column reference, like .1-4, that column reference is used to select a substring from the current contents of the result string.  Note:  If you use dot notation before any result string characters have been defined, in most cases the auto-copy and auto-reference features will populate the result string with a copy of the source string.



Implied column references.  When you use a copying command like AX, it requires a reference to the source string, or to the result string via a dot notation.  When you use a modifying command like UC it will modify the entire result string, unless you provide a column range for portion of the result string that you wish to upper case.  For such commands, if you omit a column reference, the following assumptions will be made:




The Quick Reference shows you for each command whether it is a copying command or a modifying command.



Inserted text.  An inserted text item consists of quoted text, enclosed by any valid SPFLite quote, such as "abcd".  Inserted text strings are appended to the end of the result string.  







Inserted hex text 'nn'X.  This serves the same purpose as regular inserted text, but you specify the value as two or more hex digits 0-9, A-F and a-f.  Any underscores inside of inserted hex text are ignored; the remaining characters must be an even number of valid hex digits.  In a mapping string, an inserted hex text literal must be identified only with a trailing X or x code, like '77'X, not like X'77'.  If you needed to insert a literal quote as data which was the same as the outer quotes used on the M string, you could use a hex text value.  For instance, if your M string were enclosed in ( " ) double quotes, you could insert a double-quote literal using '22'X, the ASCII value of a double-quote.

 

Inserted hex text can have a replication factor as a suffix on the string.  A literal of '41'X3 will produce the same value as '414141'X, which is 'AAA'.  This feature only works for “stand-alone” text to be inserted into the result string; you cannot use a replication factor on a string that is an operand of a command code.



Inserting literal quotes SQ / DQ / AQ.  Because it can be awkward to insert the quote characters that are used to enclose the outer M string, there are three command codes that will do this, which are SQ, DQ and AQ.  These will insert a Single Quote, Double Quote or Accent Quote, respectively.  You can place a replication factor on these commands; AQ2 will produce `` two accent quotes.  

Note:  Literal-quote command codes are neither “copying commands” nor “modifying commands” in the usual sense.  They simply append one or more quotes at the end of the result string.  This means that the auto copy and auto reference features do not apply.  If you wanted to enclose the source string in double quotes, you could do this with a mapping string of  M"DQ 1+ DQ".



Inserted text ranges  –  [x–y].  This is specialized form of inserted text in which one or more pairs of Ansi characters are generated, and appended to the end of the result string.  An inserted text range has the form  [x–y] where x and y are any single Ansi characters.  Example:  [0-9] would produce the same as '0123456789'.   If the range is in reverse collating order, the characters produced are reversed; [9-0] would produce the same as '9876543210'.  You can specify multiple ranges by a comma-separated list within the brackets.  Example:  [A-Z,a-z,0-9].  As long as the correct format is observed, with commas, dashes and brackets in their proper places, the characters of x and y can be anything, including commas, dashes and brackets themselves.  (This will look strange, but it will work.)  If you need to insert a single character within a larger range, create a range item that uses the same character twice.  For example, to create a list of digits in ascending order, then a slash, then the digits in descending order, specify this as [0-9,/-/,9-0]; that will produce '0123456789/9876543210'.  Inserted text ranges do not support replication factors.  (If you needed that, the Repeat command code RP might be useful.)



String sets and string pairs  'A=B' /  'A'='B'  /  'A':'B'.  Certain commands such as RC require a string operand with two values – a first value and a second value, separated by an = equal sign.  When this is specified as a single string, it is called a string set.  For example, the command RC'ABC=def' contains a string set.  When one or both of the values themselves contain = equal sign characters as data, you must specify these as two quoted values using a string pair.  Suppose you wanted the RC values to be A=B and X*Y.  To do that, you specify it with a string pair of  RC'A=B'='X*Y'.  






Deleting text  D / DC / RA / RL / RR / M.  You will usually not need a special command to delete text.  To accomplish a deletion of characters from the source string, you can simply omit any reference to those columns of source data from the mapping string.  However, if do you have some need to delete characters from the result string, this can be done with a Delete mapping item code of D followed by a required column-range.  When you use the D command code, the column-range always refers to character positions within the result string.  You can delete “classes” of characters, such as “all commas, periods and parentheses” by using the Delete Characters command DC.  You can replace or delete a specified string wherever it is located, with the String Replacement commands RA, RL and RR.  You can delete text in the process of moving it from one part of the result string to another, using the Move command M.



Repeating text  RP.  The command code RP will repeat the contents of the result string any number of times.  RP by itself will duplicate the result string.  Example: A mapping string of M"`Abc` RP" will cause the result string to contain AbcAbc.  RP will take a duplication factor.  Example: A mapping string of M"`Abc` RP2" will cause the result string to contain AbcAbcAbc.  The duplication factor indicates how many additional copies of the result string will be made.  If RP appears by itself as the first thing in the mapping string, the result string will get two copies of the source string (due to the auto-copy feature being applied); first, the source string would get automatically copied to the result string, and then RP will repeat the  result string so there are now two copies of it.  Note that the numeric operand for RP is a duplication factor, not a column reference.



Case modification codes   <   >   <>   ><.  Normally, the character casing of the result string, after mapping is completed, is the same as the source string.  If desired, the casing can be altered by a case modification code.  These codes are detected as the mapping string is scanned from left to right.  The codes define a case modification state.  Once a case modification state is defined, that state remains in effect to the end of the mapping string or until altered by another case modification code.  Case modification codes do not apply to inserted text literals; these literal values are always inserted into the data as-is exactly as specified.  A case modification code must be followed by a blank.  Because a case modification code affects commands that follow it, placing a case modification code at the end of a mapping string is legal but will have no effect.  Mapping strings support the following case modification codes:



Consult the Quick Reference to determine how the various command code process an active case modification state.  


Note that a modifying command will auto-copy the source string to the result string if, at the time the command is encountered, there is as yet nothing in the result string.  If there is an active case modification state (other than the “as-is” state that doesn’t change anything), and such an auto-copy occurs, that copied data will have its case modified as you requested.


For example, if your source string contains “°°abc°°” (the “°” symbol represents one blank) and you issue a mapping command of M"> L", at the point the L command is reached, there is as yet nothing in the result string.  Because the L command is a modifying command that will auto-copy, the source string is first copied to the result string before L acts on it.  During that copying process, it is noted that the case modification state of > is in effect, so that when the data is copied, it gets changed to upper case before L acts on it.  Thus, L will see the result string as containing “°°ABC°°, which it then left-justifies as ABC°°°°”.



Case alteration commands  UC / LC / SC / TC / IC.  Five case alteration commands are defined:  UC, LC, SC, TC and IC.  These will alter the current contents of the result string.  Just like their Primary Command and Line Command counterparts, the first four mapping commands alter the character case of the result string, comparable to how those primary and line commands do it (Upper Case, Lower Case, Sentence Case and Title Case).  The code IC is used to Invert Case, the way the case conversion code  >< operates.

Why have these commands, when the case modification codes described above are available?  Case modification codes do allow for precise control over the casing of data selected by specific column ranges.  However, you may need a case alteration command for data extracted from locations you do not know of in advance.  In addition, the command codes SC , TC and IC perform operations that would difficult to accomplish with case modification codes.  SC capitalizes the first “word” of each “sentence”, while TC capitalizes the first “letter” of each “word”.  


These operations depend on the following definitions:  A “letter” is a character whose upper case and lower case values are different.  A “word” is a span of “letters” that begins with the beginning of the string or a blank, and ends with the end of the string or a non-letter character.  A “sentence” is one or more “words”, where the first “word” of a “sentence” is the first “word” of the string, or any “word” preceded by both a period and a space, in that order.  These rules implement an algorithm that works well for properly formatted English text, but if your data is out of the ordinary, TC and SC may or may not suit you.  You should test these codes on some sample data to make sure they meet your needs.



Adjusting the default CASE processing  CC / CT .  Certain mapping items that involve comparisons to character data in the source string are sensitive to the PROFILE CASE setting.  These mapping items are the character replacement command RC, the string replacement commands RA, RL and RR, and the exchange-string command X.  By default, whenever a character comparison takes place, the current PROFILE CASE setting determines how it is done.  If PROFILE CASE C is in effect, comparisons are done as case-sensitive, and if PROFILE CASE T is in effect, comparisons are done as case-insensitive.  There are two ways to override this.




Alignment items   L / R / C.  An alignment item consists of the letter L, R or C followed by an alignment specification.  An alignment item is used to left-justify, right-justify, or center the result string.



Trimming items  T / TL / TR.  A trimming item consists of the codes T, TL or TR followed by an optional trimming specification.  A trimming item is used trim off one or more blanks (or other specified character) from one or both side of the result string.



Padding items  P / PL / PR.  A padding item consists of the codes P, PL or PR followed by a padding specification.  A padding item is used append one or more blanks (or other specified character) on to one or both side of the result string.



Zero-suppression items   Z.  A zero-suppression item consists of the letter Z followed by an optional zero-suppression specification.  A zero-suppression item is used to suppress leading zeros from a numeric value in the result string, and replace them with spaces.



Character-replacement items  RC.  A character-replacement item consists of the code RC followed by a character-replacement specification.  A character-replacement item is used to replace one set of characters, using a character translation table.



Character-conversion items  AX / XA / EX / XE.  A character-conversion item consists of the codes AX, XA, EX or XE, followed by a column-range specification.  A character-conversion item is used to convert between character display-data and their hex-digit equivalents.  For example, the hex equivalent of Ansi '1' is '31'.



Numeric-conversion items DD / DX / XD / XX.  A numeric-conversion item consists of the codes DD, DX, XD or XX, followed by numeric formatting information.  A numeric-conversion item can alter the format of a numeric value by adding or removing leading zeros and by changing radix between decimal and hex.  It is possible to calculate derived values from the initial numeric value, such as constant multiples or offsets from the original number, using a calculation operand.



Sequence  items  SD / SX.  A sequence item consists of the codes SD or SX, followed by a sequence specification.  A sequence item is used to generate a sequence of numbers in decimal or hex in various formats.  When a CHANGE ALL command uses an M string, the first string that is found is assigned an enumeration of 1; the second has an enumeration of 2, etc.  It is possible to calculate derived values from the enumeration, such as constant multiples or offsets from the original sequence number, using a calculation operand.



Exchange items   X.  An exchange item consists of the code X followed by an exchange specification.  An exchange item is used swap two different strings at the same time, such as replacing all occurrences of 'one' with 'two' and vice-versa.



Character-move items   M.  A character-move item consists of the code M followed by a character-move specification.  A character-move item is used to move substrings from one point to another within the result string.




Created with the Personal Edition of HelpNDoc: Free CHM Help documentation generator