Overview of supported mapping items
Mapping strings support the following types of mapping items:
Note that for modifying commands like case alteration, alignment, trimming, padding and zero-suppression items, these operations normally work on the complete contents of the result string. If you wish to perform one of these actions on a specific part of the result string, that must first be done using a column-reference item on that command.
For instance, to copy the first 4 columns of the source string and then change that to upper case, you would use M"1-4 UC". But if you wanted to copy the entire source string to the result string (however long it may be), and then make only the first 4 columns of the result string upper case, you would use M"1+ UC1-4".
To illustrate, if your source string contained "abcdwxyz", a mapping string of M"1-4 UC" would give a result string of "ABCD", while a mapping string of M"1+ UC1-4" would give a result string of "ABCDwxyz". In the first case, you copied only four columns (1-4) and upper-cased all of them, while in the second case, you copied all of the columns (1-8 in this case) but only upper-cased four of them.
You can specify a remaining column-number item of 1+ to select the entire source string. Using 1+ allows the full source string to be selected without needing to know its size ahead of time. For example, to perform a zero-suppression on a numeric string that had comma digit-separators, leaving the last 3 positions, you might use a mapping string of M"1+ Z','3".
However, most commands operate in either auto-reference or in auto-copy mode, so that you often will not need to explicitly reference or copy the source string. If the result string is empty (as it always is at the beginning of your commands in the mapping string), the Z command above can be shortened to M"Z','3". When you do this, the Z command detects that the result string is initially empty, and first copies the source string into the result string before proceeding.
Zero column number. A column number value of zero has a special meaning. When specified as 0, it refers to all character positions that were not otherwise specified in the mapping string, concatenated together left-to-right as a single value. When specified as \0, it refers to all character positions that were not otherwise specified in the mapping string, concatenated together right-to-left in reverse order as a single value. A \ backslash prefix is not allowed on simple column numbers other than \0.
For example, if your data was a date formatted as 12/31/2015, and you extracted all of the digits using a mapping string of M"1-2 4-5 7-10", that would produce a result string value of "12312015", but columns 3 and 6 (where the slashes are) would not have been referenced. If you needed to “gather” all the data you had not referenced up to that point, you would use a column of 0 to get all of them. The mapping string to do that would be M"1-2 4-5 7-10 0" and the result string value would be "12312015//". The 0 notation is thus a shortcut so you don't have to write all the places you didn't use. It basically means, “everything else”.
In order to determined “all character positions that were not otherwise specified in the mapping string”, all column references in the mapping string to the left of where the 0 or the \0 appears are used, as of that point. If you follow the 0 or the \0 with additional column references, the 0 or the \0 will not have a complete list of columns used (because the processing does not look to the right of 0 or \0 to keep track of referenced columns). This means that, in most cases, if you do this you will not get very useful results. For all but very specialized applications, you will want to make the 0 or the \0 appear as the last column reference in your mapping string, if you use it at all.
Remaining column number +. This is a single decimal number followed by a + plus sign. Since 0 is not a valid column number, an item of 0+ is illegal. A remaining column number refers to that column number and all columns to the right of it, up to and including the last position of the data string value.
- For example, a remaining column number of 11+ means column 11 and all columns to the right, up to the final character position of the source string; if the data value were of length 15, 11+ is the same as 11-15. When a remaining column number has a \ backslash prefix, the same columns are referenced, but in reverse order; if the data value were of length 15, \11+ is the same as 15-11.
- The + plus sign must be followed by a blank, or must be the last item in the mapping string. (You cannot follow the + plus sign with another item unless separated with a space, to make sure you have correctly specified your intentions.)
- To select the entire source string, a remaining column number of 1+ should be used. Because of the auto-reference and auto-copy features, it will often not be necessary to explicitly copy the source string to the result string by using 1+ notation.
- When the starting column of the remaining column number notation is 1+, you can omit the 1 if you wish and just specify this as + alone. The examples all show 1+ to make clear what their purpose is.
Forward column range. This is two decimal numbers separated by a – dash character, such as 1-4. There can be no spaces or delimiters between these two numbers other than the dash. The item is in the format low–high, and character values are taken from the particular found string in forward order from left to right. Neither column numbers can be 0.
Backward column range. This is two decimal numbers separated by a – dash character, such as 4-1. There can be no spaces or delimiters between these two numbers other than the dash. The item is in the format high–low, and character values are taken from the particular found string in reverse order from right to left Note. Since a backward column range is already in reversed order, it is unnecessary and invalid to combine a \ backslash prefix with a backward column range item. That is, use 20-10 rather than \10-20. Neither column numbers can be 0.
End-relative column references *. Regular column references assume a character enumeration of 1, 2, 3 … going from left to right. End-relative column references enumerate characters going from right to left as … 3, 2, 1, by putting an * asterisk after a column number or column range. See the full discussion of end-relative column references, and the related topic of mixed-mode end-relative column references for more information.
References to the result string. When a mapping command requires a column-reference operand, you may use . dot notation. When this is done, instead of referencing any character locations in the source data string, the current, full contents of the result string are referenced instead. If the . dot symbol precedes a column reference, like .1-4, that column reference is used to select a substring from the current contents of the result string. Note: If you use dot notation before any result string characters have been defined, in most cases the auto-copy and auto-reference features will populate the result string with a copy of the source string.
Implied column references. When you use a copying command like AX, it requires a reference to the source string, or to the result string via a dot notation. When you use a modifying command like UC it will modify the entire result string, unless you provide a column range for portion of the result string that you wish to upper case. For such commands, if you omit a column reference, the following assumptions will be made:
- For a copying command like AX, it will assume you want to reference and copy the entire source string. So, AX by itself means the same as AX1+. This feature is called auto-reference.
- For a modifying command like UC, it requires some value to be present in the result string, which it then operates on. If a modifying command is issued before anything has been copied into the result string, it is assumed that you wanted to apply that command to the complete value of the source string (since you cannot modify the source string directly). Therefore, when the result string is empty, the modifying command will cause the value of the source string to be automatically copied into the result string, since that is most like what you would need to do, and because the command cannot operate without some data in the result string. (Modifying commands have to have something to modify.) This feature is called auto-copy. If you want some different value in the result string, you will have to explicitly copy it before issuing any modifying commands. The auto-copy feature will not be done if any data has been placed into the result string.
The Quick Reference shows you for each command whether it is a copying command or a modifying command.
- Due to standard SPFLite syntax rules, you must use a quote type other than the one being used to enclose the M string as a whole. For example, if you used double quotes ( " ) to enclose the M string, inserted text items must be enclosed in single quotes ( ' ) or accent quotes ( ` ) but they cannot use double quotes ( " ). To define an inserted text quote-character that is the same as its enclosing quotes, that quote must be doubled, or else you must use the ‘other’ quote type not otherwise being used (in a different string). Quoted string operands in mapping commands do not use an “escape” mechanism like backslashes. See Inserting literal quotes below for more information.
- When a literal like `` appears in a mapping string with no other text before or after it, it represents a zero-length string. Depending on the command where it is used, it may be treated the same as a single blank, it may be ignored, or it may be an invalid string operand.
- As you would expect, it is an error to begin an inserted text string with a quote character without finishing it by the time the end of the M string is reached. Inside an inserted text string, values are literal characters stored in the result string value as-is. When a quoted string is part of a command code, see the discussion of the particular command for more information on how it handles its string operands.
- Case modification codes (discussed below) do not apply to inserted text literals; the literal values are always inserted into the data exactly as specified. Due to the same standard SPFLite syntax rules, you cannot directly insert text that contains the quotes that enclose the M string. You could accomplish this by Inserting literal quotes, or with Inserted hex text, described next.
- Inserted text can have a replication factor as a suffix on the string. A literal of '/'3 will produce the same value as '///'. This feature only works for “stand-alone” text to be inserted into the result string; you cannot use a replication factor on a string that is an operand of a command code.
Inserted hex text 'nn'X. This serves the same purpose as regular inserted text, but you specify the value as two or more hex digits 0-9, A-F and a-f. Any underscores inside of inserted hex text are ignored; the remaining characters must be an even number of valid hex digits. In a mapping string, an inserted hex text literal must be identified only with a trailing X or x code, like '77'X, not like X'77'. If you needed to insert a literal quote as data which was the same as the outer quotes used on the M string, you could use a hex text value. For instance, if your M string were enclosed in ( " ) double quotes, you could insert a double-quote literal using '22'X, the ASCII value of a double-quote.
Inserted hex text can have a replication factor as a suffix on the string. A literal of '41'X3 will produce the same value as '414141'X, which is 'AAA'. This feature only works for “stand-alone” text to be inserted into the result string; you cannot use a replication factor on a string that is an operand of a command code.
Inserting literal quotes SQ / DQ / AQ. Because it can be awkward to insert the quote characters that are used to enclose the outer M string, there are three command codes that will do this, which are SQ, DQ and AQ. These will insert a Single Quote, Double Quote or Accent Quote, respectively. You can place a replication factor on these commands; AQ2 will produce `` two accent quotes.
Note: Literal-quote command codes are neither “copying commands” nor “modifying commands” in the usual sense. They simply append one or more quotes at the end of the result string. This means that the auto copy and auto reference features do not apply. If you wanted to enclose the source string in double quotes, you could do this with a mapping string of M"DQ 1+ DQ".
Inserted text ranges – [x–y]. This is specialized form of inserted text in which one or more pairs of Ansi characters are generated, and appended to the end of the result string. An inserted text range has the form [x–y] where x and y are any single Ansi characters. Example: [0-9] would produce the same as '0123456789'. If the range is in reverse collating order, the characters produced are reversed; [9-0] would produce the same as '9876543210'. You can specify multiple ranges by a comma-separated list within the brackets. Example: [A-Z,a-z,0-9]. As long as the correct format is observed, with commas, dashes and brackets in their proper places, the characters of x and y can be anything, including commas, dashes and brackets themselves. (This will look strange, but it will work.) If you need to insert a single character within a larger range, create a range item that uses the same character twice. For example, to create a list of digits in ascending order, then a slash, then the digits in descending order, specify this as [0-9,/-/,9-0]; that will produce '0123456789/9876543210'. Inserted text ranges do not support replication factors. (If you needed that, the Repeat command code RP might be useful.)
String sets and string pairs 'A=B' / 'A'='B' / 'A':'B'. Certain commands such as RC require a string operand with two values – a first value and a second value, separated by an = equal sign. When this is specified as a single string, it is called a string set. For example, the command RC'ABC=def' contains a string set. When one or both of the values themselves contain = equal sign characters as data, you must specify these as two quoted values using a string pair. Suppose you wanted the RC values to be A=B and X*Y. To do that, you specify it with a string pair of RC'A=B'='X*Y'.
- The only difference between a string set and a string pair is whether there is one quoted value or two. You only need a “pair” if the first or second values contain embedded equal signs as data. It helps with the documentation for us to be able to give each specify format a “name”. Otherwise, these values are used internally by the commands in exactly the same way, since a command is only concerned about the values of first and second, not on the syntax of how they got defined.
- Either or both quoted values of a string pair may optionally contain a string type code of C, T or X. Consult the description for each mapping command that allows string pairs as to how they handle these string type codes. Because there are a large number of combinations of type codes C, T or X (or none) that could be on both sides of the string pair, consult the Quick Reference for how these combinations are treated. It contains a comprehensive table of all combinations of these codes, and how they are processed.
- Both values of a string pair must be quoted, but the left and right sides do not have to use the same kind of quotes.
- When the delimiter between a string pair is a : colon instead of an = equal sign, the command is instructed to treat the string values as both case sensitive and case conformant. This is a shortcut for putting string type T on both sides.
Deleting text D / DC / RA / RL / RR / M. You will usually not need a special command to delete text. To accomplish a deletion of characters from the source string, you can simply omit any reference to those columns of source data from the mapping string. However, if do you have some need to delete characters from the result string, this can be done with a Delete mapping item code of D followed by a required column-range. When you use the D command code, the column-range always refers to character positions within the result string. You can delete “classes” of characters, such as “all commas, periods and parentheses” by using the Delete Characters command DC. You can replace or delete a specified string wherever it is located, with the String Replacement commands RA, RL and RR. You can delete text in the process of moving it from one part of the result string to another, using the Move command M.
Repeating text RP. The command code RP will repeat the contents of the result string any number of times. RP by itself will duplicate the result string. Example: A mapping string of M"`Abc` RP" will cause the result string to contain AbcAbc. RP will take a duplication factor. Example: A mapping string of M"`Abc` RP2" will cause the result string to contain AbcAbcAbc. The duplication factor indicates how many additional copies of the result string will be made. If RP appears by itself as the first thing in the mapping string, the result string will get two copies of the source string (due to the auto-copy feature being applied); first, the source string would get automatically copied to the result string, and then RP will repeat the result string so there are now two copies of it. Note that the numeric operand for RP is a duplication factor, not a column reference.
Case modification codes < > <> ><. Normally, the character casing of the result string, after mapping is completed, is the same as the source string. If desired, the casing can be altered by a case modification code. These codes are detected as the mapping string is scanned from left to right. The codes define a case modification state. Once a case modification state is defined, that state remains in effect to the end of the mapping string or until altered by another case modification code. Case modification codes do not apply to inserted text literals; these literal values are always inserted into the data as-is exactly as specified. A case modification code must be followed by a blank. Because a case modification code affects commands that follow it, placing a case modification code at the end of a mapping string is legal but will have no effect. Mapping strings support the following case modification codes:
- Upper case code > causes subsequent mapped characters to be converted to upper case.
- Lower case code < causes subsequent mapped characters to be converted to lower case.
- Invert case code >< causes the case of subsequent mapped characters to be inverted, so that upper case becomes lower case and vice-versa.
- As-is case code <> causes subsequent mapped characters to be copied as-is without case conversion. By default, all mapping strings are treated as if they began with an initial as-is code of <> so that the case of mapped characters is not altered.
- If a case modification code of > (convert to upper case) or < (convert to lower case) is in effect prior to the AX or EX code, those case modification states will be respected, and hex-digit characters will be rendered accordingly. If a case modification code of > or < is not in effect prior to the AX or EX code, hex-digit characters will be rendered in upper case.
Consult the Quick Reference to determine how the various command code process an active case modification state.
Note that a modifying command will auto-copy the source string to the result string if, at the time the command is encountered, there is as yet nothing in the result string. If there is an active case modification state (other than the “as-is” state that doesn’t change anything), and such an auto-copy occurs, that copied data will have its case modified as you requested.
For example, if your source string contains “°°abc°°” (the “°” symbol represents one blank) and you issue a mapping command of M"> L", at the point the L command is reached, there is as yet nothing in the result string. Because the L command is a modifying command that will auto-copy, the source string is first copied to the result string before L acts on it. During that copying process, it is noted that the case modification state of > is in effect, so that when the data is copied, it gets changed to upper case before L acts on it. Thus, L will see the result string as containing “°°ABC°°”, which it then left-justifies as “ABC°°°°”.
Case alteration commands UC / LC / SC / TC / IC. Five case alteration commands are defined: UC, LC, SC, TC and IC. These will alter the current contents of the result string. Just like their Primary Command and Line Command counterparts, the first four mapping commands alter the character case of the result string, comparable to how those primary and line commands do it (Upper Case, Lower Case, Sentence Case and Title Case). The code IC is used to Invert Case, the way the case conversion code >< operates.
Why have these commands, when the case modification codes described above are available? Case modification codes do allow for precise control over the casing of data selected by specific column ranges. However, you may need a case alteration command for data extracted from locations you do not know of in advance. In addition, the command codes SC , TC and IC perform operations that would difficult to accomplish with case modification codes. SC capitalizes the first “word” of each “sentence”, while TC capitalizes the first “letter” of each “word”.
These operations depend on the following definitions: A “letter” is a character whose upper case and lower case values are different. A “word” is a span of “letters” that begins with the beginning of the string or a blank, and ends with the end of the string or a non-letter character. A “sentence” is one or more “words”, where the first “word” of a “sentence” is the first “word” of the string, or any “word” preceded by both a period and a space, in that order. These rules implement an algorithm that works well for properly formatted English text, but if your data is out of the ordinary, TC and SC may or may not suit you. You should test these codes on some sample data to make sure they meet your needs.
Adjusting the default CASE processing CC / CT . Certain mapping items that involve comparisons to character data in the source string are sensitive to the PROFILE CASE setting. These mapping items are the character replacement command RC, the string replacement commands RA, RL and RR, and the exchange-string command X. By default, whenever a character comparison takes place, the current PROFILE CASE setting determines how it is done. If PROFILE CASE C is in effect, comparisons are done as case-sensitive, and if PROFILE CASE T is in effect, comparisons are done as case-insensitive. There are two ways to override this.
- First, commands that take strings which are being compared can have a C, T or X suffix. If a string has a suffix of C or X, strings are compared as case-sensitive, and if a string has a suffix of T, strings are compared as case-insensitive.
- Second, if a command string does not have a C, T or X suffix, you can use the case-override items CC and CT to override the PROFILE CASE default. If the command CC appears in the mapping string, comparisons for any command that follows it are done in case-sensitive mode, as if your PROFILE was set to CASE C. If the command CT appears in the mapping string, comparisons for any command that follows it are done in case-insensitive mode, as if your PROFILE was set to CASE T. The CC and CT commands are handled left-to-right as they appear in a mapping string, and remain in effect for that string unless followed by another CC or CT command. You can use the CC or CT command to set the case-handling just once, rather than having to use string type codes everywhere commands appear in your mapping string. The CC and CT commands do not take any string or number operands.
Alignment items L / R / C. An alignment item consists of the letter L, R or C followed by an alignment specification. An alignment item is used to left-justify, right-justify, or center the result string.
Trimming items T / TL / TR. A trimming item consists of the codes T, TL or TR followed by an optional trimming specification. A trimming item is used trim off one or more blanks (or other specified character) from one or both side of the result string.
Padding items P / PL / PR. A padding item consists of the codes P, PL or PR followed by a padding specification. A padding item is used append one or more blanks (or other specified character) on to one or both side of the result string.
Zero-suppression items Z. A zero-suppression item consists of the letter Z followed by an optional zero-suppression specification. A zero-suppression item is used to suppress leading zeros from a numeric value in the result string, and replace them with spaces.
Character-replacement items RC. A character-replacement item consists of the code RC followed by a character-replacement specification. A character-replacement item is used to replace one set of characters, using a character translation table.
Character-conversion items AX / XA / EX / XE. A character-conversion item consists of the codes AX, XA, EX or XE, followed by a column-range specification. A character-conversion item is used to convert between character display-data and their hex-digit equivalents. For example, the hex equivalent of Ansi '1' is '31'.
Numeric-conversion items DD / DX / XD / XX. A numeric-conversion item consists of the codes DD, DX, XD or XX, followed by numeric formatting information. A numeric-conversion item can alter the format of a numeric value by adding or removing leading zeros and by changing radix between decimal and hex. It is possible to calculate derived values from the initial numeric value, such as constant multiples or offsets from the original number, using a calculation operand.
Sequence items SD / SX. A sequence item consists of the codes SD or SX, followed by a sequence specification. A sequence item is used to generate a sequence of numbers in decimal or hex in various formats. When a CHANGE ALL command uses an M string, the first string that is found is assigned an enumeration of 1; the second has an enumeration of 2, etc. It is possible to calculate derived values from the enumeration, such as constant multiples or offsets from the original sequence number, using a calculation operand.
Exchange items X. An exchange item consists of the code X followed by an exchange specification. An exchange item is used swap two different strings at the same time, such as replacing all occurrences of 'one' with 'two' and vice-versa.
Character-move items M. A character-move item consists of the code M followed by a character-move specification. A character-move item is used to move substrings from one point to another within the result string.
Created with the Personal Edition of HelpNDoc: Free Kindle producer