SPLIT - Split Lines Using Find/Change Strings

Contents of Article


Syntax

Operands

Description

SPLIT and virtual highlighting pens

Splitting lines with labels and tags

Splitting strings vs. splitting lines

Performing splits more complex than SPLIT can support

Splitting zero-length lines

Line splitting and line exclusion

Using ALL FIRST and ALL LAST on SPLIT

Using ALL FIRST and ALL LAST elsewhere

A note about the IBM ISPF legacy SPLIT command


Syntax


SPLIT

from-string  

{ P'to-string' | F'to-string' }

[ start-column [ end-column ] ]

[ FIRST | LAST | NEXT | PREV | ALL ]

[ PREFIX | SUFFIX | WORD | CHAR ]

[ LEFT | RIGHT ]

[ line-control-range ]

[ color-selection-criteria ]

[ X | NX ]

[ U | NU ]

[ TOP ]


Operands


from-string

The search string that describes the text to be split.  This text, when found, is discarded and replaced with the to-string, described next.


P'to-string' |

F'to-string'

The string that replaces from-string.  It must be defined as a P-type Picture string or an F-type Format string, and it must contain one, and only one, vertical bar | character to define the split point.


start-column

Left column of a range (with end-column) within which the from-string value must be found. If end-column is omitted, the from-string must be found starting in start-column.


end-column

Right column of a range (with start-column) within which the from-string value must be found.


FIRST

Starts at the top of the data and searches ahead to find the first occurrence of from-string.  See the discussion below about using ALL FIRST and ALL LAST on SPLIT.


LAST

Starts at the bottom of the data and searches backward to find the last occurrence of from-string.  See the discussion below about using ALL FIRST and ALL LAST on SPLIT.


NEXT

Starts at the first position after the current cursor location and searches ahead to find the next occurrence of from-string.  NEXT is the default.


PREV

Starts at the current cursor location and searches backward to find the previous occurrence of from-string.


ALL

Starts at the top of the data and searches ahead to find all occurrences of from-string.  See the discussion below about using ALL FIRST and ALL LAST on SPLIT.


LEFT

LEFT causes the from-string to be found at most once in any given line.  Where the from-string occurs more than once in the same line, only the left-most occurrence is changed, and any other instances on that same line are unchanged.


RIGHT

RIGHT causes the from-string to be found at most once in any given line.  Where the from-string occurs more than once in the same line, only the right-most occurrence is changed, and any other instances on that same line are unchanged.


PREFIX

Locates from-string at the beginning of a word.


WORD

Locates from-string when it is delimited on both sides by blanks or other non-alphanumeric characters.


CHAR

Locates from-string regardless of what precedes or follows it.


SUFFIX

Locates from-string at the end of a word.


line-control-range

The range of lines which are to be processed by the command.  Line control ranges provide a powerful tool to customize the range of lines to be processed.   The full syntax and allowable operands which make up a line control range are discussed in "Line Control Range Specification".  Refer to that section of the documentation for details.


color-selection-criteria

A request for selection based on the highlight color of the from-string. Color requests provide another powerful tool to control search selection.   The full syntax and allowable operands which make up the color-selection-criteria are discussed in "Color Selection Criteria Specification".   Refer to that section of the documentation for details.


X | NX

Specifies a subset of the line range to be processed.   X requests that only excluded lines be examined; NX requests that only non-excluded lines be examined.   If neither X nor NX is specified, all lines in the range will be examined.


U | NU

Specifies a subset of the line range to be processed.   U requests that only User lines be processed; NU requests that only non-User lines be processed.   If neither U nor NU is specified, all lines in the range will be processed.


TOP

Normally, at the completion of the command, the first, or only, line processed is highlighted (if it is on the current screen) or the screen is scrolled to the 2nd screen line (as ISPF does) if the line is not on the current screen.  If TOP is coded, then the line is always positioned as the top line of the screen, regardless of its current location.


Abbreviations and Aliases

PREFIX can also be spelled as PRE or PFX

SUFFIX can also be spelled as SUF or SFX

WORDS can also be spelled as WORD

CHARS can also be spelled as CHAR

Description


The SPLIT edit primary command is used to selectively split apart lines of text based on a search string.  After a split occurs, a line on which the from-string is found will become two lines.  Everything before the "split point" will be on the first line, while everything after the split point will be on the second line.


Notes:  


The to-string must be a P-type Picture string or F-type Format string, which must contain one and only one vertical bar | character, and which may optionally contain other characters.  The vertical bar is not a literal | character nor any other data value, but represents a line-break to be inserted into your data.  After the split takes place, any characters in the to-string that appear before the vertical bar will appear at the end of the first line, and any characters in the to-string that appear after the vertical bar will appear at the beginning of the second line.


Note:  While you might be tempted to think of a line-break being "inserted into your data" as being equivalent to inserting a raw EOL string (such as a Carriage Return / Line Feed pair) into your data line, that is not always the case.  You can also split lines on fixed-length files which are defined as EOL NONE.  The "insertion" of a line-break should be thought of in logical terms, not with reference to any line delimiter characters.  When your file gets saved to disk, SPFLite will handle the EOL delimiters as needed; you don't need to be concerned about this.


Note:  While the to-string can be a Format string, you will find that using a Picture string here should address most of your SPLIT requirements.  Where a Format string comes in handy is when the from-string is a Picture, and you need a code of = in the to-string that doesn't match the corresponding position in the from-string.  


Note:  Because the to-string must be a Picture or Format string, there may be cases where your "replacement data" might include characters that are already defined as special-purpose Picture or Format codes.  If you need such characters treated as ordinary data rather than as Picture or Format codes, you can escape those characters by preceding them with a \ backslash.  See Specifying a Picture or Format String for more information.


Because the vertical bar | character in the to-string represents a split point, but is not itself a literal character, when you specify the to-string literally as P'|' with no other characters, you are asking the SPLIT command to delete each instance of the from-string and replace it with a line break.  (Conceptually, this deletes data exactly the same as a CHANGE command with a to-string of ''.)  If you don't want your found-strings completely deleted, or if you want your found-strings replaced with something else, you have to specify that in the to-string Picture - either before the vertical bar, after the vertical bar - or both places, if you wish.


SPFLite treats the SPLIT command as a specialized type of CHANGE command.  Because of this, the RFIND, RLOCFIND and RCHANGE commands will also apply to SPLIT.  Traditionally, RFIND (or RLOCFIND) is mapped to F5, and RCHANGE to F6.  This means you can selectively split lines by alternating the use of the F5 and F6 keys (or, other keys if you map these commands differently).


The from-string may be any SPFLite string type (except for F-type Format strings), including Pictures and Regular Expressions.  If the from-string is a P-type Picture string, it may contain the alignment Picture codes [ and ] if desired.  Be mindful that alignment Picture codes do not represent data values, and so a search Picture cannot consist solely of alignment Picture codes.  (Unlike the JOIN command, which requires the from-string to contain an alignment code, the SPLIT command allows but does not require the use of alignment codes.)


The to-string Picture may contain any number of ! Picture/Format codes.  Each ! Picture code represents the entire text found by the from-string, even when that text can vary in size or content when it is described by a P-type Picture string or R-type Regular Expression.  This capability could be very useful in some SPLIT situations.  For example, suppose you had a "label" in your data consisting of ABC plus a digit.  You could change this into two lines, where the first line ends in the label and the second line begins with a duplicate copy of the label, by issuing a command of


       SPLIT P'ABC#' P'!|!' ALL


See Working with SPLIT and JOIN Commands for example usage of the SPLIT command.
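
As a rough illustration of the to-string rules described above, here is a minimal Python sketch (not SPFLite code) that models a single split.  It assumes the from-string is given as a Python regular expression rather than an SPFLite Picture, it handles only the | and ! codes, and it ignores backslash escapes.  The function name split_once is hypothetical.

```python
import re

def split_once(line, from_regex, to_string):
    """Model one SPLIT: replace the first match and break the line at '|'."""
    if to_string.count("|") != 1:
        raise ValueError("to-string needs exactly one '|' split point")
    match = re.search(from_regex, line)
    if match is None:
        return [line]  # nothing found, so the line is unchanged
    found = match.group(0)
    # Text before '|' ends the first line; text after '|' starts the second.
    before, after = to_string.split("|")
    # Each '!' stands for the entire text that was found.
    before = before.replace("!", found)
    after = after.replace("!", found)
    return [line[:match.start()] + before, after + line[match.end():]]

# Rough equivalent of SPLIT P'ABC#' P'!|!' on one line (ABC plus a digit).
print(split_once("xx ABC7 yy", r"ABC\d", "!|!"))  # ['xx ABC7', 'ABC7 yy']
# Rough equivalent of SPLIT '-' P'|': the found text is simply deleted.
print(split_once("A-B", "-", "|"))                # ['A', 'B']
```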


SPLIT and virtual highlighting pens


The SPLIT command does not use the various "virtual highlighting pen" color name keywords, like FIND and CHANGE do.  The colors that are present after SPLIT completes depend on how the original found string is colored.


If the found string is entirely of one color, any text inserted by SPLIT will have the same color that the found string had before being split.  If that text had been colored by using one of the "virtual highlighting pen" keyboard functions, that color will remain intact after the split takes place, or if the text had the default (normal) color beforehand, the new text will also have the default color.


If the found string is not entirely one color, but contains text in two or more colors (where the default text display is also considered a color), all of the text inserted by SPLIT will have the default (standard) color.  SPFLite does things this way because it would be too hard to reliably determine how and where the newly inserted text should be colored, if multiple colors had to be assigned to the inserted text.
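
The coloring rule above can be summarized in a few lines of Python.  This is only a model of the documented behavior, not SPFLite's implementation; the color names and the function name inserted_text_color are made up for illustration.

```python
DEFAULT = "default"  # stands in for the normal (standard) text color

def inserted_text_color(found_colors):
    """Return the color given to text inserted by a split.

    found_colors holds one color name per character of the found string.
    """
    unique = set(found_colors)
    # One single color across the found string: inserted text inherits it.
    # Two or more colors: inserted text falls back to the default color.
    return unique.pop() if len(unique) == 1 else DEFAULT

print(inserted_text_color(["yellow", "yellow", "yellow"]))  # yellow
print(inserted_text_color(["yellow", "default", "green"]))  # default
```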


Splitting lines with labels and tags


When you split a line containing a label, the label will remain with the first line produced from the splitting.  This is true even when a line is split in more than one place.  For example, suppose you had a line like this:


               .A 001 A-B-C


If you issue the command SPLIT '-' P'|' ALL .A, line 1 will keep the label  (the extra digits on a line with a tag or label don't really appear in SPFLite; we are just showing them here for explanation purposes):


               .A 001 A

               000002 B

               000003 C


When you split a line containing a tag, the tag is propagated on to every line produced from the splitting.  This is true even when a line is split in more than one place.  For example, suppose you had a line like this:


               :A 001 A-B-C


If you issue the command SPLIT '-' P'|' ALL :A, every line will have the tag:


               :A 001 A

               :A 002 B

               :A 003 C
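

The difference between labels and tags can be modeled with a short Python sketch.  It assumes each line is represented as a (label, tag, text) tuple, which is a simplification chosen for this example and not how SPFLite stores lines; the function name split_with_marks is hypothetical.

```python
def split_with_marks(label, tag, text, separator):
    """Split text at every separator and return (label, tag, text) tuples."""
    result = []
    for index, piece in enumerate(text.split(separator)):
        # The label stays with the first line only;
        # the tag is propagated to every line produced by the split.
        result.append((label if index == 0 else None, tag, piece))
    return result

print(split_with_marks(".A", None, "A-B-C", "-"))
# [('.A', None, 'A'), (None, None, 'B'), (None, None, 'C')]
print(split_with_marks(None, ":A", "A-B-C", "-"))
# [(None, ':A', 'A'), (None, ':A', 'B'), (None, ':A', 'C')]
```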


Splitting strings vs. splitting lines


Just to be clear, SPLIT can only split a string in one place, but that doesn't mean it can't split a line in more than one place.  For example, suppose you had a line like this:


               000001 A-B-C-D


If you issued the command SPLIT '-' P'|' ALL .1, the data on line 1 would be split apart exactly as you'd hope, and you'd get the following result.  A message "Split performed 3 times" would appear, corresponding to the 3 places where the '-' dash appears on the line:


               000001 A

               000002 B

               000003 C

               000004 D


What you can't do is issue a command like SPLIT 'A-B-C-D' P'A|B|C|D' .1, because that is asking the single string 'A-B-C-D' to be split three times, and SPFLite doesn't support that.  But, what if that is what you wanted to do?  Keep reading ...


Performing splits more complex than SPLIT can support

You may encounter cases where the SPLIT command won't do everything you want.  You might want to split a particular string into multiple lines, whereas SPLIT will break a string in only one place.  You may have text already colored by highlighting pens and you need precise control over how the text colors are affected by splitting.  You may need special features supported by CHANGE but not by SPLIT, such as TRUNC, MX, DX, etc.  These and other cases may require a different approach.


Keeping in mind that a SPLIT is a type of "change" to a line, you can perform more-complex splitting in two stages.  First, use a CHANGE command to insert "user-defined split points" into your data, and then go back and use SPLIT to actually break the lines apart.  This technique has the nice feature that after the first stage, you can manually inspect all of the user-defined split points you just put in and verify they are all where you want them.  You can add or remove a few before the second stage, if you have special cases where some extra data must be split or some split points have to be taken out.  Because the CHANGE command, and any manual editing of your own, will have placed these user-defined split points exactly where you need them, only a simple form of SPLIT will be required to break the lines apart.


To do this, you might want to map some special ANSI character that you rarely use in your own data, and use that to represent your user-defined split points.  If necessary, you can use the (Ansi) function to get any ANSI character into the clipboard, and then use it as a value for KEYMAP.  For example, you could use the ? character for a user-defined split point.


Suppose your task was to split some string "AB-CD-EF" into three lines on the dashes, between lines .ONE and .TWO.  You can't do that directly with SPLIT, but (assuming that ? does not appear in your data) you could do the following instead.


       CHANGE 'AB-CD-EF' 'AB?CD?EF' ALL .ONE .TWO

       SPLIT '?' P'|' ALL .ONE .TWO


So, can you see now how to do that multi-point split we said (back in the previous section) you couldn't do?  Remember, we said that


       SPLIT 'A-B-C-D' P'A|B|C|D' .1


was illegal, and it is.  But you can do this:


       CHANGE 'A-B-C-D' 'A?B?C?D' .1

       SPLIT '?' P'|' ALL .1


Just that easy.


By the way, if you don't feel like using a ? character, pick anything you like that's convenient and not already in your data.
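

The two-stage technique can be sketched in Python as follows.  This is a model of the idea, not SPFLite code: stage 1 plays the role of the CHANGE command and stage 2 plays the role of SPLIT '?' P'|' ALL.  It assumes the marker character does not already occur in the data, and the function name two_stage_split is hypothetical.

```python
MARKER = "?"  # user-defined split point; assumed absent from the data

def two_stage_split(lines, old, marked):
    """Plant split-point markers with a change, then split on the markers."""
    # Stage 1 (like CHANGE): replace the string with a copy holding markers.
    staged = [line.replace(old, marked) for line in lines]
    # Stage 2 (like SPLIT '?' P'|' ALL): every marker becomes a line break.
    result = []
    for line in staged:
        result.extend(line.split(MARKER))
    return result

print(two_stage_split(["x A-B-C-D y"], "A-B-C-D", "A?B?C?D"))
# ['x A', 'B', 'C', 'D y']
```

Between the two stages you could inspect or hand-edit the staged lines, which is the main advantage of this approach.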


Splitting zero-length lines


SPLIT cannot be applied to zero-length lines, since there is literally nothing for SPLIT to find.  If you were inclined to do this, it implies that you wanted to add another zero-length line next to the original zero-length line.  That is, wherever there was a blank line, you'd now have two of them.  


If you really want to do this, the easiest way is to find all the zero-length lines, APPEND as many user-defined split points as you need, and then split on these characters.  Here is an example of converting every zero-length line into two zero-length lines.  The commands exclude all lines that are zero-length, then place a ? character on each excluded line (which also unexcludes them in the process), and then split each marked line into two lines, in a way that also removes the ? character.


RESET

NX P'=' ALL

APPEND '?' ALL X

SPLIT '?' P'|' ALL
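

The effect of this command sequence can be modeled in Python.  The sketch below is an illustration only; it assumes the marker character does not occur anywhere else in the data, and the function name double_empty_lines is hypothetical.

```python
def double_empty_lines(lines, marker="?"):
    """Turn every zero-length line into two zero-length lines."""
    # Like APPEND '?' on the zero-length lines: give them something to find.
    marked = [marker if line == "" else line for line in lines]
    # Like SPLIT '?' P'|' ALL: the marker is removed and the line is broken.
    result = []
    for line in marked:
        result.extend(line.split(marker))
    return result

print(double_empty_lines(["a", "", "b"]))  # ['a', '', '', 'b']
```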


The Pad to Length command PL can take a / or \ modifier.  Putting PL/ on line 1 of a file will ensure that every line of the file is at least one character long.


Line splitting and line exclusion


The SPLIT command supports the X and NX keywords, to allow you to limit your line-range selection to only excluded (X), or only not-excluded lines (NX), if you wish.  Regardless of the use of X or NX keywords, when a line is split, it is considered a "change" to the line.  Any line that was excluded at the time it was split will become two lines, and both of these lines will be unexcluded.


The SPLIT command does not support the MX and DX keywords at this time.


Using ALL FIRST and ALL LAST on SPLIT


Because the processing performed by SPLIT is different from anything done in ISPF or any prior version of SPFLite, an unusual situation may occur that could affect how SPLIT works, perhaps in a way you would not want.


Suppose you had some lines like this (original data):


       000001 oneTWO oneTWOsix

       000002 oneTWO oneTWOsix


and you wanted to split the "oneTWO" strings into "one" and "TWO", but only the left-most ones.  That is, you are hoping you can change this data to look like this (desired result):


       000001 one

       000002 TWO oneTWOsix

       000003 one

       000004 TWO oneTWOsix


Remembering that SPLIT allows a LEFT or RIGHT operand, you might be inclined to write your SPLIT command like this:


       SPLIT 'oneTWO' P'one|TWO' LEFT ALL


This splits the left-most occurrence of 'oneTWO' on each line.  That should work, right?  Unfortunately, no.


The reason it won't work is that the SPLIT command will (a) effectively split the data the way you see it above as the 'desired result', but then (b) it will keep on splitting the data again, on the lines it just split.


For example, notice above that line 2 contains 'TWO oneTWOsix' after the first split.  The SPLIT command, after splitting line 1, moves on to "line 2".  


However, that line 2 is not the original line 2, but the newly created line 2.  On that line 2, there is an instance of the string 'oneTWO' which is the beginning of the string 'oneTWOsix' - and so that instance is also the "left-most" of its kind on that line.  This means that it matches the SPLIT command's search string of 'oneTWO', and so it also gets split.  The net result is that the file will actually end up looking like this:


       000001 one

       000002 TWO one

       000003 TWOsix

       000004 one

       000005 TWO one

       000006 TWOsix


So even though you asked for only the left-most string of 'oneTWO' to be split on any given line, you got them all split.  Well, what can be done about this?


Recall that we started this discussion by saying that the processing performed by SPLIT is different from anything done in ISPF or any prior version of SPFLite.  This is the first time for any SPF-style editor that a primary command, under the control of find and change strings, could split apart lines while in the process of scanning them.  This issue has to do with the way the keyword ALL works.


If you think about a standard CHANGE command, you could change the NEXT string, the PREV string, or ALL strings.  Now, when you say CHANGE ALL, do you really care in what order they are ALL changed, as long as it gets done?  Normally, no.


However, a SPLIT command could find a string on a line, and split that line, and then "run into" the remainder of the line it just split, possibly finding more of the same specified search string.  That's what happened in our example.  This will normally only be an issue when you are splitting a string which may have more than one occurrence on a given line.  


In particular, this problem will normally only happen when using the LEFT operand on ALL lines.  If you did a split of the RIGHT string on ALL lines, you would not "run into" a partial line that had more instances of the string, and so the problem we are discussing wouldn't happen.  That's because SPLIT does its ALL processing going forward from the beginning of the line range you are working on to the last line of it.  It's also because, once a line is split, and the right-hand side of the split point becomes a new line, the determination of what constitutes the "left side" or what is the "left most" occurrence of a string, starts over from the beginning of that new line.


Because SPLIT with the ALL operand performs its processing in a forward direction, the possibility of "running into" a "partial line" and re-processing it may exist.


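How do ALL FIRST and ALL LAST solve this?  Think of these keywords like this:


       ALL FIRST   processes all occurrences, working forward from the first line of the range

       ALL LAST    processes all occurrences, working backward from the last line of the range
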

The behavior of ordinary ISPF and SPFLite commands that take the ALL keyword has always been as if ALL FIRST had been specified.  Except, up until now, you couldn't say so explicitly - ALL FIRST was illegal syntax.  Now, you can.


When you have repeating data on several lines, and you only want to change the LEFT or RIGHT instance of them, you need to make sure you're using the right "form" of the SPLIT command to make sure you are getting the results you wanted.  This means, in most cases, you will want to issue your SPLIT commands that have LEFT or RIGHT like this:

       SPLIT 'oldnew' P'old|new' RIGHT ALL


       SPLIT 'oldnew' P'old|new' LEFT ALL LAST


Keep in mind that the first of these commands is really a shorthand for

       SPLIT 'oldnew' P'old|new' RIGHT ALL FIRST


but you don't need to be that explicit, because the FIRST is understood.
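

To see why the processing direction matters, here is a small Python simulation of the oneTWO example.  It is a sketch under stated assumptions, not SPFLite's actual algorithm: it assumes ALL FIRST walks the lines forward from the first line, ALL LAST walks them backward from the last line, and LEFT or RIGHT picks at most one occurrence per line visited.  The function name split_all is hypothetical.

```python
def split_all(lines, find, before, after, side="LEFT", order="FIRST"):
    """Model SPLIT ... LEFT|RIGHT ALL [FIRST|LAST] on a list of lines."""
    lines = list(lines)
    i = 0 if order == "FIRST" else len(lines) - 1
    while 0 <= i < len(lines):
        line = lines[i]
        pos = line.find(find) if side == "LEFT" else line.rfind(find)
        if pos >= 0:
            head = line[:pos] + before
            tail = after + line[pos + len(find):]
            lines[i:i + 1] = [head, tail]  # one line becomes two
        # FIRST steps forward, onto the line just created (if any);
        # LAST steps backward, away from the line just created.
        i += 1 if order == "FIRST" else -1
    return lines

data = ["oneTWO oneTWOsix", "oneTWO oneTWOsix"]
print(split_all(data, "oneTWO", "one", "TWO", "LEFT", "FIRST"))
# ['one', 'TWO one', 'TWOsix', 'one', 'TWO one', 'TWOsix']
print(split_all(data, "oneTWO", "one", "TWO", "LEFT", "LAST"))
# ['one', 'TWO oneTWOsix', 'one', 'TWO oneTWOsix']
print(split_all(data, "oneTWO", "one", "TWO", "RIGHT", "FIRST"))
# ['oneTWO one', 'TWOsix', 'oneTWO one', 'TWOsix']
```

In this model, LEFT with forward processing re-splits the newly created lines, while LEFT with backward processing and RIGHT with forward processing each split every original line exactly once.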


Remember that, as always, the various reserved keywords can be specified in any order you wish.


Using ALL FIRST and ALL LAST elsewhere


Because much of SPFLite uses common logic to handle common features across many primary commands, you will find that the keywords ALL FIRST and ALL LAST will be accepted on other commands, such as FIND and CHANGE.  


However, because the special circumstances present for SPLIT don't occur for FIND and CHANGE, you will find that ALL FIRST and ALL LAST work the same way on those commands as an ordinary, garden-variety ALL keyword does.


Why didn't we just skip over the rest of the line after splitting, to avoid this problem?


That's a good question.  The answer is, if the SPLIT engine did such a thing, you would only be able to make one physical split per line.  That is too big a restriction to impose on you, especially when it's very likely you would want to do that very thing, and probably quite often.


For certain, SPLIT is powerful, and will take some study and experimentation to get the hang of it, but if you need to break apart large numbers of lines based on find/change criteria, there's nothing else like it.


A note about the IBM ISPF legacy SPLIT command


SPFLite does not support the IBM ISPF feature of "3270 split screen mode" and the associated legacy SPLIT command.  With tabbed editing and the ability to open multiple instances of SPFLite, as well as the Multi-Edit feature to edit several files simultaneously in the same edit window, IBM's 3270 split screen mode is not really needed.  


The SPFLite primary command SPLIT described here is unrelated to the IBM ISPF legacy SPLIT command, and merely reuses the SPLIT command name for a different purpose.  

