Syntax
This section of the manual is a reference for the metadata functions available in the ScannerVision Expression Editor. The examples given in this section are intended to show the behavior of the relevant function only. For more complete, real-world examples, refer to the Examples section.
The ScannerVision Expression Parser is the engine that reads your expressions and executes the functions you specified. Before explaining the functions and how they work, we will explain how the Expression Parser works internally, as this will help you unlock the full power that the functions provide.
Terminology
We will start by defining a few programming terms that we will use in the text that follows.
Functions and parameters
Functions are instructions that a computer executes: you are telling the computer to DO something. Among the available metadata functions are the "ucase" and "split" functions. With the "ucase" function you are telling the computer, "Make all characters in the text uppercase". Some functions need more information. The "split" function, for example, needs to know WHAT to split on, and we tell it by means of a function parameter. Some functions require more parameters than others, but all ScannerVision metadata functions have an implied parameter, namely the text on which to work. This is usually the value of the metadata tag in which the function appears, but it could also be the result of a previous function. You won't have to specify this parameter explicitly.
Metadata tags in ScannerVision are delineated with square brackets. The [DATETIME] tag represents a date and time value such as "2013-04-02 08:03:07". To apply metadata functions to the [DATETIME] tag, put the functions inside the tag, before the closing square bracket. Each function is surrounded by parentheses, and a space separates the tag name from the opening parenthesis of the first function.
In the following expression:
[DATETIME (split "-")(take 1)]
we are applying the "split" and "take" functions to the DATETIME tag. We pass the "split" function the parameter "-", which tells it to split the date and time on the hyphen. Assuming DATETIME has the value "2013-04-02 08:03:07", this results in 3 parts, namely "2013", "04" and "02 08:03:07", which become the input of the "take" function. We pass the "take" function the parameter 1, which tells it that we want the first part of the split result, i.e. "2013".
If a function requires more than one parameter, such as the "replace" function, the parameters are separated by commas, e.g.:
[DATETIME (replace "-", "/")]
Input & Result
Input is the data on which a function works and Result is the outcome of that operation. In the expression:
[DATETIME (split "-")(take 1)]
the value of the DATETIME tag - let us assume that to be "2013-03-12 14:23:54" - is the input of the "split" function and the outcome "2013", "03", "12 14:23:54" is its result. The result of the "split" function becomes the input of the "take" function which yields the result "2013".
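The flow of input and result can be modelled outside ScannerVision. The following is a minimal Python sketch, not ScannerVision's own implementation: the helper names `split` and `take` are hypothetical stand-ins for the metadata functions, and the separator is treated as a literal string for simplicity.

```python
def split(text, separator):
    """Model of (split "..."): break the input string on a literal separator."""
    return text.split(separator)


def take(parts, index):
    """Model of (take n): return the element at a 1-based index."""
    return parts[index - 1]


# Assumed value of the DATETIME tag, as in the text above.
datetime_value = "2013-03-12 14:23:54"

split_result = split(datetime_value, "-")  # result of "split" ...
take_result = take(split_result, 1)        # ... becomes the input of "take"

print(split_result)  # ['2013', '03', '12 14:23:54']
print(take_result)   # 2013
```

Each function receives the result of the one before it, which is why the order of the functions inside the tag matters.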
Characters and Strings
We will explain characters and strings in the context of a text editor like Notepad. Anything you type into Notepad is just text, whether it is letters of the alphabet, numbers or symbols. Every keystroke represents a character, e.g. 'A', 'b', '1', '@'. These are all characters and each has a unique number called an ordinal value, which we encountered in the discussion of metadata tags and the UNICODE character map. Not all characters are visible (the Space and Tab characters, for example, are not), but they all have an ordinal value.
A sequence of characters is called a string. A string can contain zero or more characters. When a string contains zero characters it is called an empty string. The following are all strings: "Customer", "INVOICE00012345", "$3000", "25º", "A", "".
To distinguish between the character A and the string A in the discussion below we use single quotes to indicate the character 'A' and double quotes to indicate the string "A". So,
'A' = Character A
"A" = String A
'AB' is not valid because there is no character AB.
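To make ordinal values concrete, here is a small Python sketch. Python has no separate character type, so a character is represented as a string of length one. The helper name `ordinals` is hypothetical.

```python
def ordinals(text):
    """Return the ordinal (Unicode code point) of every character in a string."""
    return [ord(ch) for ch in text]


print(ordinals("Ab1@"))  # [65, 98, 49, 64]
print(ordinals(" \t"))   # [32, 9]  (Space and Tab are invisible but still have ordinals)
print(ordinals(""))      # []  (the empty string contains no characters)
```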
Integers
Integers are whole numbers, that is, numbers without a fractional part, e.g. 1, 300, -15.
Arrays
An array is a series of values and you can visualize it as a table with many rows and only one column. The numbers of the rows are called indexes and the values are called elements.
| Index | Elements |
|---|---|
| 1 | CN000123 |
| 2 | ON023456 |
The table above represents an array with 2 elements. To refer to the elements we use the notation [1], [2] etc. So, [1] = "CN000123" and [2] = "ON023456". Don't confuse the square brackets "[" and "]" with ScannerVision tags. If we want to refer to the whole array, we use the notation: ["CN000123", "ON023456"]. Here the double quotes indicate the elements in the array are strings. You could also have ['a', 'b', 'c'] which would be an array of characters.
Arrays in ScannerVision metadata functions will always contain strings or characters. When we want to refer to an array of strings we use the notation string[] and for a character array we use character[].
Any string can be thought of as an array of characters. So the string "Apple" is equivalent to: ['A', 'p', 'p', 'l', 'e']
This is why you are able to use the "take" function on a string. Let us use the DATETIME tag with the value "2013-03-12 14:23:54" as an example. You could define an expression as follows:
[DATETIME (take 1-4)]
The result of the expression is the array ['2', '0', '1', '3']. If you paste the expression above into the ScannerVision Expression Editor, however, you won't see the array ['2', '0', '1', '3'] but "2013". The result of the "take" function is an array, and whenever the last function in an expression produces an array, ScannerVision automatically converts it to a string by concatenating all the elements in the array, even when the elements are strings themselves. We discuss this in more detail below. The concatenation of array elements to produce a string is what the "join" function does, so we could have written the expression above as follows:
[DATETIME (take 1-4)(join)]
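The same idea can be sketched in Python, where a string is converted to a list of one-character strings. This is an illustrative model only: `take_range` and `join` are hypothetical helper names, and the range is assumed to be 1-based and inclusive, as the result ['2', '0', '1', '3'] suggests.

```python
def take_range(elements, first, last):
    """Model of (take first-last) using 1-based, inclusive positions."""
    return elements[first - 1:last]


def join(elements, separator=""):
    """Model of (join): concatenate the elements into one string."""
    return separator.join(elements)


chars = list("2013-03-12 14:23:54")  # the string viewed as an array of characters
taken = take_range(chars, 1, 4)

print(taken)        # ['2', '0', '1', '3']
print(join(taken))  # 2013
```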
If we call a function such as "(split "-")" we are instructing ScannerVision to look for all instances of the "-" string and to split the string there. The result is an array of strings. Using the DATETIME example above again, if we had the following expression:
[DATETIME (split "-")]
the result is an array of strings as follows:
[1] = "2013"
[2] = "03"
[3] = "12 14:23:54"
So, given the expression:
[DATETIME (split "-")(take 1)]
the string array result of the "split" function becomes the input of the "take" function. We told the "take" function to take array element 1 which is "2013". Not surprisingly, if we had said (take 2) the result would have been "03".
When a string is passed into a function that expects a string[] as input, the function converts the string into an array of strings e.g. "apple" becomes ["a", "p", "p", "l", "e"]. Similarly, when a string[] is passed to a function that expects a character[], the elements in the array are concatenated and then converted into a character[] e.g. ["brown", "dog"] becomes ['b', 'r', 'o', 'w', 'n', 'd', 'o', 'g'].
To see this for yourself, enter the following expression in the ScannerVision Expression Editor:
["ABCD" (join "*")]
The result of this is: "A*B*C*D". The join function expects an array so the string "ABCD" is converted to ["A", "B", "C", "D"] before the join "*" is performed.
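The two conversions described above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not ScannerVision's actual code: the helper names are hypothetical, and because Python has no character type, both kinds of array are represented as lists of strings.

```python
def to_string_array(value):
    """A string becomes an array of one-character strings; an array is kept as is."""
    if isinstance(value, str):
        return [ch for ch in value]
    return list(value)


def to_character_array(value):
    """An array of strings is concatenated first, then split into characters."""
    if isinstance(value, str):
        return list(value)
    return list("".join(value))


def join(value, separator=""):
    """Model of (join "..."): the input is converted to an array before joining."""
    return separator.join(to_string_array(value))


print(to_string_array("apple"))               # ['a', 'p', 'p', 'l', 'e']
print(to_character_array(["brown", "dog"]))   # ['b', 'r', 'o', 'w', 'n', 'd', 'o', 'g']
print(join("ABCD", "*"))                      # A*B*C*D
```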
Types
Type is a collective noun for character, string, integer and array. Here are the types we have encountered:
| Example | Identifier | Description |
|---|---|---|
| 'A' | character | A single character |
| "Apple" | string | Zero or more characters |
| 1 | integer | A number without decimals |
| ['A', 'b', '$'] | character[] | Array of characters |
| ["Apple", "Pear"] | string[] | Array of strings |
Application of the terms
With the information given above you will understand the following statement:
The split function takes a string as input, a string as a parameter and returns a string[].
Implicit "join"
When a metadata function returns an array of strings and it is the last function of the tag, ScannerVision does an implicit "join" of the array elements to form a string.
Example:
Let's say the BC4 tag contains the value "2013-03-12" and you perform a split on the "-" character like this:
[BC4 (split "-")]
The result you'll see in the ScannerVision Expression Editor is "20130312" and not ["2013", "03", "12"]. Behind the scenes ScannerVision actually did this:
[BC4 (split "-")(join)]
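The implicit join can be modelled as a final step in a small evaluator. The Python sketch below is an illustration only: `evaluate` is a hypothetical helper that applies a list of functions in order and joins the final result if it is still an array.

```python
def evaluate(value, functions):
    """Apply each function to the result of the previous one, then join if needed."""
    result = value
    for function in functions:
        result = function(result)
    # Implicit join: an array left over at the end is concatenated into a string.
    if isinstance(result, list):
        result = "".join(result)
    return result


def split_on_hyphen(text):
    """Model of (split "-") with a literal separator."""
    return text.split("-")


def take_second(parts):
    """Model of (take 2): 1-based index 2 is Python index 1."""
    return parts[1]


bc4 = "2013-03-12"  # assumed value of the BC4 tag

print(evaluate(bc4, [split_on_hyphen]))               # 20130312
print(evaluate(bc4, [split_on_hyphen, take_second]))  # 03
```

When the last function already returns a string, as in the second call, the implicit join has nothing to do.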
Metadata concatenation
Anywhere that a metadata tag can be used in a function, multiple tags can be used, e.g. [TAG (match ["^"][BC])]. You could, for example, build up a regular expression using a combination of strings and metadata tags. In the previous example, if BC contains the value "INV001", the regular expression will expand to [TAG (match "^INV001")].
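The expansion step can be sketched in Python as plain string concatenation followed by a regular expression search. Everything here is an assumption made for illustration: the `expand` helper, the tag values, and the use of `re.search` as a stand-in for "match", whose exact semantics are described in the function reference rather than here.

```python
import re

# Hypothetical tag values, chosen only for illustration.
tags = {"BC": "INV001", "TAG": "INV001 2013-03-12"}


def expand(parts, tag_values):
    """Concatenate literal strings and tag values into one parameter string.

    Each part is either ("literal", text) or ("tag", name).
    """
    pieces = []
    for kind, value in parts:
        pieces.append(tag_values[value] if kind == "tag" else value)
    return "".join(pieces)


pattern = expand([("literal", "^"), ("tag", "BC")], tags)

print(pattern)                                      # ^INV001
print(re.search(pattern, tags["TAG"]) is not None)  # True
```

In this model the tag value is inserted into the pattern unescaped, so any regular expression metacharacters it contains would be interpreted as part of the pattern.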
Regular Expressions
The "split" and "match" functions take a string parameter which represents the pattern on which to do the split or match. This pattern can be any valid regular expression. An explanation of regular expressions is beyond the scope of this manual. A good understanding of regular expressions is, however, highly recommended if you want to make full use of the power of ScannerVision metadata functions. We recommend the Regular-Expressions.info website if you want to brush up on your regex skills.
White Space
White space characters include the following:
- SPACE (U+0020)
- OGHAM SPACE MARK (U+1680)
- MONGOLIAN VOWEL SEPARATOR (U+180E)
- EN QUAD (U+2000)
- EM QUAD (U+2001)
- EN SPACE (U+2002)
- EM SPACE (U+2003)
- THREE-PER-EM SPACE (U+2004)
- FOUR-PER-EM SPACE (U+2005)
- SIX-PER-EM SPACE (U+2006)
- FIGURE SPACE (U+2007)
- PUNCTUATION SPACE (U+2008)
- THIN SPACE (U+2009)
- HAIR SPACE (U+200A)
- NARROW NO-BREAK SPACE (U+202F)
- MEDIUM MATHEMATICAL SPACE (U+205F)
- IDEOGRAPHIC SPACE (U+3000)
- LINE SEPARATOR (U+2028)
- PARAGRAPH SEPARATOR (U+2029)
- CHARACTER TABULATION (U+0009)
- LINE FEED (U+000A)
- LINE TABULATION (U+000B)
- FORM FEED (U+000C)
- CARRIAGE RETURN (U+000D)
- NEXT LINE (U+0085)
- NO-BREAK SPACE (U+00A0)
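The list above can be expressed as an explicit set of code points. The Python sketch below is one way to test a character against exactly this list. The names `WHITE_SPACE` and `is_white_space` are hypothetical, and an explicit set is used rather than Python's built-in `str.isspace`, whose definition of white space is not guaranteed to coincide with this list.

```python
# Code points taken from the list above.
WHITE_SPACE = frozenset(
    [0x0020, 0x1680, 0x180E]
    + list(range(0x2000, 0x200B))   # EN QUAD (U+2000) through HAIR SPACE (U+200A)
    + [0x202F, 0x205F, 0x3000, 0x2028, 0x2029]
    + list(range(0x0009, 0x000E))   # CHARACTER TABULATION through CARRIAGE RETURN
    + [0x0085, 0x00A0]
)


def is_white_space(ch):
    """Return True if the single character ch is in the white space list."""
    return ord(ch) in WHITE_SPACE


print(len(WHITE_SPACE))          # 26
print(is_white_space(" "))       # True
print(is_white_space("\u00a0"))  # True  (NO-BREAK SPACE)
print(is_white_space("A"))       # False
```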