Text File Input

    This transform can read from a list of files or directories, select files with wildcards in the form of regular expressions, and accept filenames from previous transforms.

    File Tab

    The table below describes the features available on the File tab:

    Option | Description

    File or directory

    This field specifies the location and/or name of the input text file.

    Regular expression

    Specify the regular expression used to select files in the directory specified in the previous option, for example to process all files that have a .txt extension. See “Selecting files using Regular Expressions” below.

    Selected Files

    This table contains a list of selected files (or wildcard selections), along with a property specifying whether each file is required. If a required file is not found, an error is generated. If the file is not required, the filename is skipped.

    Show filename(s)…

    Displays a list of all files that will be loaded based on the current selected file definitions.

    Show file content

    Displays the content of the selected file.

    Show content from first data line

    Displays the content from the first data line only for the selected file.

    Selecting files using Regular Expressions

    The Text File Input transform can search for files by wildcard in the form of a regular expression. Regular expressions are more sophisticated than the ‘*’ and ‘?’ wildcards. Below are a few examples of regular expressions:

    Filename | Regular Expression | Files selected

    /dirA/

    .*userdata.*\.txt

    Find all files in /dirA/ with names containing userdata and ending with .txt

    /dirB/

    AAA.*

    Find all files in /dirB/ with names that start with AAA

    /dirC/

    [A-Z][0-9].*

    Find all files in /dirC/ with names that start with a capital letter followed by a digit (A0-Z9)
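
    The selection logic described in the table can be tried out with a minimal Python sketch. It assumes the expression must match the whole file name (which is why ".*" is needed for arbitrary leading or trailing characters); the function name and the sample file names are hypothetical, and the patterns are written to agree with the descriptions in the table.

```python
import re


def select_files(filenames, pattern):
    """Return the names that the regular expression matches in full."""
    regex = re.compile(pattern)
    # fullmatch: the entire name has to match, not just a part of it.
    return [name for name in filenames if regex.fullmatch(name)]


# Hypothetical directory listing, for illustration only.
names = [
    "userdata_2024.txt",
    "old_userdata.txt.bak",
    "AAA_report.csv",
    "B7_export.dat",
    "b7_export.dat",
]

print(select_files(names, r".*userdata.*\.txt"))  # ['userdata_2024.txt']
print(select_files(names, r"AAA.*"))              # ['AAA_report.csv']
print(select_files(names, r"[A-Z][0-9].*"))       # ['B7_export.dat']
```

    Note that the dot before "txt" is escaped so that it matches a literal dot rather than any character.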

    Accepting filenames from a previous transform

    This option allows even more flexibility in combination with other transforms such as “Get Filenames”. You can construct your filename and pass it to this transform. This way the filename can come from any source: text file, database table, etc.

    Option | Description

    Accept filenames from previous transforms

    Enables the option to get filenames from previous transforms.

    Pass through fields from previous transform

    Enable this option to add all previous fields coming into the transform to the transform output. This behaves like a join option.

    Transform to read filenames from

    Transform from which to read the filenames

    Field in the input to use as filename

    Text File Input looks in this field to determine which filenames to use

    Content Tab

    The Content tab allows you to specify the format of the text files that are being read.

    Error Handling Tab

    The Error Handling tab allows you to specify how the transform reacts when errors occur. The table below describes the options available for error handling:

    Option | Description

    Ignore errors?

    Enable if you want to ignore errors during parsing

    Skip error lines

    Enable if you want to skip lines that contain errors. You can generate an extra file that contains the line numbers on which the errors occurred. If lines with errors are not skipped, the fields that have parsing errors are left empty (null).

    Error count field name

    Add a field to the output stream rows; this field contains the number of errors on the line

    Error fields field name

    Add a field to the output stream rows; this field contains the field names on which an error occurred

    Error text field name

    Add a field to the output stream rows; this field contains the descriptions of the parsing errors that have occurred

    Warnings file directory

    When warnings are generated, they are placed in this directory. The name of that file is <warning dir>/filename.<date_time>.<warning extension>

    Error files directory

    When errors occur related to non-existing or non-accessible files, they are placed in this directory. The name of the file is <errorfile_dir>/filename.<date_time>.<errorfile_extension>

    Failing line numbers files directory

    When a parsing error occurs on a line, the line number is written to a file in this directory. The name of that file is <errorline dir>/filename.<date_time>.<errorline extension>
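
    The interplay between skipping error lines, null fields, and the error count and error fields columns can be illustrated with a short Python sketch. This is not the transform's implementation: the function names, the output field names error_count and error_fields, and the sample data are assumptions made for the example.

```python
def parse_row(raw, parsers):
    """Parse one row; fields that fail to parse become None.

    raw     -- dict of field name -> raw text
    parsers -- dict of field name -> callable that converts the text
    Returns (row, error_fields), where error_fields lists the failing names.
    """
    row, error_fields = {}, []
    for name, convert in parsers.items():
        try:
            row[name] = convert(raw[name])
        except (ValueError, KeyError):
            row[name] = None  # parsing error: leave the field empty (null)
            error_fields.append(name)
    return row, error_fields


def read_rows(raw_rows, parsers, skip_error_lines=False):
    """Return (rows, failing_line_numbers) for the given raw rows."""
    rows, failing_lines = [], []
    for line_number, raw in enumerate(raw_rows, start=1):
        row, error_fields = parse_row(raw, parsers)
        if error_fields:
            failing_lines.append(line_number)
            if skip_error_lines:
                continue  # drop the whole line
        row["error_count"] = len(error_fields)       # hypothetical field name
        row["error_fields"] = ",".join(error_fields)  # hypothetical field name
        rows.append(row)
    return rows, failing_lines


# Hypothetical input: the second line has a non-numeric id.
sample = [{"id": "1", "amount": "9.50"}, {"id": "x", "amount": "3"}]
converters = {"id": int, "amount": float}

kept, failing = read_rows(sample, converters)
print(kept[1])   # id is None, error_count is 1, error_fields is 'id'
print(failing)   # [2]
```

    With skip_error_lines=True the second row is dropped from the output, while its line number is still reported.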

    Filters Tab

    The Filters tab lets you specify the lines you want to skip in the text file. The table below describes the available options for defining filters:

    Option | Description

    Filter string

    The string for which to search

    Filter position

    The position in the line at which the filter string must occur. Zero (0) is the first position in the line. If you specify a value below zero, the filter string is searched for anywhere in the line.

    Stop on filter

    Specify Y here if you want to stop processing the current text file when the filter string is encountered.

    Positive match

    Specify Y here if you want to process lines that match the filter, or N if you want to ignore such lines.
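
    The four filter options can be read as the following Python sketch for a single filter. It is an interpretation of the descriptions above, not the transform's code: the function and parameter names are hypothetical, and how several filters combine is not covered.

```python
def filter_lines(lines, text, position=-1, stop_on_filter=False,
                 positive_match=False):
    """Apply one filter to a list of lines and return the lines to process."""
    kept = []
    for line in lines:
        if position < 0:
            hit = text in line                      # search the entire line
        else:
            hit = line.startswith(text, position)   # must occur at position
        if hit and stop_on_filter:
            break                                   # stop reading this file
        # Positive match: keep matching lines; otherwise keep the others.
        if hit == positive_match:
            kept.append(line)
    return kept


# Hypothetical file content, for illustration only.
lines = ["# comment", "id;name", "1;Ann", "END", "2;Bob"]

print(filter_lines(lines, "#", position=0))
# ['id;name', '1;Ann', 'END', '2;Bob']
print(filter_lines(lines, "END", position=0, stop_on_filter=True))
# ['# comment', 'id;name', '1;Ann']
print(filter_lines(lines, ";", positive_match=True))
# ['id;name', '1;Ann', '2;Bob']
```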

    Fields Tab

    The fields tab allows you to specify the information about the name and format of the fields being read from the text file. Available options include:

    Option | Description

    Name

    Name of the field

    Type

    Type of the field can be either String, Date or Number

    Format

    See Number Formats for a complete description of format symbols.

    Position

    This is needed when processing the ‘Fixed’ filetype. Positions are zero-based, so the first character is at position 0.

    Length

    For Number: Total number of significant figures in a number; For String: total length of string; For Date: length of printed output of the string (e.g. 4 only gives back the year).

    Precision

    For Number: number of digits after the decimal point; For String, Date, Boolean: unused

    Currency

    Used to interpret numbers like $10,000.00 or €5.000,00

    Decimal

    A decimal point can be a dot “.” (10,000.00) or a comma “,” (5.000,00)

    Grouping

    A grouping symbol can be a comma “,” (10,000.00) or a dot “.” (5.000,00)

    Null if

    Treat this value as NULL

    Default

    Default value in case the field in the text file was not specified (empty)

    Trim

    Type of trimming to apply to this field (left, right, both) before processing

    Repeat

    If the corresponding value in this row is empty, repeat the one from the last row when it was not empty.
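
    Several of the field options can be sketched in a few lines of Python. The functions below are illustrations only: their names, the order in which trimming, null handling, and defaults are applied, and the sample values are assumptions, not the transform's actual behaviour.

```python
def fixed_field(line, position, length):
    """Cut a field out of a fixed-width line (position is zero-based)."""
    return line[position:position + length]


def clean_value(text, trim="both", null_if=None, default=None):
    """Apply Trim, then Null if, then Default (order assumed)."""
    if trim == "left":
        text = text.lstrip()
    elif trim == "right":
        text = text.rstrip()
    elif trim == "both":
        text = text.strip()
    if null_if is not None and text == null_if:
        return None
    if text == "":
        return default
    return text


def parse_number(text, decimal=".", grouping=",", currency=""):
    """Convert text such as '$10,000.00' or '€5.000,00' to a float."""
    cleaned = text.strip()
    if currency:
        cleaned = cleaned.replace(currency, "")
    if grouping:
        cleaned = cleaned.replace(grouping, "")  # drop grouping symbols
    cleaned = cleaned.replace(decimal, ".")      # normalise decimal symbol
    return float(cleaned)


def apply_repeat(values):
    """Replace empty values with the last non-empty value (Repeat)."""
    filled, last = [], None
    for value in values:
        if value in (None, ""):
            value = last
        else:
            last = value
        filled.append(value)
    return filled


print(fixed_field("0001Ann  2024", 4, 5))                     # 'Ann  '
print(clean_value("  n/a ", null_if="n/a"))                   # None
print(clean_value("   ", default="unknown"))                  # 'unknown'
print(parse_number("$10,000.00", ".", ",", "$"))              # 10000.0
print(parse_number("€5.000,00", decimal=",", grouping=".", currency="€"))  # 5000.0
print(apply_repeat(["EU", "", None, "US", ""]))
# ['EU', 'EU', 'EU', 'US', 'US']
```

    The two parse_number calls show why Decimal and Grouping have to be set together: the same characters swap roles between the two notations.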

    Number Formats

    The information below on number formats was taken from the Sun Java API documentation. For further information on valid numeric formats used in this transform, view the Number Formatting Table.

    In a pattern, the exponent character immediately followed by one or more digit characters indicates scientific notation (for example, “0.###E0” formats the number 1234 as “1.234E3”).

    Date formats

    The information on Date formats was taken from the Sun Java API documentation, located at:

    http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html. For further information on valid date formats used in this transform, view the Date Formatting Table.

    Letter | Date or Time Component | Presentation | Examples

    M

    Month in year

    Month

    July; Jul; 07

    w

    Week in year

    Number

    27

    W

    Week in month

    Number

    2

    D

    Day in year

    Number

    189

    d

    Day in month

    Number

    10

    F

    Day of week in month

    Number

    2

    E

    Day in week

    Text

    Tuesday; Tue

    a

    Am/pm marker

    Text

    PM

    H

    Hour in day (0-23)

    Number

    0

    k

    Hour in day (1-24)

    Number

    24

    K

    Hour in am/pm (0-11)

    Number

    0

    h

    Hour in am/pm (1-12)

    Number

    12

    m

    Minute in hour

    Number

    30

    s

    Second in minute

    Number

    55

    S

    Millisecond

    Number

    978

    z

    Time zone

    General time zone

    Pacific Standard Time; PST; GMT-08:00

    Z

    Time zone

    RFC 822 time zone

    -0800
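
    To get a feel for how the pattern letters combine, the Python sketch below translates a small subset of Java-style patterns into strptime directives and parses a timestamp. The mapping is partial and hypothetical: it handles only the exact tokens listed, it adds "yyyy" (a four-digit year, which is a standard pattern letter although not listed in the table above), and it does not handle quoted literal text.

```python
from datetime import datetime

# Partial mapping from Java-style pattern tokens to strptime directives.
TOKENS = [
    ("yyyy", "%Y"),  # four-digit year
    ("MM", "%m"),    # month in year, as a number
    ("dd", "%d"),    # day in month
    ("DDD", "%j"),   # day in year
    ("HH", "%H"),    # hour in day (0-23)
    ("hh", "%I"),    # hour in am/pm (1-12)
    ("mm", "%M"),    # minute in hour
    ("ss", "%S"),    # second in minute
    ("SSS", "%f"),   # milliseconds (read as a fraction of a second)
    ("a", "%p"),     # am/pm marker
    ("Z", "%z"),     # RFC 822 time zone, for example -0800
]


def to_strptime(pattern):
    """Translate a pattern made of the tokens above into a strptime format."""
    out, i = [], 0
    while i < len(pattern):
        for token, directive in TOKENS:
            if pattern.startswith(token, i):
                out.append(directive)
                i += len(token)
                break
        else:
            # Not a known token: copy the character, escaping '%'.
            out.append("%%" if pattern[i] == "%" else pattern[i])
            i += 1
    return "".join(out)


fmt = to_strptime("yyyy-MM-dd HH:mm:ss.SSS")
print(fmt)  # %Y-%m-%d %H:%M:%S.%f
print(datetime.strptime("2024-07-10 23:30:55.978", fmt))
# 2024-07-10 23:30:55.978000
```

    The letters are case-sensitive: "MM" is the month while "mm" is the minute, and "HH" is the 24-hour clock while "hh" is the 12-hour clock.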

    Option | Description

    Short filename field

    The field name that contains the filename without path information but with an extension.

    Extension field

    The field name that contains the extension of the filename.

    Path field

    The field name that contains the path in operating system format.

    Size field

    The field name that contains the size of the file.

    Is hidden field

    The field name that contains whether the file is hidden (Boolean).

    Uri field

    The field name that contains the URI.

    Root uri field

    The field name that contains only the root part of the URI.
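
    As a rough illustration of the values these fields carry, the Python sketch below derives comparable attributes for a local file with pathlib. The dictionary keys, the extension without a leading dot, the Unix-style hidden-file test, and the form of the root URI are all assumptions made for the example; the transform's own values may differ.

```python
from pathlib import Path


def describe_file(path):
    """Return file attributes comparable to the additional output fields."""
    p = Path(path).resolve()
    return {
        "short_filename": p.name,                # name with extension, no path
        "extension": p.suffix.lstrip("."),       # assumed without the dot
        "path": str(p.parent),                   # operating system format
        "size": p.stat().st_size,                # size in bytes
        "is_hidden": p.name.startswith("."),     # Unix convention only
        "uri": p.as_uri(),                       # for example file:///...
        "root_uri": Path(p.anchor).as_uri(),     # root part of the URI
    }
```

    Calling describe_file on an existing file such as a hypothetical /tmp/userdata.txt would give "userdata.txt" as the short filename and "txt" as the extension.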

    Function/Button | Description

    Show filenames

    Displays a list of all the files selected. Note that if the pipeline is to be run on a separate server, the result might be incorrect.

    Show file content

    Displays the first lines of the text-file. Make sure that the file-format is correct. When in doubt, try both DOS and UNIX formats.

    Show content from first data line

    Helps you position the data lines in complex text files with multiple header lines and more.

    Get fields

    Allows you to guess the layout of the file. In case of a CSV file, this is performed almost automatically. When you select a file with fixed length fields, you must specify the field boundaries using a wizard.

    Preview rows

    Preview the rows generated by this transform.