Fixed Width (Table Extract Method)

This article is about the current version of Grooper (2025). Note that some content may still need to be updated.

The Fixed Width Table Extract Method is designed for documents where tabular data is presented in rows of text with columns defined by fixed character widths, rather than by delimiters or visual separators. This method is ideal for legacy reports, mainframe printouts, or any scenario where each field in a row occupies a specific number of characters.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2025). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in the examples throughout this article.

Introduction

The Fixed Width Table Extract Method reads tabular data from plain text where each column occupies a fixed number of characters. Instead of detecting visual columns or parsing a delimiter (like commas), Fixed Width splits each matched row by character positions you define.

How it differs from other methods:

  • Tabular Layout and Grid Layout detect columns and rows using headers and page layout.
  • Fluid Layout switches between Tabular Layout and Row Match based on labels; Fixed Width does not depend on labels.
  • Row Match parses rows using a pattern and named groups (no fixed column widths required).
  • Delimited Extract parses values using a delimiter (CSV/TSV/etc.).
  • AI Table Reader uses a large language model to interpret semi-structured tables; Fixed Width is rules-based, deterministic slicing of fixed-width text.
  • Fixed Width assumes monospaced, column-aligned text and splits by exact character counts.
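
The contrast between delimiter-based and fixed-width splitting can be sketched in a few lines of Python. This is only an illustration of the idea, not Grooper's implementation; the sample values and the widths (10, 12, 2) are hypothetical.

```python
# Hypothetical sample data; neither line comes from a real Grooper document.
delimited_line = "1001,John Smith,OK"
fixed_line = "1001".ljust(10) + "John Smith".ljust(12) + "OK"  # widths 10, 12, 2

# Delimited parsing: the comma decides where each value ends.
delimited_values = delimited_line.split(",")

# Fixed-width parsing: character counts decide where each value ends.
widths = [10, 12, 2]
fixed_values = []
position = 0
for width in widths:
    # Slice the next `width` characters, then drop the padding spaces.
    fixed_values.append(fixed_line[position:position + width].strip())
    position += width

print(delimited_values)
print(fixed_values)
```

Both approaches recover the same three values here, but the fixed-width version never looks for a separator character, which is why it depends on every row following the same layout.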

What it is for

  • Ideal use cases:
    • Mainframe or legacy printouts where each field is column-aligned.
    • Exported reports rendered as fixed-pitch text.
    • Lists where every row has a consistent, known character layout.
  • Benefits:
    • Simple and deterministic: values are cut at exact character boundaries.
    • No dependency on headers, label sets, or page geometry.
    • Consistent results across documents with the same record layout.
  • Drawbacks:
    • Requires consistent row length and column widths across all rows.
    • Not suitable for proportional fonts or variable-length columns.
    • Column order and widths are not auto-discovered; they must be configured.

How to add and configure the Fixed Width Table Extract Method

The following steps cover general setup of the Fixed Width Table Extract Method.

FYI

Please see the demo below for an example setup with screenshots and highlighted instructions.

  1. Create or open a Data Table that represents your repeating records.
  2. Add child Data Columns for each field you want to capture (for example: ID, Name, Date, Amount).
  3. On the Data Table, set the "Extract Method" to Fixed Width.
  4. Configure Fixed Width:
    • Set "Row Extractor" to identify each logical row (for example, a pattern that matches one line per record).
    • Set "Record Layout" to define each column's width in characters.
      • To find the widths, place your cursor in the Document Viewer and count the characters (including spaces) as you press the arrow keys.
    • Optionally set "Trim" to remove leading/trailing whitespace from extracted values.
  5. Make sure each Data Column's name matches a corresponding entry in "Record Layout".
  6. Test extraction:
    • Run a Batch through an Extract step or use the Tester tab of the Data Table or Data Model.
    • Review the resulting table values. Adjust widths or the row extractor as needed.
  7. Troubleshooting tips:
    • If some cells are empty or shifted, verify that the "Record Layout" totals to the actual row length matched by "Row Extractor".
    • Ensure every Data Column you want filled has a matching entry name and non-zero width.
    • If rows are missing, broaden or correct the "Row Extractor" so it matches each record line.
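
The first troubleshooting tip (checking that the layout widths total the matched row length) can be sketched outside Grooper as a quick sanity check. The helper name `check_layout`, the layout, and the sample rows below are all hypothetical and exist only for illustration.

```python
def check_layout(rows, layout):
    """Return (row_number, actual_length) for rows whose length differs from the layout total."""
    expected = sum(layout.values())
    return [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        if len(row) != expected
    ]

# Hypothetical layout: column name -> width in characters (total 26).
layout = {"ID": 6, "Name": 12, "Amount": 8}

good_row = "1001".ljust(6) + "John Smith".ljust(12) + "$123.45".rjust(8)
# The Name value below is not padded to 12 characters, so the row is too short
# and every later column would be shifted.
short_row = "1002".ljust(6) + "Jane Doe" + "$234.56".rjust(8)

mismatches = check_layout([good_row, short_row], layout)
print(mismatches)
```

A row that shows up in the mismatch list is a likely source of empty or shifted cells.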

Example: Configuring and testing Fixed Width

A fixed-width text file is a TXT file that stores tabular data in a specific format: each column has a fixed character length. Rather than a separator character marking where columns start and stop, a value's column is determined by its character position on the line. For example, Column A is 10 characters, Column B is 2, Column C is 5, and so on.
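
To make the 10, 2, 5 example concrete, the sketch below converts column widths into start and end positions and then slices a line with them. It is an illustration in Python, not Grooper code, and the sample line is made up.

```python
from itertools import accumulate

# Widths taken from the example above: Column A is 10, Column B is 2, Column C is 5.
widths = {"Column A": 10, "Column B": 2, "Column C": 5}

# Running totals give each column's end position; each start is the previous end.
ends = list(accumulate(widths.values()))
starts = [0] + ends[:-1]
spans = {name: (start, end) for name, start, end in zip(widths, starts, ends)}

# Hypothetical 17-character line with no separators between the columns.
line = "ABCDEFGHIJ" + "KL" + "MNOPQ"
cells = {name: line[start:end] for name, (start, end) in spans.items()}

print(spans)
print(cells)
```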

  1. Expand the Node Tree and select the "Fixed Width - Data Table" Data Table from the provided Project. Notice the Extract Method property is set to "Fixed Width".
  2. Click the drop-down arrow to the left of the Extract Method property set to "Fixed Width" to expand its sub-properties. Notice the Row Extractor property is set to "Pattern Match".
  3. Click the ellipsis button to the right of the Row Extractor property set to "Pattern Match" to open the "Pattern Match" editor.
  4. Click the "Select Batch" button in the Batch Viewer, then be sure to select the "Fixed-Width" Batch.
  5. Select the first Batch Folder in the Batch Viewer. Notice the simple regular expression of the Value Pattern field.
    • One or more characters that are not carriage returns or line feeds are matched non-greedily. On its own, this would match every line, including the header row, but following it with the phone number pattern restricts the matches to the "table" rows.
  6. You can see the pattern successfully returns each line of the "table" in the text file, producing four results.
  7. Select the second Batch Folder in the Batch Viewer. The regular expression pattern also works when all of the records are on a single line.
  8. Right-click in the Document Viewer to see that text wrap is enabled.
    • This step isn't necessary. It simply illustrates that, even though Text Wrap makes the text appear on several lines, the contents are in fact on a single line.
  9. Click the drop-down arrow to the left of the Record Layout property with four entries to expand its sub-properties. Click the ellipsis button to the right of the Local Entries property with four entries to open the "Local Entries" editor. It's worth noting that you could reference a Lexicon here instead of using Local Entries.
  10. In the "Local Entries" window, the entries are key-value pairs. The key is the name of the Data Column, and the value is the length of the string within the text file. Note that each key must exactly match the name of its target Data Column.
  11. Click the Tester tab.
  12. Select the first Batch Folder in the Batch Viewer. Place your cursor to the left of the first row of the "table" in the Document Viewer, then press the right arrow key and count the number of presses until you reach the State value. You should count twelve. Repeat the counting for each value in the "row". State should be two, date of birth should be fourteen, and phone number should also be twelve.
  13. Click the "Test Extraction" button.
  14. Notice all results are accurately returned.
  15. The date of birth cell in the fourth row is flagged because its value does not match the expected date pattern. You can clear the cell to remove the error state.
  16. Select the second Batch Folder, then click the "Test Extraction" button.
  17. Notice all results are accurately returned.
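
The row-matching idea from this walkthrough can be approximated in Python. The article does not show the actual Value Pattern, so the regular expression below is an assumption: a non-greedy run of non-newline characters followed by a phone number shaped like 405-555-0101. The sample records and the first column's name are also hypothetical; only the widths (12, 2, 14, 12) come from the steps above.

```python
import re

# Assumed pattern, not copied from the Project: non-newline characters, matched
# non-greedily, ending at the first phone number in NNN-NNN-NNNN form.
ROW_PATTERN = re.compile(r"[^\r\n]+?\d{3}-\d{3}-\d{4}")

def make_row(name, state, dob, phone):
    """Pad each value to the widths counted in the walkthrough: 12, 2, 14, 12."""
    return name.ljust(12) + state.ljust(2) + dob.ljust(14) + phone.ljust(12)

header = make_row("Name", "ST", "Date of Birth", "Phone")
records = [
    make_row("John Smith", "OK", "01/15/1980", "405-555-0101"),
    make_row("Jane Doe", "TX", "07/04/1975", "214-555-0199"),
]

# First layout: a header line plus one record per line.
multi_line = "\n".join([header] + records)
# Second layout: every record run together on a single line (no header).
single_line = "".join(records)

multi_matches = ROW_PATTERN.findall(multi_line)
single_matches = ROW_PATTERN.findall(single_line)

print(len(multi_matches), len(single_matches))
```

The header is skipped because it contains no phone number, and the non-greedy quantifier is what lets the same pattern stop at the end of each record when the records share one line.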

Properties overview

The Fixed Width extract method exposes the following properties.

  • Row Extractor
    • Definition: The extractor used to match table rows in the fixed-width document.
    • Remarks: For each match produced, a table row is created and then split according to the "Record Layout". Use a pattern (for example, a regular expression) that identifies a full logical row. If the row does not meet the expected length, parsing may fail or produce incomplete data.
    • Purpose / use case:
      • Identify exactly one row per record (e.g., one line of fixed-width text).
      • Ensure the matched text length accommodates the total of all configured column widths.
  • Record Layout
    • Definition: A key-value list defining the character width for each column.
    • Remarks: Implemented as a lexicon where each entry uses the format ColumnName=Width. The total of all widths should match the full row length returned by "Row Extractor". You can define entries directly in the property's editor.
    • Purpose / use case:
      • Declaratively map column names to their fixed character widths.
      • Control how the row content is sliced into cells.
    • Example:
        Record Type=4
        Sequence=8
        Run Date=10
        ID=8
        Amount=14
    • Notes:
      • Each entry's name must match a Data Column name to populate that column.
      • If a column is not listed, that Data Column will be created empty in each row.
  • Trim
    • Definition: Specifies whether whitespace trimming is applied to extracted column values.
    • Remarks: When enabled (default), leading and trailing whitespace is removed after slicing the value. Disable if whitespace is meaningful for your data.
    • Purpose / use case:
      • Clean up padded values that are common in fixed-width formats.
      • Preserve spacing when it carries meaning (for example, alignment flags).
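
The interaction between "Record Layout" and "Trim" can be sketched in Python as follows. This is a rough model of the behavior described above, not Grooper's code: `parse_layout` and `slice_row` are hypothetical helper names, and the sample row is invented to fit the example layout (total width 44).

```python
def parse_layout(text):
    """Turn 'ColumnName=Width' lines into an ordered list of (name, width) pairs."""
    layout = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, _, width = line.rpartition("=")
        layout.append((name.strip(), int(width)))
    return layout

def slice_row(row, layout, trim=True):
    """Cut the row at the configured widths, optionally trimming each value."""
    cells, position = {}, 0
    for name, width in layout:
        value = row[position:position + width]
        cells[name] = value.strip() if trim else value
        position += width
    return cells

LAYOUT_TEXT = """\
Record Type=4
Sequence=8
Run Date=10
ID=8
Amount=14
"""

layout = parse_layout(LAYOUT_TEXT)
# Hypothetical 44-character row padded to match the layout above.
row = "DTL".ljust(4) + "00000001" + "2025-01-31" + "A1234".ljust(8) + "123.45".rjust(14)

trimmed = slice_row(row, layout)          # padding removed
raw = slice_row(row, layout, trim=False)  # padding preserved

print(trimmed)
print(raw)
```

With trimming off, the Amount cell keeps its leading spaces and the ID cell keeps its trailing ones, which is the situation the "Trim" property exists to clean up.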

Configuration examples

Simple text rows:

1001  John Smith   2023-01-01  $123.45
1002  Jane Doe     2023-01-02  $234.56

Record Layout:

ID=5
Name=13
Date=12
Amount=8

Recommended "Row Extractor":

  • A regex that matches each full data line, ending at the two-digit decimal portion of the amount:
[^\r\n]+[.][0-9]{2}
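
As a cross-check, the sample rows, Record Layout, and row pattern above can be run through a short Python sketch. It mimics the match-then-slice behavior described in this article under the assumption that trimming is enabled; it is not how Grooper itself is implemented.

```python
import re

# The two sample rows from above, one per line.
TEXT = (
    "1001  John Smith   2023-01-01  $123.45\n"
    "1002  Jane Doe     2023-01-02  $234.56\n"
)

# Same expression as the recommended Row Extractor pattern.
ROW_PATTERN = re.compile(r"[^\r\n]+[.][0-9]{2}")

# Record Layout from above: ID=5, Name=13, Date=12, Amount=8 (total 38).
LAYOUT = [("ID", 5), ("Name", 13), ("Date", 12), ("Amount", 8)]

rows = ROW_PATTERN.findall(TEXT)
table = []
for row in rows:
    cells, position = {}, 0
    for name, width in LAYOUT:
        # Slice by width, then trim the padding around the value.
        cells[name] = row[position:position + width].strip()
        position += width
    table.append(cells)

for cells in table:
    print(cells)
```

Each matched row is 38 characters long, which equals the sum of the layout widths, so every value lands in its intended column. Note that the column boundaries fall inside the padding between values, which is why trimming is needed to get clean results.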

Best practices

  • Ensure the "Record Layout" widths exactly match what the "Row Extractor" captures.
  • Keep column names in "Record Layout" identical to Data Column names.
  • Start with slightly wider widths for text fields (like Name) if spacing is variable, then refine as needed.
  • Leave "Trim" enabled unless spacing is significant for downstream rules.
  • Use Data Table options like "Initial Row Count", "Row Count Range", and "Generate Footer Row" as needed for UI and validation behavior.