2023:Microsoft Office Integration (Concept): Difference between revisions

From Grooper Wiki
Created page with "right|thumb|Native text for Microsoft Office applications is a powerful data integration tool in Grooper. <blockquote style="font-size:14pt">Easier access to the contents of files from the worlds most used business application suite.</blockquote> ==About== '''[https://en.wikipedia.org/wiki/Microsoft_Office Microsoft Office]''' integration allows a '''Grooper''' user to leverage the native text of files generated in the Micr..."
 
No edit summary
Line 2: Line 2:


<blockquote style="font-size:14pt">Easier access to the contents of files from the worlds most used business application suite.</blockquote>
<blockquote style="font-size:14pt">Easier access to the contents of files from the worlds most used business application suite.</blockquote>
{| class="wikitable" style="margin:left"
! Previous Versions
|-
|
[[Microsoft Office Integration - 2.90|Grooper 2.90]]
<br>
|}


==About==
==About==
Line 53: Line 61:
# Right click the document whose native text you wish to view.
# Right click the document whose native text you wish to view.
#* Notice imported Word documents will have the Word icon next to the native file's file name on the '''Batch Folder'''.  This lets you know the document folder's native file (the one imported into '''Grooper''') is a Word file.
#* Notice imported Word documents will have the Word icon next to the native file's file name on the '''Batch Folder'''.  This lets you know the document folder's native file (the one imported into '''Grooper''') is a Word file.
# Select "Activities".
# Select "Word Document".
# Select "Execute".
# Select "Word to PDF".
|
|
[[File:Ms-office-integration-08.png]]
[[File:2023 Microsoft Office Integration - 2023 01 How To 01 Word Documents 01.png]]
|-
|-
|valign=top|
|valign=top|
The '''Execute''' activity applies simple processing commands to a specified object typeTo turn the imported Word file into a PDF file for further '''Grooper''' processing, you will indicate you want to process the ''Word Document'' object type and execute the ''Word to PDF'' command.
This will create a PDF copy of the Word document, stored on the document folderThis document is now viewable in '''Grooper's''' Document Viewer and contains all the native text data from the Word file.


# From this window, you will configure the '''Execute''' activity.
# To view the PDF, click on the icon in the top right corner of the Batch Viewer.  
# Select the '''''Commands''''' property and press the ellipsis button at the end to add a new command.
# From the drop-down, select "PDF".
 
This document folder can now be processed by the '''Recognize''' activity to extract that native text for further document processing (classification, data extraction etc).
|
|
[[File:Ms-office-integration-09.png]]
[[File:2023 Microsoft Office Integration - 2023 01 How To 01 Word Documents 02.png]]
|-
|valign=top|
# The following window allows you to collect various commands for the '''Execute''' activity.
# Click the "Add" button to add a new command.
# Under '''''Object Type''''', choose ''Word Document''.
# Under '''''Command''''', choose ''Word to PDF''.
# Press "OK" to finish configuring the command.
|
[[File:Ms-office-integration-10.png]]
|-
|valign=top|
# Press the "Execute" button to execute the command, converting the Word file to a PDF file.
|
[[File:Ms-office-integration-11.png]]
|-
|valign=top|
This will create a PDF copy of the Word document, stored on the document folder.  This document is now viewable in '''Grooper's''' Document Viewer and contains all the native text data from the Word file.  This document folder can now be processed by the '''Recognize''' activity to extract that native text for further document processing (classification, data extraction etc).
|
[[File:Ms-office-integration-12.png]]
|}
|}
</tab>
</tab>
Line 94: Line 85:
# Right click the document whose native text you wish to view.
# Right click the document whose native text you wish to view.
#* Notice imported Excel documents will have the Excel icon next to the native file's file name on the '''Batch Folder'''.  This lets you know the document folder's native file (the one imported into '''Grooper''') is an Excel file.
#* Notice imported Excel documents will have the Excel icon next to the native file's file name on the '''Batch Folder'''.  This lets you know the document folder's native file (the one imported into '''Grooper''') is an Excel file.
# Select "Activities".
# Select "Excel Document".
# Select "Execute".
# Click "Convert to CSV...".
|
[[File:Ms-office-integration-01.png]]
|-
|valign=top|
The '''Execute''' activity applies simple processing commands to a specified object type.  To turn the imported Excel file into a CSV file for further '''Grooper''' processing, you will indicate you want to process the ''Excel Document'' object type and execute the ''Excel to CSV'' command.
 
# From this window, you will configure the '''Execute''' activity.
# Select the '''''Commands''''' property and press the ellipsis button at the end to add a new command.
|
[[File:Ms-office-integration-02.png]]
|-
|valign=top|
# The following window allows you to collect various commands for the '''Execute''' activity.
# Click the "Add" button to add a new command.
# Under '''''Object Type''''', choose ''Excel Document''.
# Under '''''Command''''', choose ''Excel to CSV''.
|
|
[[File:Ms-office-integration-03.png]]
[[File:2023 Microsoft Office Integration - 2023 01 How To 02 Excel Spreadsheets 01.png]]
|-
|-
|valign=top|
|valign=top|
# The '''''Command Settings''''' property gives you some additional options for how the CSV file is converted from the native Excel file.
# When the "Convert to CSV" window pops up, select the '''''Save As''''' property using the dropdown menu and select one of the following options:  
#* Press the ellipsis button at the end to select a conversion method.
#* ''Children''
# In the following window, select the '''''Conversion Method''''' property, using the dropdown menu to select one of the following three options:
#* ''Files''
#* ''Burst''
#* ''Attachment''
#* ''SaveNew''
#* ''Convert''
|
|
[[File:Ms-office-integration-04.png]]
[[File:2023 Microsoft Office Integration - 2023 01 How To 02 Excel Spreadsheets 02.png]]
|-
|-
|valign=top|
|valign=top|
=== Burst Conversion ===  
=== Children ===
The ''Burst'' option will convert the Excel file to a CSV file and saves the results as child object(s).
The ''Children'' option will convert the Excel file to a CSV file and saves the results as child object(s).


This is the default, and most typical, configuration option.  As seen in this image, if there are multiple sheets, they will be saved as multiple child objects of the document folder.  The native Excel file had two sheets.  So we get two child CSV files.
This is the most typical, configuration option.  As seen in this image, if there are multiple sheets, they will be saved as multiple child objects of the document folder.  The native Excel file had two sheets.  So we get two child CSV files.
|
|
[[File:Ms-office-integration-05.png]]
[[File:2023 Microsoft Office Integration - 2023 01 How To 02 Excel Spreadsheets 03.png]]
|-
|-
|valign=top|
|valign=top|
=== SaveNew Conversion ===
=== Files ===
The ''SaveNew'' option will convert the Excel file to a CSV file and save the result as a new file.  The new file is stored on the '''Batch Folder''' with the native file (More specifically, it is stored in the file store location associated with the '''Batch Folder'''.)
The ''Files'' option will convert the Excel file to a CSV file and save the result as a new file.  The new file is stored on the '''Batch Folder''' with the native file (More specifically, it is stored in the file store location associated with the '''Batch Folder'''.)
 
When choosing this option, you will name the generated file using the '''''File Name''''' property.
 
As you can see in this image, the document folder has not one, but two files associated with it, as seen in the "Advanced > Files" tab.


# The file named "MOCK_EXCEL_FILE.xlsx" is the original imported file.
# To see the files, click on the document in the '''Batch''' in the node tree.
# The file named "Converted_to_CSV.csv" is the new, converted file.
# Click on the "Advanced" tab.
#* And, it's named thusly because we indicated the files should be named "Converted_to_CSV", using the '''''File Name''''' property.
# Here we can see we have two .csv files in addition to the .xlsx file.  
|
|
[[File:Ms-office-integration-06.png]]
[[File:2023 Microsoft Office Integration - 2023 01 How To 02 Excel Spreadsheets 04.png]]
|-
|-
|valign=top|
|valign=top|
=== Convert Conversion ===
=== Attachment ===


The ''Convert'' option will convert the Excel file to a CSV file and ''replace'' the native file.   
The ''Attachment'' option will convert the Excel file to a CSV file and ''replace'' the native file.   


This is a "true" conversion.  Rather than making a CSV copy of the Excel file in one way or another, the original Excel file is transformed into a CSV file.
This is a "true" conversion.  Rather than making a CSV copy of the Excel file in one way or another, the original Excel file is transformed into a CSV file.


Seen in this image, the original native file has been converted from an XLSX file to a CSV file.
Seen in this image, the original native file has been converted from an XLSX file to a CSV file.
* The file's name has changed from "MOCK_EXCEL_FILE.xlsx" to "MOCK_EXCEL_FILE.csv"
* The file's name has changed from "MOCK_DATA_EXCEL_MULTISHEET.xlsx" to "MOCK_DATA_EXCEL_MULTISHEET.csv"


{|cellpadding="10" cellspacing="5"
{|cellpadding="10" cellspacing="5"
Line 162: Line 131:
|}
|}
|
|
[[File:Ms-office-integration-07.png]]
[[File:2023 Microsoft Office Integration - 2023 01 How To 02 Excel Spreadsheets 05.png]]
|}
|}
</tab>
</tab>
Line 175: Line 144:
To add the '''Execute''' activity to a '''Batch Process''':
To add the '''Execute''' activity to a '''Batch Process''':


# Create or select a working '''Batch Process'''.
# An Execute '''Batch Process Step''' needs to be added to a '''Batch Process'''.
#* '''Batch Processes''' are stored and added to the <code>Batch Processing > Processes > Working</code> folder
# Press the "Add Step" button.
# Change the added step's '''''Activity Type''''' to '''Execute'''.
# Configure the '''Execute''' step as described above to convert native Word or Excel files for subsequent processing.
# Configure the '''Execute''' step as described above to convert native Word or Excel files for subsequent processing.
#* Add a ''Word to PDF'' command for the ''Microsoft Word'' object type to convert Word files to PDFs.
#* Add a ''Word to PDF'' command for the ''Microsoft Word'' object type to convert Word files to PDFs.
#* Add an ''Excel to CSV'' command for the ''Microsoft Excel'' object type to convert Excel files to CSV.
#* Add an ''Excel to CSV'' command for the ''Microsoft Excel'' object type to convert Excel files to CSV.
|
|
[[File:Ms-office-integration-13.png]]
[[File:2023 Microsoft Office Integration - 2023 01 How To 03 Batch Processing Execution 01.png]]
|}
|}
==Version Differences==
Prior to '''Grooper 2.9''' files from the Microsoft Office Suite had to be rendered (essentially a "print..." function) to PDF in order to view contents and use activities more effectively.
[[Category:Articles]]
[[Category:Version 2.90]]

Revision as of 09:30, 1 December 2023

Native text for Microsoft Office applications is a powerful data integration tool in Grooper.

Easier access to the contents of files from the worlds most used business application suite.

Previous Versions

Grooper 2.90

About

Microsoft Office integration allows a Grooper user to leverage the native text of files generated in the Microsoft Office Suite such as Microsoft Word documents and Microsoft Excel spreadsheets. This feature can pull the native text from and perform type-specific activities on these files.

Supported File Types

  • Microsoft Word documents (.doc and.docx)
    • For Word documents, you can generate a Grooper-usable document with the Execute activity, using the Word to PDF command for the Word Document object type. The PDF will contain all the native text from the Word document, obtainable for further Grooper processing using the Recognize activity.
  • Microsoft Excel spreadsheets (xls and xlsx)
    • For Excel documents, you can generate a Grooper-usable document with the Execute activity, using the Excel to CSV command for the Excel Document object type. CSV files are natively readable by Grooper in version 2.90. The Recognize activity is not required.

How to Use

To make use of this feature, ensure that Microsoft Office is installed on the machine running Grooper Design Studio.

Furthermore, the bit version of Grooper and Microsoft Data Access Components (MDAC) must match.

  • If you are running the 64-bit version of Grooper, you must use the 64-bit MDAC components.
  • If you are using the 32-bit version of Grooper you must use the 32-bit MDAC components.
  • FYI

    If you have completed these prerequisites and both Office and the appropriate MDAC component's installed, you may see this error message if you select a Batch Folder in a Batch with an attached Excel file.

    If this is case, try closing Grooper Design Studio and re-open it as an administrator. This will create and/or provide access rights to the required directory indicated by the error message.

    Ad Hoc Execution: Testing in Grooper Design Studio

    Like any Activity, the Execute activity can be applied to a document in an "ad hoc" manner in Grooper Design Studio. This is typical for Grooper architects testing and designing solutions before building a Batch Process.

    Getting a Result with Microsoft Word Documents

    1. In Grooper Design Studio select a Batch that contains the desired documents.
    2. Right click the document whose native text you wish to view.
      • Notice imported Word documents will have the Word icon next to the native file's file name on the Batch Folder. This lets you know the document folder's native file (the one imported into Grooper) is a Word file.
    3. Select "Word Document".
    4. Select "Word to PDF".

    This will create a PDF copy of the Word document, stored on the document folder. This document is now viewable in Grooper's Document Viewer and contains all the native text data from the Word file.

    1. To view the PDF, click on the icon in the top right corner of the Batch Viewer.
    2. From the drop-down, select "PDF".

    This document folder can now be processed by the Recognize activity to extract that native text for further document processing (classification, data extraction etc).

    Excel Spreadsheets

    1. In Grooper Design Studio select a Batch that contains the desired documents.
    2. Right click the document whose native text you wish to view.
      • Notice imported Excel documents will have the Excel icon next to the native file's file name on the Batch Folder. This lets you know the document folder's native file (the one imported into Grooper) is an Excel file.
    3. Select "Excel Document".
    4. Click "Convert to CSV...".

    1. When the "Convert to CSV" window pops up, select the Save As property using the dropdown menu and select one of the following options:
      • Children
      • Files
      • Attachment

    Children

    The Children option will convert the Excel file to a CSV file and saves the results as child object(s).

    This is the most typical, configuration option. As seen in this image, if there are multiple sheets, they will be saved as multiple child objects of the document folder. The native Excel file had two sheets. So we get two child CSV files.

    Files

    The Files option will convert the Excel file to a CSV file and save the result as a new file. The new file is stored on the Batch Folder with the native file (More specifically, it is stored in the file store location associated with the Batch Folder.)

    1. To see the files, click on the document in the Batch in the node tree.
    2. Click on the "Advanced" tab.
    3. Here we can see we have two .csv files in addition to the .xlsx file.

    Attachment

    The Attachment option will convert the Excel file to a CSV file and replace the native file.

    This is a "true" conversion. Rather than making a CSV copy of the Excel file in one way or another, the original Excel file is transformed into a CSV file.

    Seen in this image, the original native file has been converted from an XLSX file to a CSV file.

    • The file's name has changed from "MOCK_DATA_EXCEL_MULTISHEET.xlsx" to "MOCK_DATA_EXCEL_MULTISHEET.csv"
    FYI The original Excel file in this case had two sheets. The Convert option will combine the rows from multiple sheets into a single sheet, one after the other, with a blank row inserted between each sheet.

    Batch Processing Execution: Automating the Conversions

    When automating a Word to PDF or Excel to CSV conversion, you will add the Execute activity to a Batch Process. Once added to a Batch Process, its configuration to convert Word files into PDFs and Excel files into CSVs is the same as described above.

    To add the Execute activity to a Batch Process:

    1. An Execute Batch Process Step needs to be added to a Batch Process.
    2. Configure the Execute step as described above to convert native Word or Excel files for subsequent processing.
      • Add a Word to PDF command for the Microsoft Word object type to convert Word files to PDFs.
      • Add an Excel to CSV command for the Microsoft Excel object type to convert Excel files to CSV.