HTTP Import (Import Provider)

Revision as of 12:41, 3 April 2025

This article is about the current version of Grooper.

Note that some content may still need to be updated.

About

HTTP Import is used to import web content into Grooper Batches. It can be used to import the following from an HTTP server:

  • Individual web pages (HTML documents)
  • Files hosted on web servers (such as PDFs)
  • Entire websites

How does it work?

HTTP Import brings in one or more web pages, depending on how the provider is configured. The configuration determines how Grooper navigates the pages of a website. One Grooper document is created for each distinct URL. Each web page is imported as a Batch Folder with an HTML file as its primary attachment. For URLs that resolve to files (such as PDFs), a Batch Folder is created with that file as its primary attachment.
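The "one document per distinct URL" rule can be pictured with a small Python sketch. This is only an illustration, not Grooper's implementation: the `ImportedFolder` class, the `plan_import` function, and the use of the HTTP `Content-Type` value to tell web pages from files are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ImportedFolder:
    """Hypothetical stand-in for a Batch Folder created by the import."""
    url: str
    attachment_kind: str  # "html" for web pages, "file" for anything else


def plan_import(responses):
    """Build one folder per distinct URL.

    `responses` is an iterable of (url, content_type) pairs. Using the
    Content-Type header to classify the attachment is an assumption.
    """
    seen = set()
    folders = []
    for url, content_type in responses:
        if url in seen:
            continue  # a URL that was already imported yields no new document
        seen.add(url)
        # Drop parameters such as "; charset=utf-8" before comparing.
        media_type = content_type.split(";")[0].strip().lower()
        kind = "html" if media_type == "text/html" else "file"
        folders.append(ImportedFolder(url, kind))
    return folders
```

In this sketch, a page reached twice produces a single folder, and a URL that resolves to a PDF produces a folder whose attachment is the file itself rather than an HTML document.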

How is it configured?

HTTP Import is configured by setting a "Source", which defines either a single web page or the URL of a website's root. When a website's root is defined, one or more relative URLs are added to "Relative Page URLs" to specify which pages to include in the import. In addition, when a "Link Selector" is configured, HTTP Import traverses the links on a web page and imports the linked pages.
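To make the relationship between these settings concrete, here is a rough Python sketch using only the standard library. It is not Grooper code: `resolve_pages` and `LinkCollector` are hypothetical names, and the link-selection rule shown (keep links whose absolute URL starts with a given prefix) is an assumption standing in for whatever a "Link Selector" actually matches on.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


def resolve_pages(root, relative_urls):
    """Combine a site root with relative page URLs into absolute URLs."""
    base = root if root.endswith("/") else root + "/"
    # Strip leading slashes so each page stays under the configured root.
    return [urljoin(base, rel.lstrip("/")) for rel in relative_urls]


class LinkCollector(HTMLParser):
    """Collect links from a page that match a simple prefix rule (assumed)."""

    def __init__(self, page_url, href_prefix):
        super().__init__()
        self.page_url = page_url
        self.href_prefix = href_prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is None:
            return
        # Resolve the link against the page it was found on.
        absolute = urljoin(self.page_url, href)
        if absolute.startswith(self.href_prefix) and absolute not in self.links:
            self.links.append(absolute)
```

For example, `resolve_pages("https://example.com/docs", ["guide/intro.html", "/faq.html"])` yields two absolute URLs under `https://example.com/docs/`. Feeding a page's HTML to a `LinkCollector` then gives the list of linked pages that would be followed under the assumed prefix rule.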

How To

Import a single web page with HTTP Import

Use Relative Page URLs to import multiple site pages with HTTP Import

Use "Link Selectors" to import multiple site pages with HTTP Import

Batch Process considerations

Conditioning HTML documents