Web Crawl
The Web Crawl component allows you to extract content from multiple web pages simultaneously. You can create dynamic URL lists by combining multiple text inputs with a template, similar to the Text Aggregator component.
Credit Cost
The cost depends on the content of the crawled pages. For reference, crawling the introduction page costs 5 credits.
Usage
The Web Crawl component has multiple input handles that accept text data, and a single output handle that produces the crawled content in markdown format. You can connect any number of text variables to the input handles and use them in your URLs template using the {{variable}} syntax.
Variable Handling
Variables must be explicitly referenced in the URLs template to be used. Simply connecting a variable to an input handle is not enough; you must use the {{variable}} syntax in the template to include its value. Any connected variable that is not referenced in the URLs template is ignored.
If a referenced variable contains empty data, that variable will be replaced with an empty string in the URLs.
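The substitution rules above can be sketched with a small helper (a hypothetical stand-in for the component's own templating, assuming simple {{name}} placeholders with no nesting or filters):

```python
import re

def render(template: str, variables: dict) -> str:
    """Replace each {{name}} placeholder with its value.

    Variables that are unconnected or empty become an empty string,
    and connected variables not referenced in the template are ignored.
    """
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), "")),
        template,
    )
```

For example, `render("https://{{domain}}/intro", {"domain": "docs.example.com", "unused": "x"})` yields `https://docs.example.com/intro`; the connected-but-unreferenced `unused` variable has no effect, and a missing or empty `domain` would simply leave a gap in the URL.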
Properties
URLs
- Type: text
- Description: A template that must evaluate to a valid JSON array of URLs. Use {{variable}} syntax to reference input variables.
- Default: Empty template
Output Format
The component outputs the content of all crawled pages in markdown format. The crawled HTML is processed to:
- Convert HTML to markdown
- Preserve text formatting
- Include headers and lists
- Maintain links
- Remove unnecessary styling
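The exact conversion pipeline is internal to the component, but the general idea can be illustrated with a stdlib-only sketch that handles headings, links, and list items (an illustrative converter, not the component's actual implementation):

```python
from html.parser import HTMLParser

class TinyMarkdown(HTMLParser):
    """Illustrative HTML-to-markdown converter (headings, links, lists)."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            # <h2> becomes "## ", etc.
            self.parts.append("\n" + "#" * int(tag[1]) + " ")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href", "")
            self.parts.append("[")
        elif tag == "p":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "a":
            self.parts.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        self.parts.append(data)

def to_markdown(html: str) -> str:
    parser = TinyMarkdown()
    parser.feed(html)
    return "".join(parser.parts).strip()
```

A real converter also handles emphasis, tables, nested lists, and styling removal, but the principle is the same: structural tags map to markdown syntax while presentational markup is dropped.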
Examples
For input variables:
- domain = "docs.example.com"
- product = "widget"
URLs Template:
[
"https://{{domain}}/{{product}}/overview",
"https://{{domain}}/{{product}}/features"
]
This will crawl both URLs and return their content in markdown format.
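Putting the pieces together, the template above can be rendered and checked against the "valid JSON array of strings" requirement. The helper below is a hypothetical reimplementation of that validation step, not the component's own code:

```python
import json
import re

TEMPLATE = """[
  "https://{{domain}}/{{product}}/overview",
  "https://{{domain}}/{{product}}/features"
]"""

def render_urls(template: str, variables: dict) -> list:
    """Substitute {{name}} placeholders, then parse as a JSON array of URL strings."""
    rendered = re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), "")),
        template,
    )
    urls = json.loads(rendered)
    if not isinstance(urls, list) or not all(isinstance(u, str) for u in urls):
        raise ValueError("URLs template must evaluate to a JSON array of strings")
    return urls
```

With `domain = "docs.example.com"` and `product = "widget"`, this produces the two fully resolved URLs shown in the example; a template that does not parse as a JSON array of strings fails before any crawling starts.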
Important Notes
- The URLs template must evaluate to a valid JSON array of strings
- All URLs must be valid and accessible
- Some websites may block or rate-limit crawling
- The component respects robots.txt rules
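The robots.txt behavior can be reproduced in client code with Python's standard library, which is useful for predicting whether a URL will be crawlable (a sketch of the rule, not the component's internal implementation):

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt rules permit crawling the URL."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)
```

In practice you would fetch the live rules with `RobotFileParser.set_url(...)` followed by `read()`; parsing a string here keeps the example self-contained.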