Knowledge Source: Timeout Handling with Partial Crawl Preservation

complete

Business Problem:
Currently, when a crawl times out before completion, the platform treats the entire crawl as a failure—discarding all content collected up to that point. This leads to unnecessary data loss and a frustrating user experience, especially for large websites or slow-loading pages.
Users are left with no visibility into what was successfully crawled before the timeout, and they must reinitiate the crawl from scratch, wasting time and resources.
Desired Outcome:
Enhance the crawl logic to handle timeouts gracefully by:
- Returning a “Partially Completed” status when the crawl times out before finishing
- Storing and indexing all content successfully crawled up to the point of the timeout
This improvement would help users retain partial progress and reduce repeat crawling efforts.
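
The desired behavior can be illustrated with a minimal sketch. It is not the platform's actual implementation: the names `crawl_with_deadline`, `CrawlStatus`, `CrawlResult`, and the `fetch` callback are hypothetical, and treating a timeout with zero pages collected as "Failed" is an assumption the request does not specify. The sketch checks the deadline between pages only, so a single slow page fetch is not interrupted.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Iterable


class CrawlStatus(Enum):
    # Hypothetical status names; only "Partially Completed" comes from the request.
    COMPLETED = "Completed"
    PARTIALLY_COMPLETED = "Partially Completed"
    FAILED = "Failed"


@dataclass
class CrawlResult:
    status: CrawlStatus
    pages: Dict[str, str] = field(default_factory=dict)  # url -> content


def crawl_with_deadline(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    timeout_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> CrawlResult:
    """Crawl urls in order, keeping whatever was fetched if time runs out."""
    deadline = clock() + timeout_seconds
    pages: Dict[str, str] = {}
    for url in urls:
        # The deadline is checked before each page, not during a fetch.
        if clock() >= deadline:
            # Assumption: a timeout with nothing collected is still a failure.
            status = CrawlStatus.PARTIALLY_COMPLETED if pages else CrawlStatus.FAILED
            return CrawlResult(status, pages)
        pages[url] = fetch(url)
    return CrawlResult(CrawlStatus.COMPLETED, pages)
```

A caller would store and index `result.pages` whenever the status is either `COMPLETED` or `PARTIALLY_COMPLETED`, rather than discarding them on timeout. The `clock` parameter exists only so the timeout path can be exercised without real waiting.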

July 29, 2025

marked this post as complete

⏱️📚 Keep more, lose less! Crawls that time out now save what’s captured and show a Partially Completed status, so your AI keeps learning without wasted effort.

marked this post as in progress