Business Problem:
Currently, when a crawl times out before completion, the platform treats the entire crawl as a failure and discards all content collected up to that point. This causes unnecessary data loss and a frustrating user experience, especially for large websites or slow-loading pages.
Users are left with no visibility into what was successfully crawled before the timeout, and they must reinitiate the crawl from scratch, wasting time and resources.
Desired Outcome:
Enhance the crawl logic to support graceful handling of timeouts by:
  • Returning a “Partially Completed” status when the crawl times out before finishing
  • Storing and indexing all successfully crawled content up to the point of timeout
This improvement would let users keep their partial progress and reduce repeated crawling of content that was already collected.
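A minimal Python sketch of the desired behavior follows. It assumes a hypothetical crawler that visits a list of URLs under a single time budget, with caller-supplied `fetch` and `index` functions. `crawl_with_deadline`, `CrawlStatus`, and `CrawlResult` are illustrative names, not part of the platform. Each page is indexed as soon as it is fetched, so a timeout leaves the already-crawled content stored. The result also reports which URLs were not reached.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class CrawlStatus(Enum):
    # Hypothetical status values; "Partially Completed" is the one requested above.
    COMPLETED = "Completed"
    PARTIALLY_COMPLETED = "Partially Completed"
    FAILED = "Failed"


@dataclass
class CrawlResult:
    status: CrawlStatus
    pages: Dict[str, str] = field(default_factory=dict)  # URL -> fetched content
    pending: List[str] = field(default_factory=list)     # URLs not reached before the timeout


def crawl_with_deadline(
    urls: List[str],
    fetch: Callable[[str], str],        # hypothetical: returns page content for a URL
    index: Callable[[str, str], None],  # hypothetical: stores/indexes one page
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> CrawlResult:
    deadline = clock() + timeout_s
    pages: Dict[str, str] = {}
    for i, url in enumerate(urls):
        if clock() >= deadline:
            # Timed out: keep what was collected instead of discarding it.
            status = CrawlStatus.PARTIALLY_COMPLETED if pages else CrawlStatus.FAILED
            return CrawlResult(status, pages, urls[i:])
        content = fetch(url)
        pages[url] = content
        # Index immediately so progress survives a later timeout.
        index(url, content)
    return CrawlResult(CrawlStatus.COMPLETED, pages, [])
```

In this sketch the deadline is checked only between pages, so a single very slow fetch can still overrun the budget; a real implementation would likely also give each request its own timeout. Returning `FAILED` when nothing at all was collected is one possible choice that the requirements above do not specify.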