The web contains trillions of pages, which can make it hard for your customers to find your store. We rely heavily on Google to find things for us, and Google does this with a program called a web crawler.
Stores with a high number of unique pages relative to their traffic tend to have a low cache hit rate, which makes caching solutions ineffective. The Magento Cache Crawler solves this by crawling and pre-caching the pages on your site, which increases the cache hit rate.
Here are some more great features of the Cache Warmer/Crawler Extension:
Efficiently Crawl Your Magento Store to Warm Your Caches
The crawler will only access each page one time during a crawl session to ensure there is no duplicate effort. In addition, pages your users cannot access (disabled products, children of configurable products) will not be crawled.
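The "visit each page once per session" idea can be sketched in a few lines of Python. This is only an illustration, not the extension's actual code: the function name build_crawl_queue and the example URLs are made up, and treating URLs that differ only by their #fragment as the same page is an assumption.

```python
from collections import deque
from urllib.parse import urldefrag


def build_crawl_queue(candidate_urls):
    """Return the URLs in first-seen order, keeping each page only once."""
    seen = set()
    queue = deque()
    for url in candidate_urls:
        # Assumption: /shoes and /shoes#reviews are the same page, so the
        # fragment is dropped before checking for duplicates.
        page, _fragment = urldefrag(url)
        if page not in seen:
            seen.add(page)
            queue.append(page)
    return queue


queue = build_crawl_queue([
    "https://shop.example/shoes",
    "https://shop.example/shoes#reviews",
    "https://shop.example/hats",
    "https://shop.example/shoes",
])
print(list(queue))
# prints ['https://shop.example/shoes', 'https://shop.example/hats']
```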
Crawl All Your Important Pages
Crawl CMS pages, catalog pages, and product pages. When used with the Full Page Cache, all your layered navigation filter options, deep catalog pages, searches, and third-party extensions will be crawled without any further configuration.
Ensure That Server Load Does Not Become Too High
A crawler increases server load in exchange for faster page loads. When the server is already busy, the added load can become a problem. The crawler detects high server load, pauses itself automatically, and resumes when the load drops. This ensures that the crawler does not compete with user traffic for server resources!
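A load-aware pause of this kind could look roughly like the Python sketch below. It is not the extension's implementation. How the extension measures load is not described here, so this example assumes the 1-minute system load average from os.getloadavg() (available on Unix-like systems only). The names wait_for_low_load, max_load, and poll_seconds are hypothetical.

```python
import os
import time


def wait_for_low_load(max_load, get_load=lambda: os.getloadavg()[0],
                      sleep=time.sleep, poll_seconds=5.0):
    """Block while the measured load is above max_load.

    Returns how many times the caller had to wait. get_load and sleep are
    injectable so the logic can be exercised without a busy server.
    """
    pauses = 0
    while get_load() > max_load:
        pauses += 1
        sleep(poll_seconds)
    return pauses


def crawl(urls, fetch, max_load=4.0):
    """Fetch each URL, pausing before any request made under high load."""
    for url in urls:
        wait_for_low_load(max_load)
        fetch(url)
```

Checking the load before every request, rather than once per crawl, lets the crawler react to a traffic spike that starts in the middle of a session.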
Crawl as Quickly as Possible Using Multiple Crawler Threads
The crawler uses multiple threads, which can reduce the total crawl time by a factor of 10 or more.
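To illustrate why threads help, here is a minimal Python sketch using the standard library's ThreadPoolExecutor. It is not the extension's code. The function crawl_in_parallel, the thread count of 8, and the fake_fetch stand-in are assumptions made for the example. The actual speedup depends on how many concurrent requests the server can handle.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_in_parallel(urls, fetch, threads=8):
    """Call fetch(url) for every URL using a pool of worker threads.

    Returns a dict mapping each URL to whatever fetch returned for it.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in input order, so zip pairs them up.
        return dict(zip(urls, pool.map(fetch, urls)))


def fake_fetch(url):
    # Hypothetical stand-in for a real HTTP request to the store.
    return len(url)


results = crawl_in_parallel(["/shoes", "/hats", "/about-us"], fake_fetch, threads=2)
print(results)
```

Because each request spends most of its time waiting on the server, several threads can keep requests in flight at once instead of fetching pages one after another.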
Crawl as Logged-in Users
Do you show different content to logged-in users? If so, it might make sense to crawl as a logged-in user so that their cache is warmed too. You can now easily select which customer group to crawl as.
Run Out-of-Schedule Crawls by Adding Them Manually
Did you flush the cache and need to run a job right away, before its scheduled time? No problem, you can easily add a manual job that will run immediately.
Don’t Crawl Disabled / Hidden / Out-of-Stock Products
Ensure you don’t waste resources crawling products that would be infrequently viewed or not viewable at all!
Filter Out Generated URLs by URL Type
Easily filter out certain page types, such as category pages, CMS pages, or product pages, so they are not added to the queue. This is useful if you only want to crawl category pages or certain other page types.
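As a rough picture of type-based filtering, the Python sketch below drops queue entries by page type before the crawl starts. It is only an illustration. The (type, URL) tuple layout, the type names, and the function filter_by_type are assumptions for the example, not the extension's data model.

```python
def filter_by_type(entries, excluded_types):
    """Keep only the (url_type, url) entries whose type is not excluded."""
    excluded = set(excluded_types)
    return [(kind, url) for kind, url in entries if kind not in excluded]


entries = [
    ("category", "/women/shoes"),
    ("product", "/red-sneaker"),
    ("cms", "/about-us"),
]

# Crawl category pages only by excluding the other two types.
print(filter_by_type(entries, ["product", "cms"]))
# prints [('category', '/women/shoes')]
```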
Always View the Status of Crawls and Manage Crawls Using an Easy Interface
Easily view past crawls, currently running crawls, and queued crawls. Pause or disable crawls. Change the number of threads used. View the number of crawled URLs, the source of the URLs, and view performance metrics such as the number of crawled URLs per minute.
Filter Out any URL by Regular Expression
Easily ensure that URLs that match a regular expression will not be crawled. This is useful if you do not want certain pages to be crawled.
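A regular-expression exclusion list might work along the lines of the Python sketch below. This is an illustration only. The function exclude_matching and the two example patterns (checkout pages and sorted listings) are hypothetical, and the extension's own pattern syntax may differ from Python's re module.

```python
import re


def exclude_matching(urls, patterns):
    """Return the URLs that match none of the given regular expressions."""
    compiled = [re.compile(p) for p in patterns]
    return [u for u in urls if not any(rx.search(u) for rx in compiled)]


urls = [
    "https://shop.example/shoes",
    "https://shop.example/shoes?sort=price",
    "https://shop.example/checkout/cart",
]
patterns = [r"/checkout/", r"\?.*\bsort="]

print(exclude_matching(urls, patterns))
# prints ['https://shop.example/shoes']
```

Using search rather than a full-string match means a pattern only has to describe the distinguishing part of the URL.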
Fully Support Crawling of HTTPS Secure URLs
Is your site running in secure mode on the frontend? The crawler will crawl your secure pages too.
Reduce Bandwidth and CPU Usage During Crawls
When combined with our Full Page Cache, the crawler process uses less bandwidth and processing when it loads uncached pages.
Flush Caches Prior to Crawling
Flushing Magento caches prior to crawling is important to ensure that the pages the crawler caches are up to date.