Launching Analyses - Crawls

Once your tagging plan configuration is in place, it is time to test it against your site. That is the role of crawls. A crawl is a process during which Netvigie Tracking bots browse your site to collect data and compare it to the rules you have defined.

Crawl Type vs Crawl Mode: the key distinction

It is essential to fully understand the difference between these two concepts to configure relevant analyses.

  • Crawl Type: Defines the SCOPE of the analysis.
    • It answers the question: "Which pages are we going to test?"
    • Monitoring Crawl: The fastest. It tests only the example URLs you have defined in your contexts and scenarios. Ideal for frequent checks (every hour) of the most critical pages and journeys.
    • Partial Crawl: A good compromise. It tests the monitoring URLs, plus a limited, predefined number of additional pages that it discovers by following links. Useful for a daily check that can uncover issues on less central pages.
    • Complete Crawl: The most exhaustive. The bot attempts to explore your entire site by following all the links it finds (within the limits of the filters you have set up). Perfect for weekly or monthly audits.
  • Crawl Mode: Defines the STATE of the browser before the analysis.
    • It answers the question: "What type of user are we simulating?"
    • A crawl mode is a configuration that puts the bot's browser into specific conditions before starting to test the pages.
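The distinction can be summed up in a small sketch. This is illustrative only: the class and field names below are assumptions, not the Netvigie API. The point is that each scheduled analysis pairs a scope (type) with a browser state (mode), and any type can be combined with any mode.

```python
from dataclasses import dataclass

# Illustrative model only: names are assumptions, not the Netvigie API.
CRAWL_TYPES = ("monitoring", "partial", "complete")              # WHICH pages are tested
CRAWL_MODES = ("default", "identified", "mobile", "no-consent")  # WHO is simulated

@dataclass
class CrawlConfig:
    crawl_type: str   # scope of the analysis
    crawl_mode: str   # state of the browser before the analysis

    def describe(self) -> str:
        return f"{self.crawl_type} crawl as a {self.crawl_mode} visitor"

# Any type can be combined with any mode:
hourly_check = CrawlConfig("monitoring", "mobile")
print(hourly_check.describe())  # monitoring crawl as a mobile visitor
```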

Configuring Crawl Modes

You can create as many crawl modes as there are user profiles relevant to your tests. Go to Crawls > Crawl Modes.

  • Examples of Crawl Modes:
    • Identified Visitor:
      • Objective: Test the site from the point of view of a logged-in user. Essential for verifying personal data, customer account pages, etc.
      • Configuration: You must create an identification scenario (which opens the login page, enters the email and password, and submits the form). This scenario is then selected as the "Preparatory scenario" in the crawl mode configuration.
    • Mobile Crawl:
      • Objective: Simulate a visit from a smartphone to verify tags and mobile-specific behaviors.
      • Configuration: You must first create a mobile type Device (see below), then select it in the crawl mode configuration.
    • Crawl without cookie acceptance:
      • Objective: Verify GDPR compliance by ensuring that non-essential tags do not fire when consent is refused.
      • Configuration: As with the identified visitor, you create a scenario that interacts with your cookie banner (CMP) to refuse consent. This scenario becomes the "Preparatory scenario".
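Conceptually, a preparatory scenario is just an ordered sequence of browser actions replayed once before any page is tested. A rough sketch of that idea, assuming a simple (action, target) step format and a CMP selector that are both hypothetical, not the product's scenario format:

```python
# Hypothetical sketch: a preparatory scenario as an ordered list of
# (action, target) steps, replayed once before the crawl starts.
refuse_cookies_scenario = [
    ("open", "https://monsite.com/"),
    ("click", "#cmp-refuse-all"),   # selector of your CMP's refuse button (assumed)
]

def run_preparatory(scenario, browser):
    """Replay each (action, target) step on a browser-like object."""
    for action, target in scenario:
        getattr(browser, action)(target)

class FakeBrowser:
    """Minimal stand-in used only to illustrate the replay loop."""
    def __init__(self):
        self.log = []
    def open(self, url):
        self.log.append(f"open {url}")
    def click(self, selector):
        self.log.append(f"click {selector}")

b = FakeBrowser()
run_preparatory(refuse_cookies_scenario, b)
print(b.log)
```

Once the scenario has run, the bot's browser is in the desired state (here: consent refused) and the crawl proper can begin.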

Scheduling Crawls

This is where you assemble everything to launch an analysis.

  1. Go to Crawls > Scheduling.
  2. Click on Add a crawl configuration.
  3. Name your crawl (e.g. "Hourly Desktop Monitoring").
  4. Choose the Crawl Type (Monitoring, Partial, Complete).
  5. Select the Crawl Mode(s) to use for this analysis.
  6. Define the frequency (every hour, every day, etc.) or leave it on manual launch.
  7. Alert option: Check the box "This crawl can send alerts" if you wish to be notified in case of an issue detected by this analysis.
  8. Save.
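Steps 3 to 7 amount to assembling a configuration like the following. This is an illustrative sketch only; the field names are assumptions, not an export format of the product.

```python
# Illustrative only: field names are assumptions, not a Netvigie export format.
crawl_configuration = {
    "name": "Hourly Desktop Monitoring",  # step 3
    "crawl_type": "monitoring",           # step 4: monitoring | partial | complete
    "crawl_modes": ["default-desktop"],   # step 5: one or more modes
    "frequency": "hourly",                # step 6: or None for manual launch
    "can_send_alerts": True,              # step 7
}
```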

Important: Do not forget to generate a version after modifying your configuration and before launching a crawl, so that your changes are taken into account.

Crawl Components

Several elements allow you to precisely refine the behavior of your crawls.

  • URL Filters (Crawls > URL Filters)
    • Objective: Precisely control which pages the bot is allowed to visit. This is essential to prevent it from getting lost on outbound links or irrelevant parts of your site.
    • How it works: You create filters based on regular expressions (REGEX).
      • Inclusion filter: The URL must match the REGEX to be crawled. You must have at least one inclusion filter.
      • Exclusion filter: If the URL matches this REGEX, it will be ignored, even if it matches an inclusion filter.
    • Example: Include https://monsite.com/.* and exclude .*/blog/.* to crawl the entire site except the blog.
  • Devices (Crawls > Devices)
    • Objective: Define the characteristics of the simulated browser.
    • Configuration: You can specify the User-Agent, and the screen size (Width and Height). This is where you will create a "Mobile" device for your mobile crawl mode.
  • Custom headers (Crawls > Custom headers)
    • Objective: Modify or add HTTP headers to requests sent by the bot.
    • Example: Force an Accept-Language header to fr-FR to test a specific version of your site.
  • Resource modifiers (Crawls > Resource modifiers)
    • Objective: Intercept and modify network requests (your tags' "hits") before they are sent.
    • Types:
      • Block the resource: Completely prevent a tag from firing (useful to avoid polluting a partner's statistics during tests).
      • Modify a parameter: Change the value of a parameter in the tag URL (e.g. add a test=true parameter).
      • Modify a header: Change a header of the tag request.
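The inclusion/exclusion logic of URL filters can be sketched in a few lines. The regexes below follow the blog example given for URL Filters (with the dot escaped, as a real regex would require); the function itself is illustrative, not the crawler's implementation.

```python
import re

# Filters from the example: crawl the whole site except the blog.
INCLUDE = [r"https://monsite\.com/.*"]
EXCLUDE = [r".*/blog/.*"]

def is_crawlable(url: str) -> bool:
    """A URL is crawled if it matches at least one inclusion filter
    and no exclusion filter."""
    if not any(re.fullmatch(p, url) for p in INCLUDE):
        return False
    return not any(re.fullmatch(p, url) for p in EXCLUDE)

print(is_crawlable("https://monsite.com/produits/chaussures"))  # True
print(is_crawlable("https://monsite.com/blog/article-1"))       # False (excluded)
print(is_crawlable("https://autresite.com/page"))               # False (not included)
```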
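The "Modify a parameter" type of resource modifier amounts to rewriting a tag's hit URL before it leaves the browser. A minimal sketch of the idea using only the standard library (the hit URL is a made-up example, and this is not the product's implementation):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_or_replace_param(hit_url: str, key: str, value: str) -> str:
    """Rewrite a tag 'hit' URL, setting one query parameter."""
    parts = urlsplit(hit_url)
    query = dict(parse_qsl(parts.query))
    query[key] = value
    return urlunsplit(parts._replace(query=urlencode(query)))

# E.g. flag every analytics hit as test traffic:
hit = "https://collect.example.com/g?tid=UA-1&ev=pageview"
print(add_or_replace_param(hit, "test", "true"))
# → https://collect.example.com/g?tid=UA-1&ev=pageview&test=true
```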

Managing Orphan URLs

  • Definition: An orphan page is a page that exists on your site but is not accessible by any internal link (e.g. a campaign landing page accessible only from an email).
  • Issue: By default, the crawler cannot find it.
  • Solution:
    1. Go to Site > Orphan URLs.
    2. Create a URL group (e.g. "Campaign Landing Pages").
    3. List all orphan URLs in this group.
    4. In your crawl configuration (Scheduling part), select this group so that it is included in the pages to analyze.
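Conceptually, selecting an orphan URL group simply seeds the crawl with pages the bot could not discover by following links. An illustrative sketch (the group name matches the example above; the landing-page URLs are hypothetical):

```python
# Illustrative only: the set of pages to analyze starts from discovered
# links, plus any orphan URL groups selected in the crawl configuration.
orphan_groups = {
    "Campaign Landing Pages": [
        "https://monsite.com/lp/soldes-ete",       # hypothetical URL
        "https://monsite.com/lp/newsletter-offre",  # hypothetical URL
    ],
}

discovered = {"https://monsite.com/", "https://monsite.com/produits"}
selected_groups = ["Campaign Landing Pages"]

frontier = set(discovered)
for group in selected_groups:
    frontier.update(orphan_groups[group])

print(len(frontier))  # 4
```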