
Collectors - Extracting Data

What is a collector and what is it used for?

A collector is a tool that fetches and extracts a specific piece of information from a web page. This information can come from anywhere: the URL, the HTML code (via a CSS selector), a cookie, the DataLayer, or even the result of a JavaScript script.

Collectors are the foundation of the entire configuration in Netvigie Tracking. They are essential to:

  • Define context rules: "A page is a 'Product Page' if the collector {{URL}} contains /produit/".
  • Verify values in the tagging plan: "The dataLayer parameter product.price must equal the value of the collector {{Prix Affiché}} (displayed price)".
  • Identify sensitive data: "The collector {{User Email}} retrieves a piece of sensitive data".
  • Make rules conditional (constraints): "Only verify this parameter if the collector {{User Status}} returns 'connected'".
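
The examples above can be sketched as small checks over the collector values of one crawled page. This is only a hypothetical illustration. The function names and the shape of the `values` object are assumptions, not Netvigie Tracking's actual rule engine, and the sketch leaves out sensitive-data handling.

```javascript
// Hypothetical sketch: `values` maps collector names to what they returned on
// one crawled page, e.g. { URL: "https://shop.example/produit/t-shirt",
// "Prix Affiché": 18.99, "User Status": "connected" }.

// Context rule: the page is a "Product Page" if {{URL}} contains /produit/.
function isProductPage(values) {
  return String(values["URL"]).includes("/produit/");
}

// Tagging-plan check: dataLayer product.price must equal {{Prix Affiché}}.
function priceMatches(dataLayer, values) {
  return Number(dataLayer?.product?.price) === Number(values["Prix Affiché"]);
}

// Constraint: only run `check` when {{User Status}} returns "connected".
function runIfConnected(values, check) {
  if (values["User Status"] !== "connected") return "skipped";
  return check() ? "pass" : "fail";
}
```

A constrained tagging-plan check then combines the two, for example `runIfConnected(values, () => priceMatches(dataLayer, values))`.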

The different types of Collectors

  • CSS

    • Objective: Retrieve the text content of an HTML element targeted by a CSS selector.
    • Example: To retrieve the price "18,99 €" contained in <span class="price">18,99 €</span>, the selector will be .price.
    • Advanced options:
      • Attribute: Retrieve the value of a tag attribute instead of its text (e.g. the value of src in an <img> tag).
      • Regex / Replacement: Extract part of the text or reformat it (e.g. turn "18,99 €" into the numeric value 18.99).
      • Return the number of elements: Count how many elements match the selector.
      • Return an array: If the selector matches multiple elements, retrieve all their values in a list.
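
    Here is a minimal sketch of how these options might combine, written as plain JavaScript that could run in a browser console. The function name `cssCollector` and the option names (`attribute`, `regex`, `replacement`, `count`, `array`) are hypothetical. The sketch also assumes that Regex / Replacement behaves like JavaScript's `String.prototype.replace`, which may not match the tool exactly.

    ```javascript
    // Hypothetical CSS collector. `root` is anything exposing querySelectorAll,
    // such as the browser's `document`.
    function cssCollector(root, selector, options = {}) {
      const elements = Array.from(root.querySelectorAll(selector));

      // "Return the number of elements": count matches instead of reading them.
      if (options.count) return elements.length;

      const values = elements.map((el) => {
        // "Attribute": read an attribute (e.g. src) instead of the text content.
        let value = options.attribute
          ? el.getAttribute(options.attribute)
          : el.textContent.trim();
        // "Regex / Replacement": rewrite the value with a regular expression.
        if (value !== null && options.regex) {
          value = value.replace(options.regex, options.replacement ?? "");
        }
        return value;
      });

      // "Return an array": all values; otherwise only the first match (or null).
      return options.array ? values : (values[0] ?? null);
    }
    ```

    Consider a page containing `<span class="price">18,99 €</span>`. There, `cssCollector(document, ".price", { regex: /^(\d+),(\d+).*$/, replacement: "$1.$2" })` would return the string "18.99".
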
  • DataLayer

    • Objective: Retrieve the entire DataLayer object as it exists on the page. This collector is a prerequisite for any verification of the DataLayer tagging plan.
    • Configuration: It is usually sufficient to indicate the name of the dataLayer JavaScript variable (e.g. dataLayer for GTM, tc_vars for TagCommander).
    • Advanced options (for GTM): You can choose not to merge dataLayer events or to freeze its state at a specific moment of the page load (e.g. at gtm.dom), which is useful for advanced debugging.
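
    With GTM, `dataLayer` is an array of objects pushed over time, so "merging events" means folding those pushes into a single state. The sketch below illustrates that idea, including an optional stop point such as `gtm.dom`. The function names are hypothetical. The merge rule (plain objects merged recursively, every other value overwritten) is a simplification and may differ from GTM's own data model.

    ```javascript
    function isPlainObject(value) {
      return value !== null && typeof value === "object" && !Array.isArray(value);
    }

    // Recursively merge `source` into `target`; non-object values overwrite.
    // Nested objects in `target` are always fresh copies, so pushes are not mutated.
    function deepMerge(target, source) {
      for (const [key, value] of Object.entries(source)) {
        if (isPlainObject(value)) {
          if (!isPlainObject(target[key])) target[key] = {};
          deepMerge(target[key], value);
        } else {
          target[key] = value;
        }
      }
      return target;
    }

    // Fold dataLayer pushes into one state, optionally stopping after the push
    // whose `event` equals `stopAtEvent` (e.g. "gtm.dom").
    function mergeDataLayer(pushes, stopAtEvent) {
      const state = {};
      for (const entry of pushes) {
        if (!isPlainObject(entry)) continue; // ignore non-object pushes
        deepMerge(state, entry);
        if (stopAtEvent !== undefined && entry.event === stopAtEvent) break;
      }
      return state;
    }
    ```

    In a browser, `mergeDataLayer(window.dataLayer, "gtm.dom")` would approximate a state frozen at gtm.dom. Leaving events unmerged corresponds to reading the raw array instead.
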
  • DataLayer Element

    • Objective: Retrieve the value of a specific variable inside the DataLayer.
    • Example: For a dataLayer { page: { name: 'product_detail' } }, you can create a collector to extract the value of page.name. The result will be product_detail.
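
    Resolving a path such as `page.name` amounts to walking the object one key at a time. Here is a minimal sketch, assuming dot-separated paths; numeric segments such as `products.0.price` would then index into arrays. The function name is hypothetical, and the tool's exact path syntax may differ.

    ```javascript
    // Hypothetical helper: read a dot-separated path such as "page.name".
    // Returns undefined as soon as a segment is missing.
    function getByPath(source, path) {
      return path.split(".").reduce(
        (current, key) => (current == null ? undefined : current[key]),
        source,
      );
    }
    ```

    For example, `getByPath({ page: { name: "product_detail" } }, "page.name")` returns "product_detail".
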
  • URL

    • Objective: Extract a part of the URL of a page.
    • Configuration: You choose which property of the URL you need (href, pathname, hostname, etc.) and can apply a regular expression (REGEX) to extract part of it.
    • Example: To extract the product ID 52 from the URL .../p/52-121-t-shirt.html, you can use a REGEX like \/p\/([0-9]+)-, whose capture group ([0-9]+) matches 52.
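
    Here is a minimal sketch of the same extraction in JavaScript. It uses the standard `URL` class to pick the property and a regular expression to isolate the ID. The host `shop.example` and the function name are hypothetical, and returning the first capture group is an assumption about how the tool treats groups.

    ```javascript
    // Hypothetical URL collector: pick a URL property, then optionally apply a regex.
    function urlCollector(href, property, regex) {
      const value = new URL(href)[property]; // e.g. "href", "pathname", "hostname"
      if (!regex) return value;
      const match = String(value).match(regex);
      if (!match) return null;
      // Assumption: return the first capture group if there is one, else the whole match.
      return match[1] ?? match[0];
    }
    ```

    For example, `urlCollector("https://shop.example/p/52-121-t-shirt.html", "pathname", /\/p\/([0-9]+)-/)` returns "52".
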
  • URL Parameter

    • Objective: Retrieve the value of a specific parameter in the URL query string (the part after the ?).
    • Example: In the URL .../recherche?controller=search&s=webcam, to retrieve the searched term, the parameter name to fill in is s. The collector will return the value webcam.

    Important note: The parameter name is the string located before the = sign (here, s), and its value is what follows it, up to the next & (here, webcam).
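
    The standard `URL` API already performs this name/value split, so a URL Parameter collector can be sketched in a few lines. The function name and the host `shop.example` are hypothetical.

    ```javascript
    // Hypothetical URL Parameter collector built on the standard URL API.
    function urlParameter(href, name) {
      // searchParams.get returns the decoded value, or null if the parameter is absent.
      return new URL(href).searchParams.get(name);
    }
    ```

    `URLSearchParams` decodes values, so `s=web+cam` and `s=web%20cam` both yield "web cam". Whether the tool decodes values in the same way is not stated here.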

  • Cookie

    • Objective: Retrieve the value of a cookie, or of an item stored in localStorage or sessionStorage.
    • Configuration: You specify the storage type (Cookie, Local, Session) and the name of the item to retrieve.
    • Example: Retrieve the value of the Google Analytics _ga cookie.
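
    In the browser, `document.cookie` exposes all cookies as a single `name=value; name2=value2` string. By contrast, `localStorage.getItem(name)` and `sessionStorage.getItem(name)` read storage items directly. Here is a minimal sketch of the cookie lookup. It takes that string as input so that it can run outside a browser. The function name and the sample `_ga` value are hypothetical.

    ```javascript
    // Hypothetical cookie lookup over a document.cookie-style string.
    function getCookie(cookieString, name) {
      for (const part of cookieString.split(";")) {
        const pair = part.trim();
        const eq = pair.indexOf("="); // split on the first "=" only
        if (eq === -1) continue;
        if (pair.slice(0, eq) !== name) continue;
        const raw = pair.slice(eq + 1);
        try {
          return decodeURIComponent(raw); // many sites URI-encode cookie values
        } catch {
          return raw; // not valid percent-encoding: return it unchanged
        }
      }
      return null;
    }
    // In a browser: getCookie(document.cookie, "_ga");
    // Storage equivalents: localStorage.getItem("key"), sessionStorage.getItem("key").
    ```
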
  • Javascript

    • Objective: Execute a custom JavaScript function and return the result. This is the most powerful and flexible collector.

    • Example: A script that checks whether a logout button is present and returns "connected" if so, "disconnected" otherwise.

      // The value returned by the script becomes the collector value.
      if (document.querySelector("#logout-button")) {
        return "connected";
      } else {
        return "disconnected";
      }

  • Tablemap

    • Objective: Create a mapping table to translate the value of another collector.
    • Example: If a collector returns index, product, or cart, a Tablemap can translate these values into homepage, product page, and cart respectively.
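
    A Tablemap is essentially a lookup table, which a JavaScript `Map` expresses directly. The sketch below is hypothetical: the names are illustrative, and returning a `fallback` for unmapped values is an assumption about how the tool handles them.

    ```javascript
    // Hypothetical Tablemap: translate another collector's value via a lookup table.
    const pageTypeTable = new Map([
      ["index", "homepage"],
      ["product", "product page"],
      ["cart", "cart"],
    ]);

    function tablemap(value, table, fallback = null) {
      // Assumption: unmapped values yield `fallback`; the tool's behavior may differ.
      return table.has(value) ? table.get(value) : fallback;
    }
    ```

    For example, `tablemap("product", pageTypeTable)` returns "product page".
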
  • Data Set (Dataset)

    • Objective: Import a CSV file (such as an external tagging plan) and use it as a data source.
    • How it works: You import a CSV file. For the current page, the collector then finds the matching row in the CSV (based on a key column, such as the URL) and returns all the values in that row. This makes it possible to verify complex tagging plans that are managed outside the tool.
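
    Here is a minimal sketch of the row lookup. It assumes a simple comma-separated file with a header row and no quoted fields; real CSV files with quoted commas need a proper parser. The column names `url`, `page_name`, and `page_type`, and the function names, are hypothetical.

    ```javascript
    // Parse a simple CSV (header row, no quoted fields) into row objects.
    function parseCsv(text) {
      const [headerLine, ...lines] = text.trim().split(/\r?\n/);
      const headers = headerLine.split(",").map((h) => h.trim());
      return lines
        .filter((line) => line.trim() !== "")
        .map((line) => {
          const cells = line.split(",");
          return Object.fromEntries(
            headers.map((header, i) => [header, (cells[i] ?? "").trim()]),
          );
        });
    }

    // Return the row whose key column matches the current page, or null.
    function findRow(rows, keyColumn, keyValue) {
      return rows.find((row) => row[keyColumn] === keyValue) ?? null;
    }
    ```

    On a crawled page, `keyValue` could be `location.pathname`. Each field of the returned row could then be compared with the corresponding dataLayer value.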

Options common to collectors

  • Numeric output: Forces the returned value to be a number (by removing letters, symbols, etc.). Very useful for comparing prices.
  • Temporary collector: Indicates that the collector value does not need to be stored after the crawl. Useful for collectors that only serve to define contexts.
  • Sensitive data: Marks the extracted data as potentially personal (email, name, etc.). This triggers specific checks in the GDPR section to ensure it is not sent to unauthorized third parties.
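
The Numeric output option can be pictured as a cleanup step applied before comparison. Here is a minimal sketch. It assumes a comma is used as the decimal separator and that values contain no thousands separators, so a string like "1,299.00" would be misread. The function name is hypothetical, and the tool's actual conversion rules are not described here.

```javascript
// Hypothetical numeric-output conversion for price-like strings.
function toNumeric(value) {
  // Keep digits and separators; drop letters, currency symbols and spaces.
  const cleaned = String(value).replace(/[^0-9.,]/g, "");
  // Treat the first comma as the decimal separator (e.g. "18,99" -> "18.99").
  const normalized = cleaned.replace(",", ".");
  const number = Number.parseFloat(normalized);
  return Number.isNaN(number) ? null : number;
}
```

With this conversion, a displayed price of "18,99 €" and a dataLayer `product.price` of 18.99 compare as equal numbers: `toNumeric("18,99 €") === 18.99`.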