Collectors - Extracting Data
What is a collector and what is it used for?
A collector is a tool that fetches and extracts specific information from a web page. This information can come from anywhere: the URL, the HTML code (via a CSS selector), a cookie, the DataLayer, or even the result of a JavaScript script.
Collectors are the foundation of the entire configuration in Netvigie Tracking. They are essential to:
- Define context rules: "A page is a 'Product Page' if the collector {{URL}} contains /produit/."
- Verify values in the tagging plan: "The parameter product.price of the dataLayer must be equal to the value of the collector {{Prix Affiché}}."
- Identify sensitive data: "The collector {{User Email}} retrieves a piece of sensitive data."
- Condition rules (constraints): "Only verify this parameter if the collector {{User Status}} returns 'connected'."
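The sketch below shows how collector values feed these rules. For illustration only, it assumes that one page's collector values are available as a plain object keyed by collector name. The function evaluatePage is hypothetical and is not part of Netvigie Tracking. It applies the context rule, then the condition, then the tagging plan check from the list above.

```javascript
// Sketch only: `collected` holds one page's collector values keyed by
// collector name (a hypothetical shape); `dataLayer` is the page's dataLayer.
function evaluatePage(collected, dataLayer) {
  // Context rule: the page is a "Product Page" if {{URL}} contains "/produit/"
  const context = collected["URL"].includes("/produit/") ? "Product Page" : null;

  // Condition (constraint): only verify the parameter if {{User Status}} is "connected"
  if (context !== "Product Page" || collected["User Status"] !== "connected") {
    return { context, priceCheck: "skipped" };
  }

  // Tagging plan check: product.price in the dataLayer must equal {{Prix Affiché}}
  const price = dataLayer.product ? dataLayer.product.price : undefined;
  return { context, priceCheck: price === collected["Prix Affiché"] ? "pass" : "fail" };
}

// Example page (placeholder URL and values)
const result = evaluatePage(
  { "URL": "https://www.example.com/produit/52", "User Status": "connected", "Prix Affiché": 18.99 },
  { product: { price: 18.99 } }
);
console.log(result.context, result.priceCheck); // Product Page pass
```

The comparison uses strict equality. A price collected as the string "18,99 €" would therefore never match the number 18.99 in the dataLayer.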
The different types of Collectors
CSS
- Objective: Retrieve the text content of an HTML element targeted by a CSS selector.
- Example: To retrieve the price "18,99 €" contained in <span class="price">18,99 €</span>, the selector is .price.
- Advanced options:
- Attribute: Retrieve the value of a tag attribute (e.g. the value of src in an <img> tag).
- Regex / Replacement: Extract part of the text or reformat it (e.g. keep only the numeric part, 18.99).
- Return the number of elements: Count how many elements match the selector.
- Return an array: If the selector matches multiple elements, retrieve all their values in a list.
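The hypothetical collectCss function below gives a rough idea of what the CSS collector and its advanced options do. It is a sketch, not the product's implementation. root is anything that exposes querySelectorAll (document in a browser). The option names count, attribute, regex, replace and array are assumptions chosen to mirror the options listed above.

```javascript
// Sketch only, not the product's implementation.
// `root` is anything exposing querySelectorAll, e.g. `document` in a browser.
function collectCss(root, selector, options = {}) {
  const elements = Array.from(root.querySelectorAll(selector));

  // "Return the number of elements": count the matches instead of reading them
  if (options.count) return elements.length;

  const readValue = (el) => {
    // "Attribute": read an attribute (e.g. src) instead of the text content
    let value = options.attribute ? el.getAttribute(options.attribute) : el.textContent;
    if (value == null) return null;
    value = value.trim();

    // "Regex": keep the first capture group if there is one, else the whole match
    if (options.regex) {
      const match = value.match(options.regex);
      if (!match) return null;
      value = match[1] ?? match[0];
    }
    // "Replacement": options.replace is a [pattern, replacement] pair
    if (options.replace) value = value.replace(options.replace[0], options.replace[1]);
    return value;
  };

  // "Return an array": read every match; otherwise read only the first one
  if (options.array) return elements.map(readValue);
  return elements.length > 0 ? readValue(elements[0]) : null;
}
```

In a browser, collectCss(document, ".price", { regex: /([0-9]+,[0-9]+)/, replace: [",", "."] }) would return "18.99" for the example markup above.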
DataLayer
- Objective: Retrieve the entire DataLayer object as it is built on the page. This collector is a prerequisite for any verification of the DataLayer tagging plan.
- Configuration: It is usually enough to enter the name of the JavaScript variable that holds the dataLayer (e.g. dataLayer for GTM, tc_vars for TagCommander).
- Advanced options (for GTM): You can choose not to merge dataLayer events, or to freeze its state at a specific moment of the page load (e.g. at gtm.dom), which is useful for advanced debugging.
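The simplified sketch below illustrates what merging and freezing mean. It folds successive dataLayer pushes into one state object and can stop at a given event. The function names are hypothetical. The merge rule is an approximation that may differ from GTM's internal data model in edge cases: plain objects are merged recursively, and other values overwrite earlier ones.

```javascript
// Simplified sketch: plain objects are merged recursively, any other value
// (strings, numbers, arrays) overwrites the previous one.
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

function deepMerge(target, source) {
  for (const [key, value] of Object.entries(source)) {
    if (isPlainObject(value) && isPlainObject(target[key])) {
      deepMerge(target[key], value);
    } else if (isPlainObject(value)) {
      target[key] = deepMerge({}, value); // copy so the original push is not mutated
    } else {
      target[key] = value;
    }
  }
  return target;
}

// Fold the pushes into one state; if stopAtEvent is given (e.g. "gtm.dom"),
// return the state as it was right after that event was pushed.
function dataLayerState(pushes, stopAtEvent) {
  const state = {};
  for (const push of pushes) {
    deepMerge(state, push);
    if (stopAtEvent && push.event === stopAtEvent) break;
  }
  return state;
}
```

In this sketch, choosing not to merge corresponds to keeping the array of pushes as it is, without calling dataLayerState.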
DataLayer Element
- Objective: Retrieve the value of a specific variable inside the DataLayer.
- Example: For the dataLayer { page: { name: 'product_detail' } }, you can create a collector that extracts the value of page.name. The result is product_detail.
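Reading a DataLayer Element amounts to following a dotted path through the object. Here is a minimal sketch. getByPath is a hypothetical helper, and the sketch assumes the path syntax is the dotted form used in the example above.

```javascript
// Follow a dotted path such as "page.name" through a dataLayer object.
// Returns undefined as soon as a step is missing.
function getByPath(source, path) {
  return path
    .split(".")
    .reduce((node, key) => (node != null ? node[key] : undefined), source);
}

const dataLayer = { page: { name: "product_detail" } };
console.log(getByPath(dataLayer, "page.name")); // product_detail
```

Array indexes are ordinary property keys in JavaScript, so with this helper a path such as products.0.price would also work.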
URL
- Objective: Extract a part of the URL of a page.
- Configuration: You choose which property of the URL you need (href, pathname, hostname, etc.) and can apply a regular expression (REGEX) to extract a sub-part.
- Example: To extract the product ID 52 from the URL .../p/52-121-t-shirt.html, you can use a REGEX like \\/p\\/([0-9]+)-.
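The standard WHATWG URL API, available in browsers and Node.js, exposes the same properties (href, pathname, hostname, etc.). The URL collector can therefore be sketched as follows. collectUrl is a hypothetical name, and www.example.com is a placeholder domain. The REGEX is written as a JavaScript regex literal, and the first capture group is kept when there is one.

```javascript
// Sketch of a URL collector: pick one property of the URL, then optionally
// apply a regex and keep its first capture group (or the whole match).
function collectUrl(href, property, regex) {
  const value = new URL(href)[property]; // e.g. "href", "pathname", "hostname"
  if (!regex) return value;
  const match = value.match(regex);
  return match ? (match[1] ?? match[0]) : null;
}

// www.example.com is a placeholder domain
const productUrl = "https://www.example.com/p/52-121-t-shirt.html";
console.log(collectUrl(productUrl, "pathname"));                   // /p/52-121-t-shirt.html
console.log(collectUrl(productUrl, "pathname", /\/p\/([0-9]+)-/)); // 52
```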
URL Parameter
- Objective: Retrieve the value of a specific parameter in the URL query string (the part after the ?).
- Example: In the URL .../recherche?controller=search&s=webcam, enter s as the parameter name to retrieve the search term. The collector returns the value webcam.
Important note: The parameter name is the string located before the = sign (here, s). Its value is what follows it, up to the next & if there is one (here, webcam).
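In JavaScript, URL.searchParams performs this name/value split. It also decodes percent-encoding and + signs. Here is a minimal sketch with a hypothetical collectUrlParameter function and a placeholder domain.

```javascript
// Sketch of a URL Parameter collector: URLSearchParams splits the query string
// on "&" and "=", and decodes percent-encoding and "+" signs.
function collectUrlParameter(href, name) {
  return new URL(href).searchParams.get(name); // null if the parameter is absent
}

const searchUrl = "https://www.example.com/recherche?controller=search&s=webcam";
console.log(collectUrlParameter(searchUrl, "s"));          // webcam
console.log(collectUrlParameter(searchUrl, "controller")); // search
```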
Cookie
- Objective: Retrieve the value of a cookie, or of an element of localStorage or sessionStorage.
- Configuration: You specify the storage type (Cookie, Local, Session) and the name of the element to retrieve.
- Example: Retrieve the value of the Google Analytics _ga cookie.
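Here is a sketch of the three storage types. It assumes a hypothetical collectStorage function that receives the browser's window, or any object with the same document.cookie, localStorage and sessionStorage properties. document.cookie returns all cookies as a single "name=value; name2=value2" string, so the cookie case has to split it.

```javascript
// Sketch: read one cookie from a "name=value; name2=value2" string.
function readCookie(cookieString, name) {
  for (const part of cookieString.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    if (part.slice(0, eq).trim() !== name) continue;
    const raw = part.slice(eq + 1).trim();
    try {
      return decodeURIComponent(raw); // cookie values are often URI-encoded
    } catch {
      return raw; // malformed encoding: return the raw value
    }
  }
  return null;
}

// `env` is the browser's window, or any object with the same three properties.
function collectStorage(type, name, env) {
  if (type === "Cookie") return readCookie(env.document.cookie, name);
  const storage = type === "Local" ? env.localStorage : env.sessionStorage;
  return storage.getItem(name); // null if the key does not exist
}
```

In a browser, collectStorage("Cookie", "_ga", window) would return the _ga value. It would return null if the cookie is not set, or if it is HttpOnly, because HttpOnly cookies are not visible to document.cookie.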
JavaScript
- Objective: Execute a custom JavaScript function and return its result. This is the most powerful and flexible collector.
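- Example: A script that checks whether a logout button is present (a sign that the user is logged in) and returns "connected" or "disconnected":
// The script is the body of a function: its return value becomes the collector value.
if (document.querySelector("#logout-button")) {
  return "connected";
} else {
  return "disconnected";
}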
Tablemap
- Objective: Create a mapping table to translate the value of another collector.
- Example: If a collector returns index, product or cart, you can use a Tablemap to translate these values into homepage, product page and cart.
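A Tablemap behaves like a lookup table. The sketch below uses a JavaScript Map. applyTablemap is hypothetical. The fallback value for unmapped inputs is an assumption, because this page does not describe what happens to values missing from the table.

```javascript
// Sketch of a Tablemap: translate the raw value returned by another collector.
const pageTypeTable = new Map([
  ["index", "homepage"],
  ["product", "product page"],
  ["cart", "cart"],
]);

// `fallback` is returned for values missing from the table (an assumption).
function applyTablemap(value, table, fallback = null) {
  return table.has(value) ? table.get(value) : fallback;
}

console.log(applyTablemap("index", pageTypeTable));    // homepage
console.log(applyTablemap("checkout", pageTypeTable)); // null
```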
Data Set (Dataset)
- Objective: Import a CSV file (like an external tagging plan) and use it as a data source.
- How it works: You import a CSV file. For the current page, the collector then finds the matching row in the CSV (based on a key column, such as the URL) and returns all the values of that row. This makes it possible to verify complex tagging plans that are managed externally.
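The row lookup can be sketched as follows. The sketch assumes a simple comma-separated file with a header row and no quoted fields; real CSV files with quoted commas need a proper parser. parseCsv and lookupRow are hypothetical names, and the columns are invented for the example.

```javascript
// Sketch: parse a simple CSV (header row, comma separator, no quoted fields).
function parseCsv(text, separator = ",") {
  const [headerLine, ...lines] = text.trim().split(/\r?\n/);
  const headers = headerLine.split(separator).map((h) => h.trim());
  return lines
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const cells = line.split(separator);
      return Object.fromEntries(headers.map((h, i) => [h, (cells[i] ?? "").trim()]));
    });
}

// Find the row whose key column matches the current page; null if none does.
function lookupRow(rows, keyColumn, keyValue) {
  return rows.find((row) => row[keyColumn] === keyValue) ?? null;
}

// Invented tagging plan: one row per URL path
const plan = parseCsv("url,page_type,page_name\n/p/52-121-t-shirt.html,product,t-shirt\n/,homepage,home\n");
console.log(lookupRow(plan, "url", "/p/52-121-t-shirt.html").page_type); // product
```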
Options common to collectors
- Numeric output: Forces the returned value to be a number (by removing letters, symbols, etc.). Very useful for comparing prices.
- Temporary collector: Indicates that the collector value does not need to be stored after the crawl. Useful for collectors that only serve to define contexts.
- Sensitive data: Marks the extracted data as potentially personal (email, name, etc.). This triggers specific checks in the GDPR section to ensure it is not sent to unauthorized third parties.
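The exact cleaning rules of the Numeric output option are not described here. As a sketch, assume the comma is a decimal separator (as in "18,99 €"). A hypothetical toNumericOutput function could then strip everything except digits, separators and the minus sign.

```javascript
// Sketch of a numeric output rule, assuming "," is a decimal separator.
function toNumericOutput(value) {
  // Drop letters, currency symbols, spaces, etc.; keep digits, separators and "-"
  const cleaned = String(value).replace(/[^0-9,.\-]/g, "").replace(",", ".");
  if (cleaned === "") return null;
  const number = Number(cleaned);
  return Number.isFinite(number) ? number : null; // e.g. "1.299.00" is not a number
}

console.log(toNumericOutput("18,99 €"));      // 18.99
console.log(toNumericOutput("Total: 120 €")); // 120
```

This naive rule would return null for a value with thousands separators such as "1.299,00 €". Checking the real output on a few pages is a sensible precaution before relying on price comparisons.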