On WooCommerce, this subject rarely lands "cleanly" on the table. It is usually a chain of events: a redesign, a theme migration, the addition of a "turnkey" filter plugin, and then a marketing request along the lines of "we want more pages that perform". On the tech side, the risk is spotted quickly (URLs, rules, monitoring). On the business side, collection pages need to ship fast. And in between, the filters end up creating pages... but without intent, without content, and often without control.
The idea of this article is simple: lay out an SEO-first method to structure the catalogue, decide which combinations deserve an indexable page, and lock down the rest without ending up with 50,000 "default" filtered variants.
The essentials in 30 seconds
Filters serve the UX. The risk appears when parameters (attributes, sorting, pagination, tracking) become crawled URLs, sometimes indexed en masse. The most robust path combines:
(1) a governed catalogue (clean taxonomy), (2) dedicated collection pages for combinations with potential, (3) strict control (noindex/canonical/robots + log/GSC monitoring).
Your filters are not the problem. The problem is letting them create pages "by accident".
The scenario is classic: the user filters, everything looks fine on screen... but technically, every filter, every sort order, every pagination step can produce a separate URL. For Googlebot, it is a variant-generating machine. And once you add marketing parameters (UTM, gclid, fbclid), you get a volume of URLs that no longer has much to do with your actual offer.
The SEO impact is not always immediate. More often it sets in gradually: crawling that drags on, strategic pages discovered late, diluted signals, and a performance ceiling that eventually shows up. Google's documentation points out precisely that faceted navigation can generate a lot of URLs and consume crawl resources if it is not kept under control.
Since the content of these URLs is very similar (same products, just reordered or filtered), you quickly end up with duplicate content, cannibalization (several candidate pages for the same intent), and crawl budget spent on noise rather than on your business pages.
In audits, the objective is not to "blame filters", but to identify where the explosion really occurs.
In the logs, which URL families does Googlebot visit most: filter parameters, pagination, sorting, tracking?
In Search Console, which URL categories show up under "Crawled – currently not indexed", "Duplicate", or "Alternate page with proper canonical tag"?
And on the stack side, which plugin/theme controls URLs (query strings, rewrite, AJAX)?
Examples: /boutique/?filter_couleur=noir&orderby=price, /categorie/chaussures/?page=12, ?utm_source=...
Observable symptoms
We often find the same signals. In Search Console, a rise in URLs under "Discovered – currently not indexed" or "Crawled – currently not indexed", with patterns like ?filter_ / ?orderby=. In the logs, Googlebot sometimes spends more time on ?page= and facet combinations than on the parent categories. And on the SERP side, it is filtered pages that rank, not necessarily well, instead of your categories and collections, which mechanically creates cannibalization.
The logical next step is therefore to take the subject back to its root: the catalogue structure.
1) Structure a WooCommerce catalogue "SEO-first"
Categories: think in terms of intent, not an infinite tree
The common reflex is to stack subcategories: it is reassuring, it feels like a full range. But in the end, you get weak, barely differentiated pages and a tree nobody wants to maintain. A category that performs behaves more like an intent page: a clear promise, a coherent assortment, and a reason to exist.
If a subcategory says nothing (no selection, no usage, no promise), it is likely just duplicating another page. Google's recommendations on e-commerce site structure point in the same direction: make your structure understandable and highlight your important pages.
Attributes (taxonomy): the backbone of filters
WooCommerce leaves a lot of freedom, which blurs the boundary in practice. Yet distinguishing categories from attributes is above all a question of governance:
- Category: a product universe + a main intent (e.g. "Running shoes").
- Attribute: a stable, filterable dimension (e.g. Gore-Tex, drop, width, size, brand).
What makes the difference day to day are simple rules: who creates an attribute, how it is named, which values are allowed, and how duplicates are avoided ("Black" vs. "black", "GORE TEX" vs. "Gore-Tex"...). Without this framework, your product data deteriorates and the filters become inconsistent. And this is usually where "accidental pages" start to proliferate.
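Part of that governance can be backed by code. As a minimal, illustrative sketch (it only collapses stray whitespace; a real naming convention and a human owner are still needed), a hook like this can normalize attribute term names when they are created:

```php
// Minimal sketch: normalize attribute term names at creation time to limit
// duplicates such as "Black " vs "Black". WooCommerce attribute taxonomies
// are prefixed with "pa_". This does not replace a naming convention.
add_filter( 'pre_insert_term', function ( $term, $taxonomy ) {
    if ( is_string( $term ) && 0 === strpos( $taxonomy, 'pa_' ) ) {
        $term = trim( preg_replace( '/\s+/', ' ', $term ) ); // collapse whitespace
    }
    return $term;
}, 10, 2 );
```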
Tags: useful... if they have a clear role
Tags can be handy for ad hoc selections or editorial needs. But in e-commerce, they often end up as near-empty, ungoverned pages, sometimes indexed "by mistake". If you keep them, the idea is above all to give them a clear role (e.g. "New arrivals", "Gift ideas") and to avoid turning them into a second category system.
Golden rule: one intent = one target page
As soon as one intent can lead to several "plausible" URLs (category, tag, filter, sort...), Google ends up arbitrating. And in real life, it sometimes picks the least useful page. The objective is therefore to point each intent at a single clear target page (category, collection, landing). The rest serves the UX, but should not become an SEO candidate by default.
The tricky moment is when someone wants a page for every filter. To decide without fooling yourself, it usually comes back to three questions:
What is the intent (and the business value)? What are the demand signals (GSC, PPC, internal search)? And what is the maintenance cost (content, internal linking, indexing rules)?
Example: "Brand" indexable for the 5 top-selling brands, but "Color" reserved for the UX.
Once this structure is in place, you can tackle facets without getting stuck in the "AJAX vs. not AJAX" debate, because the real subject is UX versus SEO.
2) Filters (facets): separate UX and SEO
Why filters create noise
In most implementations, a filter becomes a URL parameter. And for a crawler, a parameter = a potential new page. Google explicitly documents that faceted navigation can generate a lot of URLs and consume crawl resources, especially when each combination triggers a separate page.
The noise grows further with sorting (orderby=price, orderby=rating), pagination (page=2) and marketing parameters (utm_*, gclid). Even with a good UX, the SEO surface can explode if you do not decide upfront what should exist (and above all what should not become indexable).
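As a purely illustrative order of magnitude: a single category exposing 6 colors, 5 sizes, 3 sort orders and 10 paginated pages can already generate up to 6 × 5 × 3 × 10 = 900 distinct URLs, before a single utm_ parameter is appended.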
A simple grid: universal, differentiating, noise
A pragmatic approach is to classify the facets before even talking about "indexing".
- Universal facets are useful everywhere (price, availability, size): they serve the UX but rarely produce good SEO pages.
- Differentiating facets genuinely change the meaning of the selection (Gore-Tex, width, sole type, usage): these are the only ones that can sometimes justify dedicated pages.
- Noise, finally, is everything that reorders or segments without intent (sorting, display, tracking).
"Indexable" does not mean "relevant": how to decide?
A filtered page only becomes an SEO candidate if it meets very concrete criteria: identifiable demand, a sufficient assortment, a clear promise, realistic content, and natural internal linking. Google's e-commerce best practices insist on keeping key pages accessible without drowning the crawl in endless variants.
| Cases | Decision | Example | Treatment |
|---|---|---|---|
| Clear intent + volume + stable offer | SEO candidate | "Gore-Tex men's running shoes" | Dedicated indexable collection page |
| Useful for navigation but too combinatorial | UX only | color, size, price | AJAX/parameter filter + not indexable |
| Noise parameter | To block | orderby=, utm_*, gclid | Neutralization + crawl/indexing rules |
This matrix sets the framework. You can then build a production strategy that prevents "saved filters" from getting indexed any which way.
3) The winning strategy: collection pages (landings) + control of the rest
Identify combinations with potential (e.g. "men's Gore-Tex running shoes")
The objective is not to index combinations "on principle". It is often more profitable to select a few dozen, the way you would build a range: GSC queries, SEA campaigns, internal search, margin, seasonality, stock. The aim is solid pages, not thousands of fragile variants.
Important point: a collection page must stay consistent even when stock moves. If it disappears as soon as the assortment fluctuates, you end up with an unstable SEO page, and that is rarely a good foundation.
Create dedicated pages: content, template, FAQ, internal linking
A "clean" collection page is not just a saved filter. It is a governed page with a clear intent:
a title aligned with the query (rather than a stack of attributes), a useful short text (selection, criteria, usage), a FAQ that addresses objections, and natural internal links from parent categories, guides, brand pages, etc.
It is also the right time to frame production, especially if you are in the middle of a redesign or theme change: better to decide now how these pages get created than to fix the debt later. If you are at that stage, our WooCommerce development and redesign approaches go in this direction: keeping control on the build side.
Let filters serve navigation, and keep the non-strategic ones out of the index
One nuance saves a lot of time here: crawling (exploration) and indexation (presence in the index) are not the same thing. Google reminds us that meta robots directives (noindex, etc.) must be accessible to the crawler to be taken into account.
noindex / canonical / robots.txt: when to use what?
Rather than "putting everything in robots.txt", it pays to reason in terms of URL families.
- noindex says "you may crawl, but do not index". It is handy for filtered pages that are useful for the UX but have no SEO value. Careful: if the URL is blocked via robots.txt, Google may never see the noindex.
- canonical consolidates several variants towards a main URL. But it is not a magic wand: if the pages are too different, Google may ignore it. And "canonicalizing all pagination to page 1" is a classic... that creates other problems.
- robots.txt is mostly there to save crawl on URL families with no value. On the other hand, it is no guarantee of non-indexation if external links exist, and it prevents Google from reading the content of those URLs. Google mentions this kind of usage to avoid crawling facets when you do not want them to appear in Search.
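To make the split concrete, here is a minimal robots.txt sketch following that logic. The parameter names (orderby, utm_, gclid, fbclid, filter_) match the examples used in this article; adapt the patterns to the URL families you actually see in your logs before deploying anything:

```
# Pure noise: sorting and tracking parameters are not worth crawling
User-agent: *
Disallow: /*?*orderby=
Disallow: /*?*utm_
Disallow: /*?*gclid=
Disallow: /*?*fbclid=

# filter_* facet URLs stay crawlable on purpose:
# they carry a meta robots noindex, which Google can only see if it can crawl them
```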
The "cleanest" UX/SEO trade-offs are often the ones that stay simple to explain:
which facets remain crawlable but noindex (UX first)? which families are disallowed in robots.txt (pure noise)? and where is the line between "SEO collection" and "UX filter" (and who signs off)?
Example: sorting and tracking in disallow, the price facet in noindex, "Brand + Usage" collections as dedicated pages.
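On the "crawlable but noindex" side, here is a minimal sketch of what this can look like in a WooCommerce theme or small plugin, using the wp_robots filter (WordPress 5.7+). The filter_*, min_price/max_price and rating_filter parameters are the ones produced by the native layered navigation and price/rating widgets; treat the list as a starting point, not a definitive rule set:

```php
// Minimal sketch: tag filtered catalogue views as noindex,follow
// while leaving them crawlable (no robots.txt block on filter_* URLs).
add_filter( 'wp_robots', function ( $robots ) {
    $is_filtered = false;
    foreach ( array_keys( $_GET ) as $param ) {
        if ( 0 === strpos( $param, 'filter_' )
            || in_array( $param, array( 'min_price', 'max_price', 'rating_filter' ), true ) ) {
            $is_filtered = true;
            break;
        }
    }
    if ( $is_filtered ) {
        $robots['noindex'] = true; // keep out of the index...
        $robots['follow']  = true; // ...but let discovery and link equity flow
    }
    return $robots;
} );
```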
4) Implementation in WooCommerce
Native WooCommerce (layered navigation): useful, but not "strategic" on its own
The native layered navigation does the job on the UX side. On its own, however, it does not carry an SEO strategy: without rules, it quickly exposes crawlable filtered URLs, infinite combinations, and templates that all look alike.
Before switching plugins, it is often more profitable to set the framework: URL format, indexing rules, canonicals, monitoring. In that context, a technical SEO & WordPress performance audit is often the most cost-effective starting point when you suspect a URL explosion.
Pattern A (often the most stable): UX filters + static indexable collection pages
This is a pattern that holds up well over time: filters are there to navigate (AJAX or parameters), and "governed" collection pages do the SEO work. Marketing gains autonomy (a collection page is worked on like any other page), and IT gains testability (fewer surprises, simpler rules). One possible implementation is sketched below.
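One way (among others: static pages, or curated term pages) to give marketing those governed landings is a small dedicated post type. A minimal sketch, with an illustrative slug and labels; templates, SEO settings and internal links still have to be wired around it:

```php
// Minimal sketch: a dedicated "collection" content type for Pattern A,
// so each landing is an ordinary, editable, indexable page with its own URL.
add_action( 'init', function () {
    register_post_type( 'collection', array(
        'label'        => 'Collections',
        'public'       => true,                              // own URLs, indexable by default
        'has_archive'  => false,
        'rewrite'      => array( 'slug' => 'collections' ),  // e.g. /collections/gore-tex-mens-running-shoes/
        'supports'     => array( 'title', 'editor', 'excerpt', 'thumbnail' ),
        'show_in_rest' => true,                              // block editor for marketing autonomy
    ) );
} );
```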
Pattern B: indexable facets via strict rules + monitoring (riskier)
It can be done, but it is more exposed: you have to limit combinations severely, impose a stable URL parameter order, control pagination and sorting, and monitor continuously (logs, alerting, fixes). Without governance, it is typically the kind of choice that looks "OK" on day 1... and resurfaces in production six months later.
Points to watch: canonical, pagination, sorting, tracking
On pagination, Google recommends that each paginated page have its own URL and its own self-referencing canonical (rather than canonicalizing everything to page 1).
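As a minimal sketch of that recommendation on product category archives, assuming pretty permalinks with the default /page/N/ structure and no SEO plugin already printing a canonical for these archives (in which case configure the plugin instead):

```php
// Minimal sketch: self-referencing canonical on (paginated) product category pages,
// built from the term link rather than the raw request so filter/tracking
// parameters never leak into the canonical URL.
add_action( 'wp_head', function () {
    if ( ! is_product_category() ) {
        return;
    }
    $term = get_queried_object();
    $base = get_term_link( $term );
    if ( is_wp_error( $base ) ) {
        return;
    }
    $page      = max( 1, (int) get_query_var( 'paged' ) );
    $canonical = ( $page > 1 ) ? trailingslashit( $base ) . 'page/' . $page . '/' : $base;
    echo '<link rel="canonical" href="' . esc_url( $canonical ) . '" />' . "\n";
} );
```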
On sorting and tracking, the objective is simple: prevent these parameters from becoming candidate pages.
5) Validation checklist before going to production
Controlled indexation: how many indexable URLs are you aiming for?
Before deployment, it is often useful to set a target figure: "we want X indexable pages" (categories + collections), rather than "we'll see". Then check that the technical rules actually serve that figure.
Before: facet inventory, decisions (SEO candidate / UX only / to block), list of parameters to neutralize, mapping of collection pages.
During: tests on an environment close to production (canonicals, meta robots, pagination, sorting), sitemap checks (collections included, facets excluded).
After: GSC checks (coverage, exclusions, indexation), a sample of Googlebot logs, and quick fixes if things drift.
Crawl & logs: stop Googlebot from "spending" in the wrong place
Logs are a very reliable smoke detector. The goal is to see Googlebot mostly on your categories and collections, not on ?orderby= or ?utm_. Google insists on the potential cost of crawling facets: if everything stays open, you pay for it (server resources + missed SEO opportunity).
Performance: fast filters, lightweight pages
Slow filters push you to pile on JavaScript, multiply requests, and complicate crawling. The objective is a smooth UX and stable pages. When a plugin adds dependencies everywhere, it is not necessarily "bad", but it is often a signal: the pattern (or the integration) deserves to be challenged again.
Conclusion: take back control, without adding complexity
Filters are neither "good" nor "bad" for SEO. Everything comes down to governance: deciding which intents deserve a real page, and preventing the rest from becoming indexable by accident.
On WooCommerce, the projects that perform are rarely the ones with "the most facets", but the ones that made clear choices on the SEO, UX and technical sides.
Want to check whether your catalogue is under control?
Let's take 30 minutes to talk about it!
FAQ
Should filtered pages be set to noindex?
Often yes, but rarely across the board. noindex is particularly useful for filtered pages that help navigation but have no SEO value, while keeping dedicated collection pages for intents with potential. Google details these directives (including noindex) and how to use them.
robots.txt or noindex: what is the difference?
robots.txt acts mainly on crawling (saving exploration), noindex on indexation. A URL blocked in robots.txt can prevent Google from seeing the noindex, since the page is not crawled. That is why the two are often combined: disallow for pure noise, noindex for UX-only pages.
How should pagination and sorting be handled?
For pagination: one URL per paginated page + a clean self-referencing canonical (avoid systematically canonicalizing to page 1).
For sorting: treat it as noise, and prevent these variants from becoming indexable.
Do filter plugins solve the problem on their own?
They can help avoid creating indexable URLs, but it is not automatic. What matters remains: which URLs exist, which are indexable, and how you monitor drift (Search Console + logs).
