n8n workflow to Extract & Process Specific Links from sitemap.xml
Description
This workflow reads a sitemap.xml file, extracts all URLs, and lets you narrow them down to specific types of links, such as PDF files, images, or any other content, based on your needs.
Who Is This For?
- SEO Specialists looking to analyze specific URLs in their sitemap.
- Developers who need to extract links for automated processing.
- Content Managers who need to isolate downloadable assets like PDFs or images.
How It Works
- Fetch sitemap.xml – The workflow reads the sitemap file from a given URL.
- Extract URLs – Parses all the URLs listed in the sitemap.
- Filter URLs – Use a simple filter to extract only the links you need (e.g., *.pdf).
- Export or Process – The filtered list can be sent via email, stored in a database, or used in another workflow.
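For readers who want to see the same steps outside n8n, here is a minimal Python sketch of the fetch, extract, and filter stages. It is an illustration and not the workflow's actual implementation: the function names (fetch_sitemap, extract_urls, filter_urls) and the example.com sample are assumptions, and it assumes a plain urlset sitemap that follows the sitemaps.org 0.9 schema.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org 0.9 protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def fetch_sitemap(sitemap_url: str) -> bytes:
    """Download the sitemap and return the raw XML bytes (hypothetical helper)."""
    with urllib.request.urlopen(sitemap_url, timeout=30) as response:
        return response.read()


def extract_urls(sitemap_xml) -> list:
    """Return the text of every <loc> element inside a <url> entry."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def filter_urls(urls: list, suffix: str = ".pdf") -> list:
    """Keep only URLs that end with the given suffix, ignoring case."""
    return [u for u in urls if u.lower().endswith(suffix.lower())]


# Inline sample (made-up URLs) so the sketch runs without network access.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/files/report.pdf</loc></url>
  <url><loc>https://example.com/images/logo.png</loc></url>
</urlset>
"""

pdf_links = filter_urls(extract_urls(SAMPLE), ".pdf")
print(pdf_links)
```

To run it against a live sitemap, pass the result of fetch_sitemap(...) to extract_urls instead of SAMPLE. The filtered list can then be handed to whatever export step you prefer.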
Customization
- In the Set sitemap URL block, change the sitemapUrl value to the sitemap you want to fetch.
- In the Filter URLs block, adjust the filter conditions to meet your needs.
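As a rough illustration of what customized filter conditions might look like, the Python sketch below matches each URL's path against several glob patterns such as *.pdf. The PATTERNS list, the matches_any helper, and the sample URLs are all hypothetical. Inside n8n you would express the same conditions in the Filter URLs block rather than in code.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

# Hypothetical filter conditions; adjust the patterns to your needs.
PATTERNS = ["*.pdf", "*.png", "*.jpg"]


def matches_any(url: str, patterns: list) -> bool:
    """True if the URL path matches at least one glob pattern, ignoring case.

    Only the path is checked, so a query string such as ?v=2 does not
    prevent a match.
    """
    path = urlparse(url).path.lower()
    return any(fnmatch(path, pattern.lower()) for pattern in patterns)


# Made-up URLs standing in for the list extracted from a sitemap.
urls = [
    "https://example.com/about",
    "https://example.com/docs/Guide.PDF",
    "https://example.com/img/logo.png?v=2",
]

selected = [u for u in urls if matches_any(u, PATTERNS)]
print(selected)
```

Matching on the parsed path instead of the full URL is a design choice: it keeps versioned asset links (for example, those with cache-busting query parameters) from slipping past an extension-based filter.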