Web Archive CLI

A simple Python CLI that archives an entire website in the Web Archive (the Internet Archive's Wayback Machine), using the site's sitemap.xml file as the list of pages to save.


Installation

Create a fresh Python virtual environment:

python3 -m venv venv

and activate it:

. venv/bin/activate

and install the dependencies:

pip install -r requirements.txt


Usage

Activate the Python virtual environment:

. venv/bin/activate

Convert a sitemap.xml file to a plain list of URLs:

python sitemap_to_urllist.py sitemap_example.org_2023-01-01.xml
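What the conversion step involves can be illustrated with a minimal sketch. This is not the code of sitemap_to_urllist.py. It assumes a standard sitemaps.org <urlset> document, extracts every <loc> entry and writes one URL per line. The function names sitemap_to_urls and convert_file are hypothetical, and sitemap index files (<sitemapindex>) are not followed.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_to_urls(xml_data: str | bytes) -> list[str]:
    """Return the <loc> value of every <url> entry, in document order (hypothetical helper)."""
    root = ET.fromstring(xml_data)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        # Sitemaps often pad <loc> with whitespace or newlines; skip empty entries.
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


def convert_file(sitemap_path: str, urllist_path: str) -> int:
    """Write one URL per line to urllist_path and return how many were written."""
    # Read bytes so the parser honours the encoding declared in the XML prolog.
    with open(sitemap_path, "rb") as f:
        urls = sitemap_to_urls(f.read())
    with open(urllist_path, "w", encoding="utf-8") as f:
        f.writelines(url + "\n" for url in urls)
    return len(urls)
```

With the example file names, the equivalent call would be convert_file("sitemap_example.org_2023-01-01.xml", "urls_example.org_2023-01-01.txt").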

Submit every URL in the list to the Web Archive:

python do_archive.py urls_example.org_2023-01-01.txt
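At its core, the archiving step is a loop over the URL list. The sketch below illustrates that idea under assumptions and is not the implementation of do_archive.py. The helper names read_urls and archive_all are hypothetical. The capture function is passed in, for example savepagenow.capture, which submits one URL to Save Page Now and returns the archive URL. The 5-second pause between requests is an arbitrary, conservative choice, not a documented rate limit.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable


def read_urls(path: str) -> list[str]:
    """Read one URL per line, dropping blank lines and surrounding whitespace."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def archive_all(
    urls: Iterable[str],
    capture: Callable[[str], str],
    delay_seconds: float = 5.0,  # assumed pause, not an official limit
) -> tuple[dict[str, str], dict[str, str]]:
    """Submit each URL via `capture`, pausing between requests (hypothetical helper).

    Returns (archived, failed): archived maps URL -> archive URL,
    failed maps URL -> error text. A single failure does not stop the run.
    """
    archived: dict[str, str] = {}
    failed: dict[str, str] = {}
    for i, url in enumerate(urls):
        if i > 0 and delay_seconds > 0:
            time.sleep(delay_seconds)  # space out requests to the archive service
        try:
            archived[url] = capture(url)
        except Exception as exc:  # e.g. network errors, rate limiting, blocked pages
            failed[url] = f"{type(exc).__name__}: {exc}"
    return archived, failed
```

A possible call, assuming savepagenow is installed, is archive_all(read_urls("urls_example.org_2023-01-01.txt"), savepagenow.capture). Injecting the capture function keeps the loop testable without network access. Collecting failures instead of raising lets a long run finish and report which URLs need a retry.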

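Note: Strictly follow the file naming scheme, which encodes the site's domain and the date in each file name (for example sitemap_example.org_2023-01-01.xml and urls_example.org_2023-01-01.txt).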
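The naming scheme can be checked mechanically. The sketch below is a hypothetical helper and not part of this repository. It assumes names of the form <prefix>_<domain>_<YYYY-MM-DD>.<ext>, as in the sitemap_example.org_2023-01-01.xml and urls_example.org_2023-01-01.txt examples, and the exact pattern is an illustration only. It validates a name and derives the matching URL-list name from a sitemap name.

```python
from __future__ import annotations

import re
from datetime import date

# Assumed pattern for the scheme: <prefix>_<domain>_<YYYY-MM-DD>.<ext>
NAME_RE = re.compile(
    r"^(?P<prefix>sitemap|urls)_(?P<domain>[^_/]+)_(?P<date>\d{4}-\d{2}-\d{2})\.(?P<ext>xml|txt)$"
)


def parse_name(filename: str) -> tuple[str, str, date]:
    """Split e.g. 'sitemap_example.org_2023-01-01.xml' into (prefix, domain, date)."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"file name does not follow the naming scheme: {filename!r}")
    # date.fromisoformat also rejects impossible dates such as 2023-02-30.
    return m["prefix"], m["domain"], date.fromisoformat(m["date"])


def urllist_name_for(sitemap_filename: str) -> str:
    """Derive the URL-list file name that belongs to a sitemap file name."""
    prefix, domain, day = parse_name(sitemap_filename)
    if prefix != "sitemap":
        raise ValueError(f"expected a sitemap_... file, got {sitemap_filename!r}")
    return f"urls_{domain}_{day.isoformat()}.txt"
```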


The archive script is based on the savepagenow Python package:

To archive a single URL only, the savepagenow CLI can be used directly:

Wayback API:
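One Wayback endpoint that helps spot-check whether a page from the list was archived is the public Availability API at https://archive.org/wayback/available. It returns JSON describing the snapshot closest to a given URL and optional timestamp. The sketch below assumes that endpoint, which may not be the API originally linked here. The function names availability_query and closest_snapshot are hypothetical. The sketch only builds the request URL and parses a response body, so the network call stays separate.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode

# Assumed endpoint: the Wayback Machine Availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: str | None = None) -> str:
    """Build a request URL; timestamp is YYYYMMDD[hhmmss] (hypothetical helper)."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_body: str) -> str | None:
    """Return the closest snapshot URL from a response body, or None if there is none."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

To query the live service, you could fetch the URL with urllib.request.urlopen(availability_query("example.org")) and pass the decoded body to closest_snapshot.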

Manual paper feed: