Internet Utilities

lazy_downloader

Automates downloading plain text files from the Web.

Lazy Downloader

As currently implemented, it only handles plain text correctly; however, there are plans to use the mimetypes module to handle a much wider range of files properly.

The url parameter is required; output_fname is optional and defaults to a name derived from the URL.

Safety Features

If the output file already exists, it will NOT be overwritten; the script exits safely instead.
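One way to get this behaviour, sketched here as an assumption rather than the module's actual code, is to open the output file in exclusive-creation mode ("x"), which raises FileExistsError when the file is already there. The helper name write_new_file is hypothetical.

```python
def write_new_file(path: str, content: str) -> bool:
    """Write content to path only if path does not exist yet (hypothetical helper).

    Returns True if the file was written, False if it already existed.
    """
    try:
        # Mode "x" creates the file and fails if it already exists,
        # so the existence check and the creation happen in one call.
        with open(path, "x", encoding="utf-8") as fh:
            fh.write(content)
    except FileExistsError:
        return False
    return True
```

Because the check and the creation happen in a single open() call, no other process can create the file in between, as it could with a separate os.path.exists() check followed by a write. A caller could then exit with sys.exit() when the function returns False.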

Setting User Options

This module is a perfect candidate for collections.ChainMap(). We could check environment variables, config files, command-line arguments, and user-provided parameters, ranking them in that order of importance when configuring the download.
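A minimal sketch of that idea, assuming each source has already been loaded into a dict. build_options, env_options and the LAZY_DOWNLOADER_ prefix are hypothetical names. ChainMap returns the value from the first mapping that contains a key, so the argument order encodes the priority order described above.

```python
import os
from collections import ChainMap


def env_options(prefix: str = "LAZY_DOWNLOADER_") -> dict:
    """Collect options from variables such as LAZY_DOWNLOADER_URL (hypothetical prefix)."""
    return {
        key[len(prefix):].lower(): value
        for key, value in os.environ.items()
        if key.startswith(prefix)
    }


def build_options(env: dict, config: dict, cli: dict, params: dict) -> ChainMap:
    """Merge option sources; earlier mappings take precedence over later ones."""

    def present(mapping: dict) -> dict:
        # Drop unset (None) values so they do not shadow lower-priority sources.
        return {k: v for k, v in mapping.items() if v is not None}

    return ChainMap(present(env), present(config), present(cli), present(params))
```

Reordering the arguments passed to ChainMap is all it takes to change the precedence, for example to let command-line arguments override environment variables.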

pyutil.lazy_downloader.url

The URL to download.

Type

str

pyutil.lazy_downloader.output_fname

The path to write the downloaded content to. Defaults to the last segment of the URL when it is split on forward slashes (/).

Type

str, optional

pyutil.lazy_downloader._get_page(URL)

Get the content at URL.

Returns the content if it is recognized as HTML/XML; otherwise, returns None.
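A standard-library sketch of such a check, assuming the decision is based on the Content-Type header. get_page and MARKUP_TYPES are hypothetical names, and the set of accepted types is a guess at what "HTML/XML" covers.

```python
from typing import Optional
from urllib.request import urlopen

# Content types treated as HTML/XML (an assumption; extend as needed).
MARKUP_TYPES = {"text/html", "application/xhtml+xml", "text/xml", "application/xml"}


def get_page(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the decoded body at url if it is HTML/XML, else None (hypothetical)."""
    with urlopen(url, timeout=timeout) as response:
        # get_content_type() returns the lower-cased "maintype/subtype" part.
        content_type = response.headers.get_content_type()
        if content_type not in MARKUP_TYPES:
            return None
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```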

pyutil.lazy_downloader._parse_arguments()

Parse user input.
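Given that url is required and output_fname is optional, the argument parsing could look roughly like this argparse sketch. parse_arguments and the help strings are hypothetical. Passing argv explicitly makes the function easy to test; with argv=None, argparse reads sys.argv.

```python
import argparse
from typing import Optional, Sequence


def parse_arguments(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse a required URL and an optional output filename (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Download a plain text file from the Web."
    )
    parser.add_argument("url", help="URL to download")
    # nargs="?" makes the positional optional; default applies when it is omitted.
    parser.add_argument(
        "output_fname",
        nargs="?",
        default=None,
        help="file to write to (default: last path segment of the URL)",
    )
    return parser.parse_args(argv)
```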

pyutil.lazy_downloader._parse_site(URL, **kwargs)

Download the page at the given URL, strip its HTML tags, and return the plain text.

This should probably be modified to accept user-agent and header arguments.

Parameters

URL (str) – Page to download.

Returns

txt – Plaintext view of the website.

Return type

str
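The suggested user-agent/header change and the tag stripping could be sketched with the standard library as below. build_request, strip_tags and _TextExtractor are hypothetical names, and skipping the contents of script and style elements is an assumption about what "plaintext" should mean.

```python
from html.parser import HTMLParser
from typing import Dict, Optional
from urllib.request import Request


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_tags(html: str) -> str:
    """Return the text of an HTML document with all tags removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


def build_request(url: str, user_agent: Optional[str] = None,
                  headers: Optional[Dict[str, str]] = None) -> Request:
    """Build a request carrying custom headers and an optional User-Agent."""
    all_headers = dict(headers or {})
    if user_agent is not None:
        all_headers["User-Agent"] = user_agent
    return Request(url, headers=all_headers)
```

Passing the result of build_request() to urllib.request.urlopen() sends the chosen headers; the decoded body can then go through strip_tags(). Note that Request normalizes header names with str.capitalize(), so the stored key is "User-agent".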

pyutil.lazy_downloader._parse_url(URL)

Parse the URL to derive a usable filename when no output filename is given, so that the script does not crash.

Parameters

URL (str) – A live URL to download a page from

Returns

stripped_url – The URL split into segments on the forward slash (/).

Return type

list
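A minimal sketch of the splitting and of picking the default filename. parse_url and default_output_fname are hypothetical names, and stripping a trailing slash first is an added assumption so that a URL ending in / does not produce an empty filename.

```python
from typing import List


def parse_url(url: str) -> List[str]:
    """Split a URL on forward slashes (hypothetical stand-in for _parse_url)."""
    return url.split("/")


def default_output_fname(url: str) -> str:
    """Use the last URL segment as the output filename."""
    # rstrip("/") so "https://example.com/docs/" yields "docs", not "".
    return parse_url(url.rstrip("/"))[-1]
```

Query strings are not removed, so https://example.com/a.txt?v=2 would yield a.txt?v=2; urllib.parse.urlsplit() could be used to take only the path component first.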

pyutil.lazy_downloader.check_response(server_response)

Check that the server sent response headers and that the status code is 200.

Parameters

server_response – The response returned by the server.

Returns

links – URLs found on site.

Return type

todo
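A sketch of the check described above, assuming server_response is an http.client.HTTPResponse or any object exposing .status and .headers. The bool return value is an assumption, since the return type is still marked todo.

```python
from http import HTTPStatus


def check_response(server_response) -> bool:
    """Return True if the response carries headers and a 200 status (sketch)."""
    headers = getattr(server_response, "headers", None)
    status = getattr(server_response, "status", None)
    # An email.message.Message (the base of HTTPMessage) is falsy when empty.
    return bool(headers) and status == HTTPStatus.OK
```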

pyutil.lazy_downloader.main()

Download URL and write to disk.