Internet Utilities
lazy_downloader
Automates downloading plain text files from the Web.
Lazy Downloader
As currently implemented, the downloader only handles plain text correctly;
however, there are plans to use the mimetypes module to properly handle a
much wider range of files.
Both parameters, url and output_fname, are required.
Safety Features
If a file with the output filename already exists on the system, it will NOT be overwritten, and the script will exit safely.
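The overwrite guard described above can be sketched with Python's exclusive-creation file mode. This is a minimal illustration, not the module's actual code: the helper name `write_if_absent` is hypothetical, and it returns a flag rather than exiting so that the caller can decide how to stop.

```python
import os
import tempfile


def write_if_absent(path, text):
    """Write text to path only if nothing exists there yet.

    Returns True when the file was written and False when it already existed.
    (Hypothetical helper, not part of lazy_downloader.)
    """
    try:
        # Mode "x" creates the file and raises FileExistsError if it is
        # already present, so existing content is never truncated.
        with open(path, "x", encoding="utf-8") as handle:
            handle.write(text)
    except FileExistsError:
        return False
    return True


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "example.txt")
    print(write_if_absent(demo_path, "first"))   # the file is created
    print(write_if_absent(demo_path, "second"))  # the existing file is kept
```

A script could call `sys.exit()` with a short message when the helper returns False, which matches the "exit safely" behavior described here.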
Setting User Options
This module is a perfect candidate for collections.ChainMap().
We could check environment variables, config files, command-line arguments,
and user-provided parameters, and rank them in that order of importance when
configuring the download.
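A rough sketch of that idea follows. It assumes the sources are listed from most to least important (environment variables first), which is one reading of the sentence above. The function name `build_options` and the sample values are hypothetical; the option keys are borrowed from the parameters named on this page.

```python
from collections import ChainMap


def _without_none(mapping):
    """Drop unset (None) values so lookups fall through to the next source."""
    return {key: value for key, value in mapping.items() if value is not None}


def build_options(env_opts, config_opts, cli_opts, user_opts):
    """Combine the four sources; the first mapping that holds a key wins."""
    return ChainMap(
        _without_none(env_opts),
        _without_none(config_opts),
        _without_none(cli_opts),
        _without_none(user_opts),
    )


# Illustrative values only.
options = build_options(
    env_opts={"url": "https://example.com/notes.txt"},
    config_opts={"url": "https://example.org/other.txt", "output_fname": "notes.txt"},
    cli_opts={"output_fname": None},
    user_opts={"output_fname": "fallback.txt"},
)
print(options["url"])           # https://example.com/notes.txt
print(options["output_fname"])  # notes.txt
```

Reordering the arguments passed to `ChainMap` changes the precedence, so a different ranking needs no other code changes.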
- pyutil.lazy_downloader.output_fname
  A path to write the downloaded content to. Defaults to the last segment of the URL when split on forward slashes (/).
  Type: str, optional
- pyutil.lazy_downloader._get_page(URL)
  Get the content at URL. Returns the content if it is recognized as HTML/XML; otherwise returns None.
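One way such a check could work is to inspect the Content-Type header before returning the body. The sketch below uses only the standard library. The names `looks_like_markup` and `get_page_sketch`, the list of accepted media types, and the timeout are assumptions made for illustration; the real `_get_page` may recognize HTML/XML differently.

```python
import urllib.request

# Media types treated as HTML/XML in this sketch (an assumption).
MARKUP_TYPES = {"text/html", "application/xhtml+xml", "text/xml", "application/xml"}


def looks_like_markup(content_type):
    """Return True when a Content-Type header value names HTML or XML."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in MARKUP_TYPES or media_type.endswith("+xml")


def get_page_sketch(url, timeout=10):
    """Return decoded page text for HTML/XML responses, otherwise None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            if not looks_like_markup(response.headers.get("Content-Type")):
                return None
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except (OSError, ValueError):
        # Network failures and malformed URLs both yield None in this sketch.
        return None
```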
- pyutil.lazy_downloader._parse_site(URL, **kwargs)
  Parse the given URL, remove tags, and return plaintext. This should probably be modified to take the user agent and header args.
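The tag-removal step could look something like the following, which uses the standard library's `html.parser`. This is a sketch under assumptions: `strip_tags` and `_TextExtractor` are hypothetical names, the real `_parse_site` may rely on a different parser, and skipping script and style content is a choice made here.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of script and style tags."""

    _SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Joining with a space keeps adjacent block elements apart (a tag
        # inside a word would also gain a space); runs of whitespace are
        # then collapsed to single spaces.
        return " ".join(" ".join(self._chunks).split())


def strip_tags(markup):
    """Return the visible text of an HTML string with all tags removed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()
```

For example, `strip_tags("<p>Hello <b>world</b></p>")` returns the string `Hello world`.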
- pyutil.lazy_downloader._parse_url(URL)
  Parse the URL to derive a usable filename when no output filename is given, so that a missing filename does not cause a crash.
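A possible shape for this fallback is sketched below. The function name `filename_from_url` and the fallback name `download.txt` are hypothetical. The sketch also ignores the query string and percent-decodes the result, which goes slightly beyond simply splitting on forward slashes.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, fallback="download.txt"):
    """Return the last path segment of url, or fallback when there is none."""
    # urlsplit separates the path from the query string and fragment.
    path = urlsplit(url).path
    # A trailing slash would otherwise produce an empty final segment.
    name = posixpath.basename(path.rstrip("/"))
    return unquote(name) or fallback


print(filename_from_url("https://example.com/docs/readme.txt?x=1"))  # readme.txt
print(filename_from_url("https://example.com/"))                     # download.txt
```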
- pyutil.lazy_downloader.check_response(server_response)
  Check that the server sent response headers and that the status code is 200.
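A hedged sketch of such a check follows. Because this page does not say which HTTP library produces `server_response`, the sketch accepts either a `status_code` attribute (as on `requests` responses) or a `status` attribute (as on `http.client` responses). The function name `check_response_sketch` is hypothetical.

```python
from types import SimpleNamespace


def check_response_sketch(server_response):
    """Return True only when a response is present and its status is 200."""
    if server_response is None:
        return False
    # Try the requests-style attribute first, then the http.client style.
    status = getattr(server_response, "status_code", None)
    if status is None:
        status = getattr(server_response, "status", None)
    return status == 200


# Stand-in objects used only to exercise the check.
print(check_response_sketch(SimpleNamespace(status_code=200)))  # True
print(check_response_sketch(SimpleNamespace(status=404)))       # False
print(check_response_sketch(None))                              # False
```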