130 Commits

Author SHA1 Message Date
Mike Fährmann
b17a5d6f3b give downloader classes proper names 2018-11-16 14:40:05 +01:00
Mike Fährmann
4a348990f4 adjust value resolution for retries/timeout/verify options
This change introduces 'extractor.*.retries/timeout/verify' options
as a general way to set these values for all HTTP requests.

'downloader.http.retries/timeout/verify' is a way to override these
options for file downloads only and will fall back to 'extractor.*.…*
values if they haven't been explicitly set.

Also: downloader classes now take an extractor object as first argument
instead of a requests.session.
2018-10-07 21:13:39 +02:00
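This is a minimal sketch of the fallback order described above. It assumes the configuration is a plain nested dict. A value set under downloader.http wins; otherwise the extractor value is used; otherwise a default applies. The function names and the dict layout are hypothetical, and the per-category 'extractor.*' wildcard is collapsed into a single 'extractor' section for brevity.

```python
from typing import Any, Tuple

# Sentinel so that explicitly configured falsy values (False, 0) still count as "set".
_MISSING = object()

def _lookup(config: dict, path: Tuple[str, ...]) -> Any:
    """Walk a nested dict along `path`; return _MISSING if any key is absent."""
    node = config
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node

def resolve_download_option(config: dict, name: str, default: Any) -> Any:
    """Hypothetical resolver: downloader.http.<name>, then extractor.<name>, then default."""
    for path in (("downloader", "http", name), ("extractor", name)):
        value = _lookup(config, path)
        if value is not _MISSING:
            return value
    return default
```

With the sentinel, a download-specific `"verify": False` correctly overrides a global setting instead of being mistaken for "not set".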
Mike Fährmann
cc36f88586 rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
ebe9b0a04c another attempt at downloader retry behavior
This commit changes the general behavior from
'Retry on every exception and abort on DownloadError' to
'Only retry on DownloadRetry exceptions and abort on every other one'

The previous version would have retried on several states which
would have no chance of ever succeeding (invalid URLs, etc.)
2017-12-07 15:31:14 +01:00
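The "only retry on DownloadRetry" policy can be sketched like this. The DownloadRetry class below is a stand-in that borrows the name from the commit message. download_with_retries is a hypothetical helper, not the project's code.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

class DownloadRetry(Exception):
    """Stand-in for the project's retry signal: a transient, retryable failure."""

def download_with_retries(attempt: Callable[[], T], retries: int) -> T:
    """Run attempt() at most retries + 1 times.

    Only DownloadRetry triggers another try; any other exception
    (e.g. one caused by an invalid URL) propagates immediately.
    """
    for tries in range(retries + 1):
        try:
            return attempt()
        except DownloadRetry:
            if tries == retries:
                raise  # out of retries: surface the last transient error
    raise ValueError("retries must be >= 0")
```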
Mike Fährmann
8f518e03f8 add options to set maximum download rate
- -r/--limit-rate as cmdline option
- downloader.http.rate as config option

This implementation very roughly uses the idea of the token bucket
algorithm [1] and mostly uses Wget's approach [2] as inspiration.

[1] https://en.wikipedia.org/wiki/Token_bucket
[2] http://git.savannah.gnu.org/cgit/wget.git/tree/src/retr.c?h=v1.19.2&id=ba6b44f6745b14dce414761a8e4b35d31b176bba#n111
2017-12-02 01:47:26 +01:00
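Here is a rough token-bucket limiter in the spirit of this commit. It is written from the general algorithm, not from the project's source or Wget's retr.c. The class name and the injectable clock/sleep parameters, which let the behavior be tested without real waiting, are assumptions.

```python
import time
from typing import Callable, Optional

class TokenBucket:
    """Rough token bucket: refills at `rate` bytes/s, holds at most `capacity` bytes."""

    def __init__(self, rate: float, capacity: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.rate = float(rate)
        self.capacity = float(capacity if capacity is not None else rate)
        self.tokens = self.capacity  # start full, allowing an initial burst
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last refill, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def consume(self, nbytes: int) -> None:
        """Wait until `nbytes` may be transferred, then spend that many tokens."""
        self._refill()
        if self.tokens < nbytes:
            # Sleep just long enough for the deficit to be refilled.
            self.sleep((nbytes - self.tokens) / self.rate)
            self._refill()
        # Chunks larger than capacity leave a negative balance (a debt),
        # which makes the next call wait longer.
        self.tokens -= nbytes
```

Calling bucket.consume(len(chunk)) before writing each downloaded chunk caps average throughput at `rate` bytes per second. It still allows a burst of up to `capacity` bytes.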
Mike Fährmann
3dc1169736 use own mapping before relying on the 'mimetypes' module 2017-12-01 13:50:31 +01:00
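One way to read "own mapping before the mimetypes module" is to consult a small hand-written table first. mimetypes.guess_extension() is then used only for unknown types, since the standard module may return an unexpected alias for common types on some Python versions (for example '.jpe' for image/jpeg). The table contents and the function name are illustrative, not the project's.

```python
import mimetypes

# Illustrative subset of a project-specific table, consulted before `mimetypes`.
MIME_TO_EXT = {
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/gif": "gif",
    "video/webm": "webm",
}

def extension_from_content_type(content_type: str, default: str = "bin") -> str:
    """Return a file extension (without dot) for a Content-Type header value."""
    # Drop parameters such as "; charset=binary" and normalize case.
    mtype = content_type.split(";", 1)[0].strip().lower()
    if mtype in MIME_TO_EXT:
        return MIME_TO_EXT[mtype]
    ext = mimetypes.guess_extension(mtype)  # e.g. ".pdf", or None if unknown
    return ext[1:] if ext else default
```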
Mike Fährmann
79bcaa8726 improve downloader retry behavior
- only retry download on 5xx and 429 status codes
- immediately fail on 4xx status codes
2017-11-10 21:46:18 +01:00
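The status-code policy from this commit can be written as a hypothetical classifier. The function name and its three return values are assumptions, as is treating any code outside 2xx/4xx/5xx as a failure.

```python
from http import HTTPStatus

def classify_response(status: int) -> str:
    """Map an HTTP status code to 'ok', 'retry', or 'fail'."""
    if 200 <= status < 300:
        return "ok"
    if status == HTTPStatus.TOO_MANY_REQUESTS or 500 <= status < 600:
        return "retry"  # rate limiting and server errors may be transient
    # Other 4xx codes (and, as an assumption here, anything else) will not fix themselves.
    return "fail"
```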
Mike Fährmann
707b15b586 create missing directories for 'part-directory'
also some code improvements regarding downloader config values
2017-10-27 12:22:45 +02:00
Mike Fährmann
caf26412dd add option to set alternate location of .part files (#29)
Note: The path set for 'downloader.*.part-directory' needs to point to an
already existing directory.
2017-10-26 00:16:48 +02:00
Mike Fährmann
963670d73b add options to control usage of .part files (#29)
- '--no-part' command line option to disable them
- 'downloader.http.part' and 'downloader.text.part' config options

Disabling .part files restores the behaviour of the old downloader
implementation.
2017-10-24 23:33:44 +02:00
Mike Fährmann
b0353aa02d rewrite download modules (#29)
- use '.part' files during file-download
- implement continuation of incomplete downloads
- check if file size matches the one reported by server
2017-10-24 12:53:03 +02:00
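This sketch shows the .part mechanics listed above, assuming plain files on disk. The size of an existing .part file becomes the offset for an HTTP Range request. The file is renamed to its final name only when its size matches the expected total. The function names are hypothetical.

```python
import os
from typing import Dict, Optional, Tuple

def resume_request(part_path: str) -> Tuple[int, Dict[str, str]]:
    """Return (offset, headers) for continuing a download into part_path."""
    try:
        offset = os.path.getsize(part_path)
    except FileNotFoundError:
        offset = 0
    # A Range request asks the server only for the bytes after `offset`.
    headers = {"Range": "bytes={}-".format(offset)} if offset else {}
    return offset, headers

def finalize(part_path: str, final_path: str, expected_size: Optional[int]) -> bool:
    """Move the .part file into place only if its size matches expected_size."""
    if expected_size is not None and os.path.getsize(part_path) != expected_size:
        return False  # size mismatch: keep the .part file for a later attempt
    os.replace(part_path, final_path)
    return True
```

When the server answers 206 Partial Content, the expected total would be the offset plus the response's Content-Length. If the server ignores the Range header and answers 200, the download has to start over from byte 0.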
Mike Fährmann
2e982f56af use 'Content-Length' to determine incomplete downloads (#29) 2017-10-20 18:56:18 +02:00
Mike Fährmann
b8862ff15e add 'downloader.http.verify' option
(also: change the default 'timeout' from None to 30)
2017-08-31 15:21:08 +02:00
Mike Fährmann
58e95a7487 share extractor and downloader sessions
There was never any "good" reason for the strict separation
between extractors and downloaders. This change allows for
reduced resource usage (probably unnoticeable) and fewer lines
of code at the "cost" of tighter coupling.
2017-06-30 19:38:14 +02:00
Mike Fährmann
fac6c02224 [downloader] fix extension from content-type 2017-06-19 09:24:00 +02:00
Mike Fährmann
48a5b11204 fix error if no file extension is found 2017-04-26 12:31:42 +02:00
Mike Fährmann
e3212dd98f fix some smaller stuff
- remove support for old windows config paths
- catch exception if cache-database can't be opened
- fix username/password settings for unit tests
- rename variable 'max_tries' to 'retries'
2017-03-27 14:30:32 +02:00
Mike Fährmann
e2b5cd9918 change config-path for 'retries' and 'timeout' 2017-03-26 18:24:46 +02:00
Mike Fährmann
0b5076815d always delete incompletely downloaded files 2017-03-21 15:53:43 +01:00
Mike Fährmann
22910f9562 improve error handling of http file downloads (#10)
2017-03-16 04:17:35 +01:00
Mike Fährmann
4f123b8513 code adjustments according to pep8 2017-01-30 19:40:15 +01:00
Mike Fährmann
3c1daef839 don't delete downloaded files in certain edge cases 2016-11-27 23:43:25 +01:00
Mike Fährmann
2b2bdce366 don't raise an exception if a download fails (#5) 2016-11-23 13:07:44 +01:00
Mike Fährmann
dd8236e733 enable non-standard MIME types 2016-09-30 16:41:49 +02:00
Mike Fährmann
29692c5784 get extension from Content-Type header if not provided 2016-09-30 12:32:48 +02:00
Mike Fährmann
4b377ccc09 use output-module during downloads 2015-12-01 21:22:58 +01:00
Mike Fährmann
28fa7c53b4 docstrings and other small fixes for downloaders 2015-04-10 21:45:41 +02:00
Mike Fährmann
5545624da1 use separate session in http downloader 2015-04-10 19:19:12 +02:00
Mike Fährmann
cd4a699dd2 add 'Headers' and 'Cookies' message 2015-04-08 19:06:50 +02:00
Mike Fährmann
deef91eddc initial commit 2014-10-12 21:56:44 +02:00