Commit Graph

74 Commits

Author SHA1 Message Date
Mike Fährmann
f5604492c3 update interface of config functions 2019-11-24 00:42:28 +01:00
Mike Fährmann
bbbeff4c41 [downloader.http] implement file-specific HTTP headers 2019-11-19 23:50:54 +01:00
Mike Fährmann
a5be08a830 [downloader:ytdl] forward proxy settings 2019-11-05 16:16:26 +01:00
Mike Fährmann
d44f790e81 adjust output for HTTP status related errors 2019-10-27 23:55:02 +01:00
Mike Fährmann
083e14ad9a [downloader:ytdl] add data from '_ytdl_extra' to info_dicts 2019-10-25 13:17:13 +02:00
Mike Fährmann
1032cfa34b [downloader:http] extend mimetype map with archive formats 2019-10-10 18:30:23 +02:00
Mike Fährmann
8eaae58045 [downloader:http] change log message level to 'debug' 2019-08-29 23:05:47 +02:00
Mike Fährmann
7c09545f70 [downloader:ytdl] add 'outtmpl' option (#395) 2019-08-24 22:47:59 +02:00
Mike Fährmann
ebabc5caf1 [downloader:http] treat 416 without downloaded data as error
Downloading https://pbs.twimg.com/media/EB2cGUYX4AI2Vuu.jpg:orig (NSFW)
sometimes returns a 416 status code, even though no 'Range' header was
sent and no data was downloaded prior.
This code usually means a file has already been downloaded completely
and the download method indicates success, but in this case it causes
an exception down the pipeline since no file was created.
2019-08-20 00:15:17 +02:00
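A minimal sketch of the 416 decision described above. It assumes the downloader knows how many bytes it already has on disk. `DownloadError` and `is_complete_after_416` are hypothetical names for illustration, not gallery-dl's actual code.

```python
class DownloadError(Exception):
    """Hypothetical error type used only in this sketch."""


def is_complete_after_416(bytes_on_disk: int) -> bool:
    """Interpret an HTTP 416 (Range Not Satisfiable) response.

    Returns True when the 416 plausibly means "nothing left to fetch",
    and raises DownloadError when no data exists locally.
    """
    if bytes_on_disk > 0:
        # A Range request for a partially downloaded file was rejected:
        # the local copy already covers the whole file.
        return True
    # No Range header was sent and nothing has been written, so a 416
    # cannot mean "already complete"; reporting success here would leave
    # later stages looking for a file that was never created.
    raise DownloadError("HTTP 416 received without any downloaded data")
```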
Mike Fährmann
0bb873757a update PathFormat class
- change 'has_extension' from a simple flag/bool to a field that
  contains the original filename extension
- rename 'keywords' to 'kwdict' and some other stuff as well
- inline 'adjust_path()'
- put enumeration index before filename extension (#306)
2019-08-12 21:40:37 +02:00
Mike Fährmann
b7fb93e2b2 [downloader:http] add 'adjust-extensions' option 2019-08-08 16:54:20 +02:00
Mike Fährmann
547ea71463 [downloader.ytdl] add 'forward-cookies' option (#352)
The "long" name is necessary because just calling it 'cookies' would
clash with how the lookup for '--cookies' is implemented.
2019-07-24 21:19:11 +02:00
Mike Fährmann
c41ff9441e improve find() for downloaders and postprocessors 2019-07-15 16:33:03 +02:00
Mike Fährmann
16c582aaf9 implement 'mtime' post-processor (#332)
This can set a file's modification time according to a UNIX timestamp
or a datetime object from its metadata.
2019-07-14 22:39:17 +02:00
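A small sketch of what such a post-processor could do with either kind of metadata value. The helper names are hypothetical, and treating naive datetimes as UTC is an assumption of this sketch, not something the commit states.

```python
import os
from datetime import datetime, timezone
from typing import Union


def to_timestamp(value: Union[int, float, str, datetime]) -> float:
    """Convert a metadata value (UNIX timestamp or datetime) to a float timestamp."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: naive datetimes in metadata are UTC.
            value = value.replace(tzinfo=timezone.utc)
        return value.timestamp()
    return float(value)


def set_mtime(path: str, value: Union[int, float, str, datetime]) -> None:
    """Set both access and modification time of `path` to `value`."""
    ts = to_timestamp(value)
    os.utime(path, (ts, ts))  # (atime, mtime)
```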
Mike Fährmann
8966930c5c [downloader:http] try to import SSL exception class from OpenSSL
(#324)
2019-07-01 20:10:26 +02:00
Mike Fährmann
69205df68d allow '-1' for infinite retries (#300) 2019-06-30 23:10:47 +02:00
Mike Fährmann
f7b5c4c3e7 use values of 'retries' options correctly
The RE-tries option now specifies exactly that: the maximum number of
times a failed HTTP request is re-tried. For example, a value of 2 will
now correctly stop after 3 attempts: the initial one + 2 re-tries.

The wait time between attempts now increases exponentially and is capped
at 30 minutes for both extractor.request() and downloader.http.download().
2019-06-30 23:10:18 +02:00
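A sketch of the counting rule from this commit, combined with the '-1 means infinite' rule from #300 above: one initial attempt plus up to `retries` re-tries, with an exponentially growing wait capped at 30 minutes. The doubling sequence (2, 4, 8, ... seconds), `TransientError`, and `call_with_retries` are assumptions for illustration; the commit does not give the exact backoff formula.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")
MAX_WAIT = 1800  # seconds; the 30-minute cap from the commit message


class TransientError(Exception):
    """Hypothetical stand-in for a failure worth re-trying."""


def call_with_retries(func: Callable[[], T], retries: int,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func once, then re-try up to `retries` times (-1 = forever)."""
    tries = 0
    while True:
        try:
            return func()
        except TransientError:
            tries += 1
            if 0 <= retries < tries:
                raise  # initial attempt + `retries` re-tries used up
            # Assumed backoff: 2, 4, 8, ... seconds, capped at MAX_WAIT.
            sleep(min(2 ** tries, MAX_WAIT))
```

With `retries=2` an always-failing call is attempted exactly 3 times, matching the example in the commit message.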
Mike Fährmann
f1b0c2bf5c [downloader:ytdl] forward cookies to youtube-dl
to be able to download private videos from Twitter, Instagram, etc.
2019-06-26 19:32:07 +02:00
Mike Fährmann
db3f52881a add 'mtime' option 2019-06-20 17:19:44 +02:00
Mike Fährmann
ee4d7c3d89 update downloader.find() and related code
Instead of replacing 'https' with 'http' for every URL in
'get_downloader()', this now happens only once, during downloader
initialization. Also adds unit tests.
2019-06-20 16:59:44 +02:00
Mike Fährmann
f4ba98771d use Last-Modified header to set file modification time
(#236, #277)
2019-06-19 23:16:32 +02:00
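A minimal sketch of applying a Last-Modified header to a downloaded file using only the standard library. `apply_last_modified` is a hypothetical helper; the actual downloader code may parse the header differently.

```python
import os
from email.utils import parsedate_to_datetime
from typing import Optional


def apply_last_modified(path: str, header: Optional[str]) -> bool:
    """Set path's mtime from an HTTP Last-Modified value; False if unusable."""
    if not header:
        return False
    try:
        dt = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        # Invalid dates raise ValueError (Python 3.10+) or TypeError (older).
        return False
    ts = dt.timestamp()  # "GMT" yields an aware UTC datetime
    os.utime(path, (ts, ts))
    return True
```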
Mike Fährmann
179d112083 [downloader] overhaul http and text modules
Get rid of the modular structure and simplify/specialize those modules.
2019-06-19 22:56:11 +02:00
Mike Fährmann
6da3e21237 [downloader:ytdl] provide 'filename' metadata (closes #291) 2019-05-31 14:56:45 +02:00
Mike Fährmann
7973419b54 restrict downloader and postprocessor module imports 2019-04-16 18:09:30 +02:00
Mike Fährmann
114b8eecc5 [downloader;ytdl] utilize '_ytdl_index' metadata fields 2019-03-24 11:27:20 +01:00
Mike Fährmann
c14d44e1bc [downloader:common] retry downloads on SSL errors (#130) 2018-12-14 16:33:04 +01:00
Mike Fährmann
b17a5d6f3b give downloader classes proper names 2018-11-16 14:40:05 +01:00
Mike Fährmann
655549df7c [downloader:ytdl] add several options
The "default" downloader options (rate, retries, timeout, verify) are
mapped to corresponding youtube-dl options.

downloader.ytdl.logging tells the downloader to pass youtube-dl's output
to a Logger object.

downloader.ytdl.raw-options allows passing arbitrary options to the
YoutubeDL constructor.
2018-10-20 18:26:49 +02:00
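A sketch of how the generic options might translate into a YoutubeDL params dict. The target keys (`ratelimit`, `retries`, `socket_timeout`, `nocheckcertificate`, `logger`) are documented youtube-dl parameters. However, the exact mapping and the choice to let raw-options override everything are assumptions of this sketch, and `build_ytdl_params` is a hypothetical name.

```python
import logging
from typing import Any, Dict, Optional


def build_ytdl_params(rate: Optional[int] = None,
                      retries: Optional[int] = None,
                      timeout: Optional[float] = None,
                      verify: bool = True,
                      logger: Optional[logging.Logger] = None,
                      raw_options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Translate generic downloader options into YoutubeDL params."""
    params: Dict[str, Any] = {
        "ratelimit": rate,                 # bytes per second
        "retries": retries,
        "socket_timeout": timeout,         # seconds
        "nocheckcertificate": not verify,
        "logger": logger,                  # forward youtube-dl output to a Logger
    }
    # Drop options the user did not set.
    params = {k: v for k, v in params.items() if v is not None}
    # raw-options are applied last so they can override anything above.
    if raw_options:
        params.update(raw_options)
    return params
```

The result would then be passed as `youtube_dl.YoutubeDL(params)`.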
Mike Fährmann
4a348990f4 adjust value resolution for retries/timeout/verify options
This change introduces 'extractor.*.retries/timeout/verify' options
as a general way to set these values for all HTTP requests.

'downloader.http.retries/timeout/verify' is a way to override these
options for file downloads only and will fall back to 'extractor.*.…*
values if they haven't been explicitly set.

Also: downloader classes now take an extractor object as first argument
instead of a requests.session.
2018-10-07 21:13:39 +02:00
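A sketch of the fallback order described above, using plain dicts in place of the real config system (an assumption). A key that is present counts as "explicitly set", even if its value is None. `resolve_option` is a hypothetical helper.

```python
from typing import Any, Dict

_UNSET = object()  # sentinel distinguishing "missing" from "set to None"


def resolve_option(name: str, downloader_cfg: Dict[str, Any],
                   extractor_cfg: Dict[str, Any], default: Any = None) -> Any:
    """Prefer downloader.http.<name>, else extractor.*.<name>, else default."""
    value = downloader_cfg.get(name, _UNSET)
    if value is not _UNSET:
        return value  # explicitly set for file downloads only
    return extractor_cfg.get(name, default)
```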
Mike Fährmann
188876d814 implement youtube-dl downloader module
URLs starting with 'ytdl:' will now be handled by youtube-dl.
There is probably a lot to fix and improve, but the basic use case
works.

TODO:
- format selection and ytdl options in general
- better filename/path handling
- ytdl support for "unsupported URLs"
- ...
2018-10-05 18:05:11 +02:00
Mike Fährmann
e9ae6fd080 improve downloader/postprocessor module loading
- handle arguments of any type without propagating an exception
- prevent potential security risk through relative imports
2018-09-05 16:39:40 +02:00
Mike Fährmann
973cf98e88 fix download skip for files without extension 2018-06-27 17:16:07 +02:00
Mike Fährmann
821535b458 adjust PathFormat class 2018-06-06 20:17:17 +02:00
Mike Fährmann
cc36f88586 rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
1d54a8e07d fix logging output during downloads
from:
filename.ext[download][warning] ...

to:
filename.ext
[download][warning] ...
2018-03-01 18:43:43 +01:00
Mike Fährmann
915807dd77 log HTTP errors as warnings 2018-01-29 21:55:46 +01:00
Mike Fährmann
f94e3706a8 use logging module for error messages during downloads 2018-01-26 18:11:13 +01:00
Mike Fährmann
b837420291 fix minor urllist issues 2018-01-19 22:54:15 +01:00
Mike Fährmann
6174a5c4ef [download] adjust filename extension on filetype mismatch
(closes #63)
2018-01-17 18:37:06 +01:00
Mike Fährmann
ebe9b0a04c another attempt at downloader retry behavior
This commit changes the general behavior from
'Retry on every exception and abort on DownloadError' to
'Only retry on DownloadRetry exceptions and abort on every other one'

The previous version would have retried on several states which
would have no chance of ever succeeding (invalid URLs, etc.)
2017-12-07 15:31:14 +01:00
Mike Fährmann
8f518e03f8 add options to set maximum download rate
- -r/--limit-rate as cmdline option
- downloader.http.rate as config option

This implementation very roughly uses the idea of the token bucket
algorithm [1] and mostly uses Wget's approach [2] as inspiration.

[1] https://en.wikipedia.org/wiki/Token_bucket
[2] http://git.savannah.gnu.org/cgit/wget.git/tree/src/retr.c?h=v1.19.2&id=ba6b44f6745b14dce414761a8e4b35d31b176bba#n111
2017-12-02 01:47:26 +01:00
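For reference, a generic token bucket limiter looks roughly like the sketch below. The commit says its implementation only "very roughly" follows this idea, so this is the textbook technique, not gallery-dl's code. The class name, the default capacity of one second's worth of bytes, and the injectable clock/sleep are assumptions.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Generic token bucket: refills `rate` bytes/s, stores at most `capacity`."""

    def __init__(self, rate: float, capacity: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.rate = rate
        self.capacity = rate if capacity is None else capacity
        self.tokens = self.capacity
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def consume(self, nbytes: int) -> None:
        """Account for nbytes just read; sleep if we are ahead of the rate."""
        now = self.clock()
        # Refill for the time elapsed since the previous call, up to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens < 0:
            # Sleep just long enough for the deficit to refill; the next
            # call credits that sleep back through the elapsed-time refill.
            self.sleep(-self.tokens / self.rate)
```

A download loop would call `bucket.consume(len(chunk))` after writing each chunk.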
Mike Fährmann
3dc1169736 use own mapping before relying on the 'mimetypes' module 2017-12-01 13:50:31 +01:00
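A sketch of the "own mapping first, then `mimetypes`" lookup. The map below is a hypothetical excerpt (including a few archive types in the spirit of the 2019-10-10 commit), not the project's actual table.

```python
import mimetypes
from typing import Optional

# Hypothetical excerpt of a hand-written map, consulted first.
MIMETYPE_MAP = {
    "image/jpeg": "jpg",
    "image/jpg": "jpg",
    "image/png": "png",
    "application/zip": "zip",
    "application/x-rar-compressed": "rar",
    "application/x-7z-compressed": "7z",
}


def extension_from_content_type(content_type: str) -> Optional[str]:
    """Return a filename extension (without dot) for a Content-Type value."""
    mtype = content_type.partition(";")[0].strip().lower()
    if mtype in MIMETYPE_MAP:
        return MIMETYPE_MAP[mtype]
    # Fall back to the stdlib table, which has historically picked less
    # common extensions (e.g. ".jpe" for image/jpeg) on some versions.
    ext = mimetypes.guess_extension(mtype, strict=False)
    return ext[1:] if ext else None
```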
Mike Fährmann
79bcaa8726 improve downloader retry behavior
- only retry download on 5xx and 429 status codes
- immediately fail on 4xx status codes
2017-11-10 21:46:18 +01:00
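The status-code policy above as a small classifier. Treating codes outside 2xx/4xx/5xx as failures is an assumption, and `classify_status` is a hypothetical name.

```python
def classify_status(code: int) -> str:
    """Map an HTTP status code to 'ok', 'retry', or 'fail'."""
    if 200 <= code < 300:
        return "ok"
    if code == 429 or 500 <= code < 600:
        return "retry"  # rate limited or server-side error: try again
    return "fail"       # other 4xx (and anything else, by assumption)
```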
Mike Fährmann
42e948584d fix downloader error handling
RequestException being a subclass of OSError caused all exceptions
during file downloads to be ignored/re-raised.
2017-11-07 15:23:07 +01:00
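Why this matters: `requests.exceptions.RequestException` derives from `IOError`, which is `OSError` in Python 3, so an `except OSError` clause placed first also catches network errors. The sketch below uses a local stand-in class so that it runs without `requests`.

```python
class RequestException(OSError):
    """Local stand-in mirroring requests' base exception (an OSError subclass)."""


def handle_wrong_order(exc: Exception) -> str:
    try:
        raise exc
    except OSError:
        return "local I/O error"  # also swallows RequestException
    except RequestException:
        return "network error"    # unreachable for RequestException


def handle(exc: Exception) -> str:
    try:
        raise exc
    except RequestException:
        return "network error"    # more specific class must come first
    except OSError:
        return "local I/O error"
```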
Mike Fährmann
707b15b586 create missing directories for 'part-directory'
also some code improvements regarding downloader config values
2017-10-27 12:22:45 +02:00
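A sketch of choosing the .part location when 'part-directory' is set, creating the directory if it is missing. `part_path` is a hypothetical helper, and naming the file `<basename>.part` inside that directory is an assumption.

```python
import os
from typing import Optional


def part_path(target: str, part_directory: Optional[str] = None) -> str:
    """Return where the .part file for `target` should live."""
    if not part_directory:
        return target + ".part"
    # Create missing directories instead of requiring them to exist.
    os.makedirs(part_directory, exist_ok=True)
    return os.path.join(part_directory, os.path.basename(target) + ".part")
```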
Mike Fährmann
caf26412dd add option to set alternate location of .part files (#29)
Note: The path set for 'downloader.*.part-directory' needs to point to an
already existing directory.
2017-10-26 00:16:48 +02:00
Mike Fährmann
9a41002b77 fix partial downloads for 'text:' URLs
Using a filesize in bytes as offset into a Python string is not
a good idea if said file contains non-ASCII characters.
2017-10-25 15:04:45 +02:00
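A short illustration of the bug, assuming the text is written as UTF-8: the offset is a byte count, so the string must be encoded before slicing. `remaining_text_bytes` is a hypothetical helper.

```python
def remaining_text_bytes(content: str, offset: int) -> bytes:
    """Bytes of `content` still to be written, given `offset` bytes on disk."""
    # Encode first, then slice: `offset` counts bytes, not characters.
    return content.encode("utf-8")[offset:]
```

For `"héllo wörld"` with 3 bytes already on disk ("h" plus the two-byte "é"), slicing the str with `text[3:]` would skip one "l". Slicing the encoded bytes resumes at the correct position.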
Mike Fährmann
963670d73b add options to control usage of .part files (#29)
- '--no-part' command line option to disable them
- 'downloader.http.part' and 'downloader.text.part' config options

Disabling .part files restores the behaviour of the old downloader
implementation.
2017-10-24 23:33:44 +02:00
Mike Fährmann
b0353aa02d rewrite download modules (#29)
- use '.part' files during file-download
- implement continuation of incomplete downloads
- check if file size matches the one reported by server
2017-10-24 12:53:03 +02:00
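A sketch of the three pieces listed above: resume from the size of an existing .part file, request the rest with a `Range` header, and only rename to the final name when the size matches what the server reported. This sketch assumes `expected_size` is the full file size (for a 206 response, Content-Length plus the resume offset). All function names are hypothetical.

```python
import os
from typing import Dict, Optional


def resume_offset(part_file: str) -> int:
    """Bytes already present in the .part file (0 if it does not exist)."""
    try:
        return os.path.getsize(part_file)
    except FileNotFoundError:
        return 0


def range_headers(offset: int) -> Dict[str, str]:
    """Ask the server for the remainder of the file when resuming."""
    return {"Range": f"bytes={offset}-"} if offset else {}


def finish(part_file: str, target: str, expected_size: Optional[int]) -> bool:
    """Rename .part to target if its size matches the expected total size."""
    if expected_size is not None and os.path.getsize(part_file) != expected_size:
        return False  # incomplete: keep the .part file for a later resume
    os.replace(part_file, target)
    return True
```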
Mike Fährmann
2e982f56af use 'Content-Length' to determine incomplete downloads (#29) 2017-10-20 18:56:18 +02:00