130 Commits

Author SHA1 Message Date
Mike Fährmann
a4ff20cf16 [downloader:http] fix issues from inaccurate 'time.sleep()'
(#3143)

Reverts part of c59b98c8 by going back to using a global timer
instead of a per-chunk one.

Reintroduces the issue of ignoring rate limits after
suspending and resuming the process.
2022-11-10 13:24:02 +01:00
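To illustrate the trade-off described in a4ff20cf16, here is a minimal sketch of a rate limit driven by a global timer. It is not gallery-dl's actual code; the function name `global_delay` and its parameters are hypothetical. The delay is derived from the total bytes received and the total time elapsed since the download started, so a `time.sleep()` call that overshoots is compensated by shorter delays on later chunks.

```python
def global_delay(bytes_done, rate, elapsed):
    """Return how many seconds to sleep to stay at or below 'rate'.

    bytes_done: bytes received since the download started
    rate:       allowed bytes per second
    elapsed:    seconds since the download started (the global timer)
    """
    expected = bytes_done / rate  # time the transfer should have taken so far
    return max(0.0, expected - elapsed)
```

In a download loop this would be called after each chunk with `time.monotonic() - start` as `elapsed`. If the process is suspended, `elapsed` keeps growing, so the function returns 0 until the average catches up, which is the reintroduced issue the commit message mentions.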
Mike Fährmann
550f90ab56 delay enabling .part files when 'http-metadata' is set
otherwise 'build_path' gets called before all metadata is collected
2022-11-09 13:23:52 +01:00
Mike Fährmann
8124c16a50 split 'build_path' from 'set_filename' and 'set_extension'
Do not automatically build a new path
when setting file metadata or updating its extension.
2022-11-08 17:03:24 +01:00
Mike Fährmann
39d9c362e4 include 'http-metadata' in '-K' output 2022-11-07 16:33:26 +01:00
Mike Fährmann
870e6a48a0 implement 'http-metadata' option
or at least attempt to.
2022-11-05 18:29:29 +01:00
Mike Fährmann
bca9f965e5 [downloader:http] add 'chunk-size' option (#3143)
and double the previous default from 16384 (2**14) to 32768 (2**15)
2022-11-02 16:50:26 +01:00
Mike Fährmann
0059e2bfe7 [downloader:http] add MIME type and signature for .avif files 2022-11-01 17:25:21 +01:00
Mike Fährmann
f687e64513 [downloader:http] refactor file signature checks
use functions/lambdas instead of startswith()
2022-11-01 17:09:13 +01:00
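A rough sketch of what signature checks as functions/lambdas (f687e64513) make possible compared to a plain `startswith()`: a predicate can inspect bytes at any offset, which helps tell apart formats that share a prefix. The table `SIGNATURES` and the function `detect_extension` are hypothetical names for illustration, not the project's actual data.

```python
# Hypothetical table: filename extension -> predicate on the first bytes
SIGNATURES = {
    "png" : lambda s: s[0:8] == b"\x89PNG\r\n\x1a\n",
    "gif" : lambda s: s[0:6] in (b"GIF87a", b"GIF89a"),
    # RIFF containers share bytes 0-3; the format tag sits at bytes 8-11
    "webp": lambda s: s[0:4] == b"RIFF" and s[8:12] == b"WEBP",
    "wav" : lambda s: s[0:4] == b"RIFF" and s[8:12] == b"WAVE",
}


def detect_extension(header):
    """Return the first extension whose predicate matches, else None."""
    for ext, check in SIGNATURES.items():
        if check(header):
            return ext
    return None
```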
Mike Fährmann
c0c1277c5f [downloader:http] support sending POST data (#2433)
by setting the '_http_data' metadata field for a file

needed in addition to be3492776b
to download files with POST requests
2022-03-23 21:48:38 +01:00
Mike Fährmann
be3492776b [downloader:http] support using a different method than GET (#2433)
by setting the '_http_method' metadata field for a file
2022-03-20 10:09:05 +01:00
Mike Fährmann
47cf05c4ab refactor proxy handling code (#2357)
- allow gallery-dl proxy settings to overwrite environment proxies
- allow specifying different proxies for data extraction and download
  - add 'downloader.proxy' option
    '-o extractor.proxy=PROXY_URL -o downloader.proxy=null'
    now has the same effect as youtube-dl's '--geo-verification-proxy'
2022-03-10 23:55:35 +01:00
Mike Fährmann
ebd3d5c1cc [bunkr] fix .mp4 downloads (closes #2239) 2022-01-28 23:21:16 +01:00
Mike Fährmann
d0761454b1 implement a download progress indicator (#1519) 2021-09-28 22:48:58 +02:00
Mike Fährmann
b5b1cf22b7 [downloader:http] reorder HTTP header sources
so that any header can be overwritten by a user, except Range
2021-08-05 23:01:54 +02:00
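One way the ordering from b5b1cf22b7 could look, as a sketch under assumptions: the function `build_headers`, its parameters, and the reduction to just two sources (defaults and user settings) are hypothetical. Later sources overwrite earlier ones, and `Range` is set last so that no user value can replace it while resuming.

```python
def build_headers(defaults, user_headers, offset=0):
    """Merge header sources; later sources overwrite earlier ones.

    Note: plain dicts compare header names case-sensitively.
    """
    headers = dict(defaults)
    headers.update(user_headers)  # user-supplied values replace defaults
    if offset:
        # Applied last, so a user-supplied Range cannot replace it
        headers["Range"] = "bytes={}-".format(offset)
    return headers
```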
Mike Fährmann
221015e586 [downloader:http] disable filename extension changes for ugoira
(#1507)
2021-04-27 01:29:09 +02:00
Mike Fährmann
cf5fa75d4c add 'browser' option (#1117)
- change default user agent to Firefox ESR 78 on Windows 10
- remove 'ciphers' option
2021-02-26 13:41:27 +01:00
Mike Fährmann
560277394e [downloader:http] add 'headers' option (#1322) 2021-02-21 19:13:39 +01:00
Mike Fährmann
a228bb3a5f [downloader:http] support callbacks to validate responses 2021-01-29 22:15:21 +01:00
Mike Fährmann
0594821fcd [downloader:http] add MIME type and signature for .ico files
(closes #1211)
2021-01-01 16:07:33 +01:00
Mike Fährmann
476d563ec2 [downloader:http] add MIME type and signature for .swf files 2020-12-11 14:21:04 +01:00
Mike Fährmann
fe0265c7a5 [downloader.http] small improvements to file signature list
- specify multiple entries for gif, mp3, zip
- add entries for pdf
2020-12-08 21:20:18 +01:00
Mike Fährmann
1a4b61f7eb [downloader:http] fix issues with chunked transfer encoding
(fixes #1144)
2020-11-30 01:10:45 +01:00
Mike Fährmann
536c088462 [downloader:http] improve 'adjust-extensions' (#776)
Check file headers against a list of file signatures before
downloading the whole file and writing it to disk.

The file signature check needs some improvements (*),
but it produces usable results for the most part.

(*)
- 'webp', 'wav', and others start with 'RIFF'
- 'svg' uses the same "signature" as all XML documents
- 'webm' has the same signature as 'mkv' files
- only 'mp3' files in an ID3v2 container get recognized
2020-11-29 20:55:35 +01:00
Mike Fährmann
f6fd449b59 reduce wait time growth rate from exponential to linear
Waiting for 2**N seconds after each error grows too fast.
Simply waiting N seconds seems far more reasonable.
2020-09-06 22:38:25 +02:00
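The growth rates compared in f6fd449b59 can be put side by side with a small sketch. The function names are hypothetical; only the formulas 2**N and N come from the commit message.

```python
def wait_exponential(errors):
    """Seconds to wait after the N-th error with the old scheme (2**N)."""
    return 2 ** errors


def wait_linear(errors):
    """Seconds to wait after the N-th error with the new scheme (N)."""
    return errors


# After 10 consecutive errors: 1024 seconds versus 10 seconds
comparison = [(n, wait_exponential(n), wait_linear(n)) for n in (1, 5, 10)]
```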
Mike Fährmann
ac3036ef56 add 'filesize-min' and 'filesize-max' options (closes #780) 2020-09-03 18:21:04 +02:00
Mike Fährmann
34929f673f readd 'session' to base downloader class (fixes #768) 2020-05-20 20:04:46 +02:00
Mike Fährmann
ece73b5b2a make 'path' and 'keywords' available in logging messages
Wrap all loggers used by job, extractor, downloader, and postprocessor
objects into a (custom) LoggerAdapter that provides access to the
underlying job, extractor, pathfmt, and kwdict objects and their
properties.

__init__() signatures for all downloader and postprocessor classes have
been changed to take the current Job object as their first argument,
instead of the current extractor or pathfmt.

(#574, #575)
2020-05-18 19:04:51 +02:00
Mike Fährmann
19a7afdd9b [downloader:http] add MIME types for .psd files (closes #714) 2020-04-29 23:01:42 +02:00
Mike Fährmann
38bc6430d3 [downloader:http] don't overwrite existing '_mtime' fields 2020-04-10 23:08:03 +02:00
Mike Fährmann
115fd2c6f2 "fix" incomplete MIME types (#632)
e-/exhentai's original image downloads currently send
incomplete/invalid Content-Type headers, "jpg" instead
of "image/jpg" etc, since the last update.
(https://forums.e-hentai.org/index.php?showtopic=236113)

This change prepends any Content-Type value missing a
media type specification with "image/", transforming it
into a valid MIME type.

(A global solution to a local problem, but it shouldn't
 cause any issues anywhere else)
2020-03-03 21:21:57 +01:00
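The transformation described in 115fd2c6f2 might be sketched as follows. The function name `fix_mime` is hypothetical, and stripping parameters after ';' is an added assumption that the commit message does not mention.

```python
def fix_mime(content_type):
    """Turn an incomplete Content-Type such as 'jpg' into 'image/jpg'."""
    # Assumption: drop parameters like '; charset=...' before checking
    mime = content_type.partition(";")[0].strip()
    if "/" not in mime:
        # No media type given: assume an image, as the commit describes
        mime = "image/" + mime
    return mime
```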
Mike Fährmann
adcd7cb24a [downloader:http] add another MIME type for '.rar' files (#628) 2020-03-01 20:42:13 +01:00
Mike Fährmann
380b693fad [downloader:http] add more MIME types for '.bmp' files (#621) 2020-02-23 16:51:04 +01:00
Mike Fährmann
760b9b4db4 add remove_file() and remove_directory() helpers
these functions call os.unlink() or os.rmdir()
while catching and suppressing potential OSErrors
2020-01-18 00:21:26 +01:00
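Based on the description in 760b9b4db4, the two helpers could look roughly like this. The bodies are a sketch reconstructed from the commit message, not copied from the repository.

```python
import os


def remove_file(path):
    """Delete a file; ignore errors such as a missing file."""
    try:
        os.unlink(path)
    except OSError:
        pass


def remove_directory(path):
    """Delete an empty directory; ignore errors such as a non-empty one."""
    try:
        os.rmdir(path)
    except OSError:
        pass
```

Suppressing `OSError` keeps cleanup code short at the call site, at the cost of hiding failures like missing permissions.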
Mike Fährmann
c4702ec9b6 simplify some logging calls 2019-12-10 21:30:08 +01:00
Mike Fährmann
c59b98c81b [downloader:http] improve rate limit handling
- Move the download "logic" with rate limit checks into its own
  method that only gets used if a rate limit should be enforced
- Fix an issue where suspending gallery-dl during a download would
  basically ignore the rate limit for the remaining download when
  resuming its execution.
2019-12-09 20:34:22 +01:00
Mike Fährmann
bbbafc1c24 [downloader:http] catch both possible SSLException instances
With pyOpenSSL installed, but disabled, the SSLError exception
would be set to the one from pyOpenSSL, which could never get raised.

This commit solves this problem by catching both the native SSLError
exception and the one from pyOpenSSL (if available).
2019-12-09 20:34:10 +01:00
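A minimal sketch of the approach bbbafc1c24 describes, assuming pyOpenSSL's exception class is `OpenSSL.SSL.Error`. The names `OpenSSLError`, `SSL_ERRORS`, and `is_ssl_error` are hypothetical and only serve the illustration.

```python
from ssl import SSLError

try:
    # Only available when pyOpenSSL is installed
    from OpenSSL.SSL import Error as OpenSSLError
except ImportError:
    # Fall back to the native class so the tuple below stays valid
    OpenSSLError = SSLError

# Catch either exception, whichever backend is actually in use
SSL_ERRORS = (SSLError, OpenSSLError)


def is_ssl_error(exc):
    """Return True if 'exc' is an SSL error from either backend."""
    return isinstance(exc, SSL_ERRORS)
```

The same tuple can be used directly in an `except SSL_ERRORS:` clause.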
Mike Fährmann
bbbeff4c41 [downloader.http] implement file-specific HTTP headers 2019-11-19 23:50:54 +01:00
Mike Fährmann
d44f790e81 adjust output for HTTP status related errors 2019-10-27 23:55:02 +01:00
Mike Fährmann
1032cfa34b [downloader:http] extend mimetype map with archive formats 2019-10-10 18:30:23 +02:00
Mike Fährmann
8eaae58045 [downloader:http] change log message level to 'debug' 2019-08-29 23:05:47 +02:00
Mike Fährmann
ebabc5caf1 [downloader:http] treat 416 without downloaded data as error
Downloading https://pbs.twimg.com/media/EB2cGUYX4AI2Vuu.jpg:orig (NSFW)
sometimes returns a 416 status code, even though no 'Range' header was
sent and no data was downloaded prior.
This code usually means a file has already been downloaded completely
and the download method indicates success, but in this case it causes
an exception down the pipeline since no file was created.
2019-08-20 00:15:17 +02:00
Mike Fährmann
0bb873757a update PathFormat class
- change 'has_extension' from a simple flag/bool to a field that
  contains the original filename extension
- rename 'keywords' to 'kwdict' and some other stuff as well
- inline 'adjust_path()'
- put enumeration index before filename extension (#306)
2019-08-12 21:40:37 +02:00
Mike Fährmann
b7fb93e2b2 [downloader:http] add 'adjust-extensions' option 2019-08-08 16:54:20 +02:00
Mike Fährmann
16c582aaf9 implement 'mtime' post-processor (#332)
This can set a file's modification time according to a UNIX timestamp
or a datetime object from its metadata.
2019-07-14 22:39:17 +02:00
Mike Fährmann
8966930c5c [downloader:http] try to import SSL exception class from OpenSSL
(#324)
2019-07-01 20:10:26 +02:00
Mike Fährmann
69205df68d allow '-1' for infinite retries (#300) 2019-06-30 23:10:47 +02:00
Mike Fährmann
f7b5c4c3e7 use values of 'retries' options correctly
The RE-tries option now specifies exactly that: the maximum number of
times a failed HTTP request is re-tried. For example, a value of 2 will
now correctly stop after 3 attempts: the initial one + 2 re-tries.

The maximum wait-time now also caps at 30min and increases exponentially
for both extractor.request() and downloader.http.download().
2019-06-30 23:10:18 +02:00
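The counting rule from f7b5c4c3e7 (initial attempt plus N re-tries) can be sketched like this. The function `run_with_retries`, the choice of `OSError` as the failure type, and the exact wait formula are assumptions; only the 30-minute cap and the exponential growth are taken from the commit message.

```python
import time

MAX_WAIT = 30 * 60  # cap of 30 minutes, in seconds


def run_with_retries(request, retries=2, sleep=time.sleep):
    """Call request() once, then re-try up to 'retries' times on failure."""
    tries = 0
    while True:
        try:
            return request()
        except OSError:
            tries += 1
            if tries > retries:
                raise  # initial attempt + 'retries' re-tries are used up
            # Assumed formula: exponential growth, capped at MAX_WAIT
            sleep(min(2 ** (tries - 1), MAX_WAIT))
```

Passing `sleep` as a parameter is a design choice for this sketch so that the waiting can be replaced in tests.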
Mike Fährmann
db3f52881a add 'mtime' option 2019-06-20 17:19:44 +02:00
Mike Fährmann
f4ba98771d use Last-Modified header to set file modification time
(#236, #277)
2019-06-19 23:16:32 +02:00
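For f4ba98771d, a sketch of how a `Last-Modified` header value could be applied to a file using only the standard library. The function name `set_mtime_from_header` is hypothetical and the project's own implementation may differ.

```python
import os
from email.utils import mktime_tz, parsedate_tz


def set_mtime_from_header(path, last_modified):
    """Set the file's mtime from an HTTP date; return False if unparsable."""
    parsed = parsedate_tz(last_modified)
    if parsed is None:
        return False
    mtime = mktime_tz(parsed)  # seconds since the epoch, in UTC
    os.utime(path, (mtime, mtime))  # (access time, modification time)
    return True
```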
Mike Fährmann
179d112083 [downloader] overhaul http and text modules
Get rid of the modular structure and simplify/specialize those modules.
2019-06-19 22:56:11 +02:00