245 Commits

Author SHA1 Message Date
Mike Fährmann
4853406fe3 [common] allow MangaExtractors to skip loading manga_url 2025-01-10 21:30:58 +01:00
Mike Fährmann
041baf8441 [common] compute and use latest Firefox UA
instead of the latest ESR UA
2024-12-17 22:20:37 +01:00
Mike Fährmann
0802e42c90 [common] use random unused port for '"user-agent": "browser"' 2024-12-17 21:40:20 +01:00
Mike Fährmann
e8826ed3d4 [common] simplify HTTP error messages
[warning] HTTPSConnectionPool(host='domain.tld', port=443): Max retries
exceeded with url: /a.jpg (Caused by NameResolutionError("<urllib3.
connection.HTTPSConnection object at 0x7247fe436ea0>: Failed to resolve
'domain.tld' ([Errno -2] Name or service not known)")) (1/5)

->

[warning] NameResolutionError: Failed to resolve 'domain.tld'
([Errno -2] Name or service not known) (1/5)
2024-12-10 17:13:44 +01:00
Mike Fährmann
86f3f3f763 [common] detect DDoS-Guard challenge pages 2024-12-08 21:39:04 +01:00
Mike Fährmann
7091904b20 [common] restore using environment proxies by default (#6553, #6609)
change 'proxy-env' default to 'true'
2024-12-07 17:38:44 +01:00
Mike Fährmann
57f8227473 [common] improve handling of 'user-agent' settings (#6594)
improves 5412b22dae

ignore 'extractor.user-agent' only for extractors using a custom
'User-Agent' header
2024-12-03 10:55:41 +01:00
Mike Fährmann
5412b22dae [common] allow overriding more default 'User-Agent' headers (#6496)
ignore 'extractor.user-agent' if it is the default useragent value
and an extractor wants to set its own custom value
2024-11-26 21:50:28 +01:00
Mike Fährmann
c82f3db098 [common] add 'proxy-env' option
(#6134, #6455)
disable using environment proxies by default
2024-11-15 18:03:56 +01:00
Mike Fährmann
0a72a5009c [common] disable Authorization header injection from .netrc auth
(#6134, #6455)
2024-11-15 17:37:04 +01:00
Mike Fährmann
390b8ddd3e [common] emit logging messages for --write-pages files 2024-11-03 20:38:33 +01:00
Mike Fährmann
ee61256054 [output] define and use global TTY_STD... values 2024-10-28 14:59:14 +01:00
Mike Fährmann
3946fe5ac4 [cookies] return loaded cookies as list
don't set_cookie() them immediately into a CookieJar
also, give some more consistent names to chrome/chromium functions
2024-10-14 14:24:27 +02:00
Mike Fährmann
6d8d882dbf [common] allow request() to accept all HTTP status codes
by passing Ellipsis/... as 'fatal' argument
2024-10-11 19:49:16 +02:00
Mike Fährmann
f8f67dab22 [cookies] add 'cookies-select' option 2024-09-27 10:41:26 +02:00
Mike Fährmann
0db3c11ab0 [common] use 'cf-mitigated' header to detect challenges 2024-09-07 20:16:06 +02:00
Mike Fährmann
6110e3f940 [common] fix Logger names of BaseCategory extractors
update of d11ec009
fixes regressions introduced in 0c178846
2024-07-12 22:51:46 +02:00
Mike Fährmann
eb3ef13d28 include 'zstd' in Accept-Encoding header when supported
… and slightly update optional dependency list
2024-07-10 00:33:35 +02:00
Mike Fährmann
8aca0e6970 update default User-Agent header to Firefox 128 ESR 2024-07-09 20:42:06 +02:00
Mike Fährmann
11421cf940 [skeb] fix '429 Too Many Requests' errors (#5766)
Introduce '_handle_429' method to make it easier for Extractors to react
to 429 errors regardless of 'sleep-429' settings.
2024-06-21 00:12:05 +02:00
Mike Fährmann
60b4541199 improve a1bb3279, fix oauth:pixiv (#5757)
Check 'input' option only when required.

This also fixes an exception in oauth:pixiv caused by using the same
'_input' name as a method defined there.
2024-06-18 16:50:04 +02:00
Mike Fährmann
a1bb32792b do not try to read from stdin when it is non-interactive (#5733)
add '--no-input' command-line option and 'input' config file option
to allow users to manually configure this
2024-06-16 18:31:39 +02:00
Mike Fährmann
5d3d03a1f1 fix 6cfbc107
the former condition would return True for 2.31.*
6cfbc1071f (commitcomment-142642913)
2024-06-02 18:16:53 +02:00
Mike Fährmann
6cfbc1071f workaround for requests 2.32.3 (#5665)
manually call 'load_default_certs()' for SSLContexts
in custom HTTPAdapter instances
2024-06-01 16:02:18 +02:00
Mike Fährmann
28039229fe [common] use 'create_urllib3_context' for creating SSLContexts
enables dumping TLS session keys by setting SSLKEYLOGFILE (#5215)
as well as other potentially useful settings.
2024-05-10 22:59:29 +02:00
Mike Fährmann
33006fe126 [common] disable 'check_hostname' for non-urllib3 SSLContexts
e.g. when 'browser' is set to a non-empty value and gallery-dl creates
its own SSLContext instance instead of using requests' and urllib3's
defaults.

urllib3 disables this option for its default contexts,
since it does this check on its own.

Fixes "ValueError: Cannot set verify_mode to CERT_NONE when
check_hostname is enabled" when using --no-check-certificate.

(#3614, #4891, #5576)
2024-05-10 18:20:08 +02:00
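The ValueError quoted in 33006fe126 can be reproduced with the standard library alone. The sketch below is not the project's code, and both function names are hypothetical. It only shows that 'check_hostname' must be disabled before 'verify_mode' can be set to CERT_NONE on a context created outside urllib3.

```python
import ssl


def wrong_order_error():
    """Return the error message raised when check_hostname is still enabled."""
    # PROTOCOL_TLS_CLIENT enables check_hostname and CERT_REQUIRED by default.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    try:
        context.verify_mode = ssl.CERT_NONE
    except ValueError as exc:
        return str(exc)
    return None


def make_unverified_context():
    """Build a context that skips certificate checks (hypothetical helper)."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False       # must be disabled first
    context.verify_mode = ssl.CERT_NONE  # now accepted
    return context
```

Calling make_unverified_context() succeeds, while wrong_order_error() returns a message that mentions check_hostname.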
Mike Fährmann
d11ec00908 [common] fix _cfgpath for BaseExtractor objects
After the changes in 0c17884673,
_cfgpath was missing its 'category' value
since that hadn't been initialized yet.
2024-05-01 16:00:07 +02:00
Mike Fährmann
a7d8cbab0e [common] show full URL in Extractor.request() error messages 2024-04-18 15:45:36 +02:00
Mike Fährmann
a5071c9ca0 [common] fix NameError in Extractor.request()
… when accessing 'code' after a requests exception was raised.

Caused by the changes in 566472f080
2024-04-18 15:42:53 +02:00
Mike Fährmann
566472f080 [common] add 'sleep-429' option (#5160) 2024-04-16 18:41:28 +02:00
Mike Fährmann
923c6f3214 [common] simplify 'status_code' check in Extractor.request() 2024-04-16 18:39:47 +02:00
Mike Fährmann
68f4208251 [common] update Extractor.wait() message format 2024-04-16 17:51:14 +02:00
Mike Fährmann
b38a917355 [common] add Extractor.input() method 2024-04-16 00:02:48 +02:00
Mike Fährmann
0d72789aa3 merge #5461: [cookies] use tempfile when saving cookies.txt files 2024-04-13 19:02:39 +02:00
Mike Fährmann
63ac06643f compute tempfile path only once 2024-04-13 18:59:18 +02:00
Mike Fährmann
0c17884673 store 'match' and 'groups' in Extractor objects 2024-04-01 03:07:52 +02:00
Mike Fährmann
106dfdb4c3 cleanup sleep-request retry delay code
more lines but easier to read I'd say
2024-03-11 21:38:06 +01:00
Mike Fährmann
89066844f4 add 'config_instance' method
to allow for a more streamlined access to BaseExtractor instance options
2024-01-18 03:20:36 +01:00
Mike Fährmann
f36dafad06 improve 'include' handling (#4982)
- remove spaces when given as string
- warn about invalid values
2023-12-28 19:07:04 +01:00
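The two points in f36dafad06 could look roughly like the sketch below. It is an illustration under assumptions, not the project's implementation: the function name 'parse_include', the logger name, and the warning text are all hypothetical.

```python
import logging

log = logging.getLogger("example")  # hypothetical logger name


def parse_include(include, valid):
    """Normalize an 'include' value and drop names that are not valid."""
    if isinstance(include, str):
        # Remove spaces so "a, b" and "a,b" are treated the same.
        include = include.replace(" ", "").split(",")

    result = []
    for name in include:
        if name in valid:
            result.append(name)
        else:
            log.warning("invalid include value '%s'", name)
    return result
```

With this sketch, a string such as "a, b ,x" checked against the valid names ("a", "b", "c") yields ["a", "b"] and logs one warning for "x".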
Luc Ritchie
7dd79eee93 save cookies to tempfile, then rename
avoids wiping the cookies file if the disk is full
2023-12-11 00:47:42 -05:00
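The idea behind 7dd79eee93 is to write to a temporary file and then rename it, which can be sketched with the standard library as follows. This is a minimal illustration and not the code from the commit. The function name 'save_atomically' is hypothetical.

```python
import os
import tempfile


def save_atomically(path, data):
    """Write data to a temp file next to path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one
    # filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fp:
            fp.write(data)
        # Replace the target only after the write has fully succeeded.
        os.replace(tmp_path, path)
    except BaseException:
        # A failed write (e.g. full disk) leaves the original file untouched.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
```

If the write fails partway, the existing file at 'path' keeps its previous contents because it is only replaced after the write has finished.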
Mike Fährmann
6a4218aa23 handle 'json' parameter in Extractor.request() manually
Mainly to allow passing custom classes like util.LazyPrompt,
but also to simplify and streamline how requests handles it.
2023-12-06 22:13:13 +01:00
Mike Fährmann
9dd5cb8c8a interactively prompt for passwords on login when none is provided 2023-12-06 22:12:59 +01:00
Mike Fährmann
34a387b6e2 support 'metadata-*' names for '*-metadata' options
For example, instead of 'url-metadata' it is now also possible to use
'metadata-url' as option name.

- metadata-url
- metadata-path
- metadata-http
- metadata-version
- metadata-parent
2023-11-18 23:52:10 +01:00
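The alias behaviour described in 34a387b6e2 could be sketched as a lookup that tries both spellings. This is a hypothetical helper for illustration only. The function name, the plain dict of options, and the lookup order are assumptions, not the project's configuration API.

```python
def metadata_option(options, name):
    """Look up '<name>-metadata', falling back to 'metadata-<name>'."""
    value = options.get(name + "-metadata")
    if value is None:
        # Accept the alternative spelling, e.g. 'metadata-url'.
        value = options.get("metadata-" + name)
    return value
```

Under these assumptions, both {"url-metadata": "gdl_url"} and {"metadata-url": "gdl_url"} resolve to the same value for the name "url".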
Mike Fährmann
61d6558322 [exhentai] try to avoid 'DH_KEY_TOO_SMALL' errors (#1021, #4593) 2023-11-04 17:30:27 +01:00
Mike Fährmann
eb230e4b77 [nsfwalbum] disable Referer headers by default (#4598) 2023-10-01 13:55:17 +02:00
Mike Fährmann
3ecb512722 send Referer headers by default 2023-09-19 00:02:04 +02:00
Mike Fährmann
4cdab8074e update/fix --list-extractors 2023-09-11 17:32:59 +02:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
ceb59e176f fix default Firefox user agent string
note to self: do not trust some random third-party website
2023-09-02 22:22:23 +02:00
Mike Fährmann
a4f7f7da17 add '_dump()' convenience method to Extractor 2023-08-06 17:03:09 +02:00