Commit Graph

306 Commits

Author SHA1 Message Date
Luc Ritchie
7dd79eee93 save cookies to tempfile, then rename
avoids wiping the cookies file if the disk is full
2023-12-11 00:47:42 -05:00
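The tempfile-then-rename idea can be sketched as follows. `save_atomically` is a hypothetical helper, not gallery-dl's actual cookie-saving code. The assumption is that the temporary file lives in the same directory, so the final rename stays on one filesystem.

```python
import os
import tempfile


def save_atomically(path, data):
    """Hypothetical helper: write *data* to *path* without clobbering it on failure.

    The content goes to a temporary file in the same directory first; only
    after the write succeeds is that file renamed over the original.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fp:
            fp.write(data)  # a full disk raises here, before the rename
        # os.replace() overwrites the destination with a single rename
        os.replace(tmp_path, path)
    except BaseException:
        # the original file is untouched; remove the partial copy
        os.unlink(tmp_path)
        raise
```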
Mike Fährmann
6a4218aa23 handle 'json' parameter in Extractor.request() manually
Mainly to allow passing custom classes like util.LazyPrompt,
but also to simplify and streamline how requests handles it.
2023-12-06 22:13:13 +01:00
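A rough sketch of encoding a JSON body by hand so that custom objects are accepted. requests' own `json=` handling serializes without a `default` hook, so an unknown object type would typically raise a TypeError. `LazyPrompt` below is a hypothetical stand-in for `util.LazyPrompt`, and `encode_json_body` is an illustrative name.

```python
import json


class LazyPrompt:
    """Hypothetical stand-in for util.LazyPrompt: produces its value on str()."""

    def __init__(self, func):
        self.func = func

    def __str__(self):
        return self.func()


def encode_json_body(headers, data):
    """Hypothetical helper: serialize *data*, resolving unknown objects via str()."""
    body = json.dumps(data, separators=(",", ":"), default=str).encode()
    headers.setdefault("Content-Type", "application/json")
    return body
```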
Mike Fährmann
9dd5cb8c8a interactively prompt for passwords on login when none is provided 2023-12-06 22:12:59 +01:00
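A minimal sketch of the prompting behavior, assuming a username is configured but the password is empty. `get_login_info` is a hypothetical name. The prompt function is a parameter so it can be swapped out in tests.

```python
import getpass


def get_login_info(username, password, prompt=getpass.getpass):
    """Hypothetical helper: return (username, password), asking if needed."""
    if username and not password:
        # getpass() reads from the terminal without echoing the input
        password = prompt("Password for {}: ".format(username))
    return username, password
```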
Mike Fährmann
34a387b6e2 support 'metadata-*' names for '*-metadata' options
For example, instead of 'url-metadata' it is now also possible to use
'metadata-url' as option name.

- metadata-url
- metadata-path
- metadata-http
- metadata-version
- metadata-parent
2023-11-18 23:52:10 +01:00
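The alias lookup might look roughly like this. `option_value` is hypothetical, and the precedence rule is an assumption: the original '*-metadata' name wins when both spellings are set.

```python
def option_value(config, key, default=None):
    """Hypothetical lookup accepting 'metadata-*' as an alias for '*-metadata'."""
    if key in config:
        return config[key]
    if key.endswith("-metadata"):
        # e.g. 'url-metadata' -> 'metadata-url'
        alias = "metadata-" + key[:-len("-metadata")]
        if alias in config:
            return config[alias]
    return default
```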
Mike Fährmann
61d6558322 [exhentai] try to avoid 'DH_KEY_TOO_SMALL' errors (#1021, #4593) 2023-11-04 17:30:27 +01:00
Mike Fährmann
eb230e4b77 [nsfwalbum] disable Referer headers by default (#4598) 2023-10-01 13:55:17 +02:00
Mike Fährmann
3ecb512722 send Referer headers by default 2023-09-19 00:02:04 +02:00
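A sketch of a default Referer with a per-extractor opt-out, as in the nsfwalbum change above. The assumption is that the Referer points at the site root. `default_headers` and `send_referer` are hypothetical names.

```python
from urllib.parse import urlsplit


def default_headers(url, send_referer=True):
    """Hypothetical helper: build headers, adding a Referer for the site root."""
    headers = {}
    if send_referer:
        parts = urlsplit(url)
        headers["Referer"] = "{}://{}/".format(parts.scheme, parts.netloc)
    return headers
```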
Mike Fährmann
4cdab8074e update/fix --list-extractors 2023-09-11 17:32:59 +02:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
ceb59e176f fix default Firefox user agent string
note to self: do not trust some random third-party website
2023-09-02 22:22:23 +02:00
Mike Fährmann
a4f7f7da17 add '_dump()' convenience method to Extractor 2023-08-06 17:03:09 +02:00
Mike Fährmann
48ef062867 fix issues with 'Extractor.finalize()'
- prevent crash in InstagramUserExtractor (#4359)
- call it at the end of every DownloadJob
- add it to tests
2023-07-29 13:43:27 +02:00
Mike Fährmann
ed21908fda initial support for child extractor options
Using "parent-category>child-category" as extractor category in a config
file allows setting options for a child extractor when it was spawned by
that parent.

For example "reddit>gfycat" to set gfycat options for when it was found
in a reddit post.

{
    "extractor": {
        "gfycat": {
            "filename": "regular filename"
        },
        "reddit>gfycat": {
            "filename": "reddit-specific filename"
        }
    }
}

Note: This currently does not work for most imgur links due to how its
extractor hierarchy is structured.
2023-07-28 17:07:25 +02:00
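Using the config from the commit message, the merge could be sketched as follows. `child_options` is hypothetical and much simpler than gallery-dl's real config lookup. Options under "parent>child" override the plain child options.

```python
def child_options(config, parent, child):
    """Hypothetical merge: 'parent>child' options override plain child options."""
    extractor = config.get("extractor", {})
    options = dict(extractor.get(child, {}))
    options.update(extractor.get(parent + ">" + child, {}))
    return options


CONFIG = {
    "extractor": {
        "gfycat": {"filename": "regular filename"},
        "reddit>gfycat": {"filename": "reddit-specific filename"},
    }
}
```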
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This allows, for example, adjusting config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
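A heavily simplified, hypothetical sketch of two-phase setup. The constructor only stores its inputs, so a job can still change the config before `initialize()` reads it. The attribute names are illustrative assumptions.

```python
class Extractor:
    """Hypothetical, heavily simplified extractor with two-phase setup."""

    def __init__(self, url, config):
        # cheap: remember the input and the config mapping, nothing else
        self.url = url
        self.config = config
        self.initialized = False

    def initialize(self):
        # the actual init: read options, set up session state and cookies
        self.retries = self.config.get("retries", 4)
        self.headers = {"User-Agent": self.config.get("user-agent", "gallery-dl")}
        self.cookies = {}
        self.initialized = True
```

Usage: create the extractor, let the job adjust `config`, then call `initialize()`.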
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
ceebacc9e1 remove 'pyopenssl' option 2023-07-19 20:44:07 +02:00
Mike Fährmann
5b59a0d143 update default User-Agent header to Firefox 115 ESR 2023-07-05 15:12:50 +02:00
Mike Fährmann
856f6c10cd allow for GalleryExtractors to skip loading gallery_url 2023-05-22 22:29:30 +02:00
Mike Fährmann
3ca5dac8b6 extend 'cookies-update' functionality
Allow writing cookies to a different file than a given cookies.txt,
making it possible to export cookies imported with --cookies-from-browser

To convert browser cookies to cookies.txt format:
  gallery-dl --cookies-from-browser chromium \
             -o cookies-update=cookies.txt \
             --no-download \
             http://example.org/file.jpg
2023-05-04 15:10:47 +02:00
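In Python's standard library, writing an existing cookie jar out in cookies.txt (Netscape) format can be sketched with `http.cookiejar.MozillaCookieJar`. `export_cookies` is a hypothetical helper and not gallery-dl's own code.

```python
import http.cookiejar


def export_cookies(cookies, path):
    """Hypothetical helper: write every cookie in *cookies* to *path* as cookies.txt."""
    jar = http.cookiejar.MozillaCookieJar(path)
    for cookie in cookies:
        jar.set_cookie(cookie)
    # ignore_discard keeps session cookies, which have no expiration date
    jar.save(ignore_discard=True, ignore_expires=True)
```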
Mike Fährmann
bc6d65d203 implement 'Extractor.config_deprecated()'
a version of 'Extractor.config()'
that logs a warning when using a deprecated option name
2023-05-04 10:49:14 +02:00
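A rough, standalone sketch of the idea, operating on a plain dict. The real method's signature and warning text may differ. The assumption is that the new name takes precedence when both names are set.

```python
import logging

log = logging.getLogger("extractor")


def config_deprecated(config, key, deprecated, default=None):
    """Hypothetical config lookup that also accepts an old option name.

    Logs a warning when only the *deprecated* name is set.
    """
    if key in config:
        return config[key]
    if deprecated in config:
        log.warning("'%s' is deprecated. Use '%s' instead.", deprecated, key)
        return config[deprecated]
    return default
```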
Mike Fährmann
076380e079 remove '*' indicating keyword-only arguments
they are kind of unnecessary and
cause a non-negligible function call overhead (~10%)
2023-05-02 22:23:33 +02:00
Mike Fährmann
9abcb2b6e5 update headers and ciphers for '"browser": "chrome"' 2023-03-08 17:19:59 +01:00
Mike Fährmann
00b94946b3 [instagram] show -o cursor=… after every error (#3440) 2023-01-23 13:00:44 +01:00
Mike Fährmann
80a2ff2d38 support setting 'write-pages' to "ALL"
to show authentication headers, cookies, etc.
2023-01-14 22:34:46 +01:00
Mike Fährmann
c881548a27 add 'extractor.retry-codes' option (#3313)
do not retry 429 and 430 by default
2023-01-14 17:25:30 +01:00
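The retry decision might be sketched like this. The assumption is that server errors (5xx) are always retried and other codes only when they appear in 'retry-codes'. `should_retry` is a hypothetical name.

```python
def should_retry(status, retry_codes=()):
    """Hypothetical check: retry 5xx; 429/430 only when listed in *retry_codes*."""
    return status >= 500 or status in retry_codes
```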
Mike Fährmann
9695c4e88d emit debug logging message when loading cookies from file
attempt nr. 2
no idea how I managed to remove 6514828d in a918ce29
2023-01-06 11:13:44 +01:00
Mike Fährmann
a918ce29b5 run tests on ubuntu-20.04
and remove Python 3.4, since that's no longer available
on this test runner
2023-01-05 13:33:27 +01:00
Mike Fährmann
6514828d4e emit debug logging message when loading cookies from file 2023-01-05 12:40:22 +01:00
Mike Fährmann
9f06e79868 implement '"user-agent": "browser"' (#2636) 2022-11-13 19:17:39 +01:00
Mike Fährmann
86790da2d5 update Cloudflare IUAM detection
again
2022-10-31 18:33:52 +01:00
Mike Fährmann
8b1fe0bcf1 emit debug logging messages before calling time.sleep() (#2982) 2022-10-08 15:41:39 +02:00
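A minimal sketch of logging before blocking. The wrapper name and message format are assumptions, not gallery-dl's actual log output.

```python
import logging
import time

log = logging.getLogger("extractor")


def sleep(seconds, reason):
    """Hypothetical wrapper: log why and how long we wait, then block."""
    log.debug("Sleeping %.2f seconds (%s)", seconds, reason)
    time.sleep(seconds)
```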
Mike Fährmann
73a52a95b0 update Cloudflare IUAM detection 2022-09-12 11:40:06 +02:00
Mike Fährmann
eb68d45544 add global 'warnings' option (#2762) 2022-07-18 22:20:30 +02:00
Mike Fährmann
e4f48cc810 make it easier to disable default 'browser' settings
Previously it was necessary to set 'browser' to a non-empty, non-string
value to disable any default 'browser' value.
Now '-o browser=' or '-o browser=false' is enough.
2022-07-07 11:17:43 +02:00
Mike Fährmann
92b75bcdce limit path length for --write-pages output on Windows (#2733) 2022-07-06 18:56:23 +02:00
Mike Fährmann
de20cadc68 add 'brotli' as optional dependency (#2716)
only send 'Accept-Encoding: br' if supported
2022-06-29 15:10:05 +02:00
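Advertising 'br' only when a Brotli decoder can be imported avoids receiving responses that cannot be decompressed. A sketch, assuming the optional third-party `brotli` package. `accept_encoding` is a hypothetical name.

```python
try:
    import brotli  # noqa: F401  (optional third-party dependency)
    BROTLI = True
except ImportError:
    BROTLI = False


def accept_encoding():
    """Hypothetical helper: only offer 'br' if Brotli is available."""
    return "gzip, deflate, br" if BROTLI else "gzip, deflate"
```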
Mike Fährmann
3a5d5c3a91 update default User-Agent header to Firefox 102 ESR
and update headers and ciphers for "browser": "firefox"
2022-06-28 17:38:58 +02:00
Mike Fährmann
535cbcb185 cache extracted browser cookies
(in memory, for as long as gallery-dl is running)

Extracting encrypted cookies from a chromium-based browser can take a
long time, so repeating this process for each extractor should be
avoided.

Same goes for creating a temporary copy of the entire cookie database.
2022-06-04 12:38:38 +02:00
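An in-memory cache keyed by browser and profile could be sketched with `functools.lru_cache`. `_extract_from_browser` is a hypothetical placeholder for the slow part (copying the cookie database and decrypting values). It records its calls here so the caching is observable.

```python
import functools

CALLS = []


def _extract_from_browser(browser, profile):
    """Hypothetical placeholder for the slow extraction step."""
    CALLS.append((browser, profile))
    return [("sid", "abc")]


@functools.lru_cache(maxsize=None)
def load_browser_cookies(browser, profile=None):
    """Extract once per (browser, profile) for the lifetime of the process."""
    # a tuple keeps the cached result immutable across callers
    return tuple(_extract_from_browser(browser, profile))
```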
Mike Fährmann
6742f3bc1e implement --cookies-from-browser (#1606)
most of the code is adapted from yt-dlp's implementation
and *should* work the same.
2022-05-07 23:06:37 +02:00
Mike Fährmann
c4b9f7bab8 update functions working with cookies.txt files
- rename
  - load_cookiestxt -> cookiestxt_load
  - save_cookiestxt -> cookiestxt_store
- in cookiestxt_load, add cookies directly to a cookie jar
  instead of storing them in a list first
- other unnoticeable performance increases
2022-05-06 13:21:29 +02:00
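Adding each parsed line directly to a cookie jar, without an intermediate list, might look like the following sketch. It assumes the usual 7-field, tab-separated cookies.txt layout and treats '#HttpOnly_' as a prefix rather than a comment. It is illustrative, not gallery-dl's exact parser.

```python
import http.cookiejar

HTTPONLY_PREFIX = "#HttpOnly_"


def cookiestxt_load(fp, cookiejar):
    """Sketch: parse cookies.txt lines from *fp* directly into *cookiejar*."""
    for line in fp:
        line = line.rstrip("\r\n")
        # '#HttpOnly_' marks an HttpOnly cookie, not a comment
        if line.startswith(HTTPONLY_PREFIX):
            line = line[len(HTTPONLY_PREFIX):]
        if not line or line.startswith("#"):
            continue

        domain, domain_flag, path, secure, expires, name, value = \
            line.split("\t")
        # an expiry of 0 or an empty field is treated as a session cookie
        expires = int(expires) if expires and expires != "0" else None

        cookiejar.set_cookie(http.cookiejar.Cookie(
            0, name, value, None, False,
            domain, domain_flag == "TRUE", domain.startswith("."),
            path, False, secure == "TRUE",
            expires, expires is None, None, None, {},
        ))
```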
Mike Fährmann
3f02e483c6 [e621] fix applying request_interval_min (#2533)
Setting this property after calling Extractor.__init__() has no effect.
2022-04-27 21:10:34 +02:00
Mike Fährmann
29db716a63 implement 'datetime_to_timestamp()'
and rename 'to_timestamp()'
to the more descriptive 'datetime_to_timestamp_string()'
2022-03-23 22:36:01 +01:00
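A sketch of the two conversions, assuming naive datetimes that represent UTC. Returning whole seconds from the string variant is also an assumption.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)


def datetime_to_timestamp(dt):
    """Sketch: naive UTC datetime -> float Unix timestamp."""
    return (dt - EPOCH).total_seconds()


def datetime_to_timestamp_string(dt):
    """Sketch: the same conversion, as a whole-second string."""
    return str(int(datetime_to_timestamp(dt)))
```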
Mike Fährmann
500a479026 fix a third(!) bug in _check_cookies() (#2372)
turns out tests are worthless if you get em wrong ...
2022-03-18 19:52:37 +01:00
Mike Fährmann
47cf05c4ab refactor proxy handling code (#2357)
- allow gallery-dl proxy settings to overwrite environment proxies
- allow specifying different proxies for data extraction and download
  - add 'downloader.proxy' option
  - '-o extractor.proxy=PROXY_URL -o downloader.proxy=null'
    now has the same effect as youtube-dl's '--geo-verification-proxy'
2022-03-10 23:55:35 +01:00
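Separate proxy settings per stage could be mapped to requests-style 'proxies' dicts roughly like this. `proxies_for` is hypothetical. The treatment of `None` as "use no proxy for this stage" follows the '-o downloader.proxy=null' example above and is otherwise an assumption.

```python
def proxies_for(proxy):
    """Hypothetical mapping of a 'proxy' option value to a proxies dict.

    - a string is used for both http and https
    - a dict is passed through unchanged
    - None yields explicit None entries, meaning "no proxy" for this stage
    """
    if proxy is None:
        return {"http": None, "https": None}
    if isinstance(proxy, str):
        return {"http": proxy, "https": proxy}
    return dict(proxy)
```

For example, extraction could use `proxies_for("http://127.0.0.1:8080")` while downloads use `proxies_for(None)`.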
Mike Fährmann
bddcec49f1 implement 'text.root_from_url()'
use domain from input URL for kemono
2022-03-01 03:09:57 +01:00
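A sketch of deriving the root from an input URL, assuming https as the fallback scheme when none is given. The real `text.root_from_url()` may handle edge cases differently.

```python
from urllib.parse import urlsplit


def root_from_url(url, scheme="https://"):
    """Sketch: return 'scheme://host' for *url*."""
    if "://" not in url:
        url = scheme + url
    parts = urlsplit(url)
    return "{}://{}".format(parts.scheme, parts.netloc)
```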
Mike Fährmann
f5b2b9333f fix another bug in _check_cookies (#2160)
regression introduced in ed317bfc

Added a couple of tests to hopefully catch such bugs
before they land in a release.
2022-02-16 22:58:57 +01:00
Mike Fährmann
ed317bfcf1 warn about cookies expiring in less than 24 hours
requires an expiration timestamp,
so this only works with cookies from a cookies.txt file
2022-02-13 23:00:49 +01:00
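The check could be sketched as follows. `expiring_cookies` is hypothetical. Cookies without an `expires` value are skipped, and already expired cookies are ignored here, which is an assumption.

```python
import time


def expiring_cookies(cookies, now=None, threshold=24 * 3600):
    """Hypothetical check: yield (name, hours_left) for cookies expiring soon."""
    if now is None:
        now = time.time()
    for cookie in cookies:
        if cookie.expires is None:
            continue  # no expiration timestamp available
        left = cookie.expires - now
        if 0 < left < threshold:
            yield cookie.name, left / 3600
```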
Mike Fährmann
b4f8e15a1f allow BaseExtractors to use the domain of the matched URL 2022-02-10 01:38:50 +01:00
Mike Fährmann
f58364f6a8 update Firefox cipher list 2022-02-01 02:33:01 +01:00
Mike Fährmann
7e6981dda6 rename 'disabletls12' to 'tls12'
and let config options override any default settings
2022-02-01 01:37:03 +01:00