Mike Fährmann
170711af7e
[mangadex] fix extraction ( closes #2177 )
2022-01-08 17:21:35 +01:00
Mike Fährmann
199e7616a7
[rule34] use https://api.rule34.xxx for API requests
2022-01-08 17:14:50 +01:00
Mike Fährmann
3c79c9b271
document extended blacklist/whitelist syntax ( #2025 )
...
and not just in the commit message of 010d65dc
2022-01-06 23:36:57 +01:00
Mike Fährmann
6e0a6c484f
apply SPECIAL_EXTRACTORS only for blacklist settings
...
as was the case before 010d65dc
2022-01-06 21:09:30 +01:00
Mike Fährmann
37beb1298e
[newgrounds] add 'search' extractor ( closes #2161 )
2022-01-06 19:32:39 +01:00
Mike Fährmann
8b910dd8ae
[hitomi] fix image URLs
...
again and again ...
2022-01-06 18:21:26 +01:00
Mike Fährmann
dcfe08838d
restore -d/--dest functionality
...
change short option for --directory from -d to -D
2022-01-03 18:30:36 +01:00
Mike Fährmann
3085aac4d8
[gelbooru] handle changed API response format ( #2157 )
2022-01-03 16:42:48 +01:00
Mike Fährmann
38e2af29d6
[hitomi] fix image URLs
...
update '_parse_gg()' yet again
2022-01-03 16:41:00 +01:00
Mike Fährmann
6f2e0c9c3d
fix cookie checks for patreon, fanbox, fantia
...
The changes in 9a255344 caused a warning about missing cookies to be
displayed even if those cookies were present, because _check_cookies()
did not account for an empty cookiedomain.
2022-01-01 17:55:58 +01:00
Mike Fährmann
1e0278702d
[hitomi] update '_parse_gg()'
2022-01-01 17:55:58 +01:00
Mike Fährmann
3b7c7daa76
improve UNC path handling ( #2126 )
...
always call 'abspath()' on the directory path to handle cases when the
current working directory is UNC and 'base-directory' is relative.
2021-12-30 22:22:19 +01:00
Mike Fährmann
47eae4c393
release version 1.20.0
2021-12-29 22:59:14 +01:00
Mike Fährmann
becc7f85a6
[hitomi] fix image URLs
2021-12-29 22:46:17 +01:00
Mike Fährmann
6af8d71da6
[kemonoparty] use service as subcategory ( closes #2147 )
2021-12-29 22:46:17 +01:00
Mike Fährmann
fa7d92f7a9
add docs for 'extractor.generic.enabled'
2021-12-29 22:46:17 +01:00
Vrihub
96fcff182c
generic extractor ( #735 )
...
* Generic extractor, see issue #683
* Fix failed test_names test, no subcategory needed
* Prefix directory_fmt with "generic"
* Relax regex (would break some urls)
* Flake8 compliance
* pattern: don't require a scheme
This fixes a bug when we force the generic extractor on urls without a
scheme (that are allowed by all other extractors).
* Fix using g: and r: on urls without http(s) scheme
Almost all extractors accept urls without an initial http(s) scheme.
Many extractors also allow for generic subdomains in their "pattern"
variable; some of them implement this with the regex character class
"[^.]+" (everything but a dot).
This leads to a problem when the extractor is given a url starting
with g: or r: (to force using the generic or recursive extractor)
and without the http(s) scheme: e.g. with "r:foobar.tumblr.com"
the "r:" is wrongly considered part of the subdomain.
This commit fixes the bug, replacing the too generic "[^.]+" with the
more specific "[\w-]+" (letters, digits and "-", the only characters
allowed in domain names), which is already used by some extractors.
* Relax imageurl_pattern_ext: allow relative urls
* First round of small suggested changes
* Support image urls starting with "//"
* self.baseurl: remove trailing slash
* Relax regexp (didn't catch some image urls)
* Some fixes and cleanup
* Fix domain pattern; option to enable extractor
Fixed the domain section for "pattern", to pass "test_add" and
"test_add_module" tests.
Added the "enabled" configuration option (default False) to enable the
generic extractor. Using "g(eneric):URL" forces using the extractor.
2021-12-29 22:39:29 +01:00
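The subdomain problem described in the generic extractor commit above can be reproduced with two small patterns. This is only a sketch: `LOOSE` and `STRICT` are hypothetical patterns modeled on the commit message, not the patterns used by any actual extractor.

```python
import re

# Hypothetical patterns for illustration; real extractor patterns differ.
# LOOSE uses "[^.]+" for the subdomain, STRICT uses "[\w-]+".
LOOSE = re.compile(r"(?:https?://)?([^.]+)\.tumblr\.com")
STRICT = re.compile(r"(?:https?://)?([\w-]+)\.tumblr\.com")

url = "r:foobar.tumblr.com"

# "[^.]+" accepts ':' and therefore swallows the "r:" prefix.
loose = LOOSE.match(url)
print(loose.group(1))

# "[\w-]+" cannot cross the ':' so the prefixed URL is not matched at all,
# leaving it for the recursive/generic extractor to handle.
strict = STRICT.match(url)
print(strict)

# A plain URL without a scheme still matches the stricter pattern.
plain = STRICT.match("foobar.tumblr.com")
print(plain.group(1))
```

With the looser character class the captured "subdomain" includes the `r:` prefix, while the stricter class rejects the prefixed URL and still accepts scheme-less URLs.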
Mike Fährmann
4376b39a2b
[sexcom] fix and improve embed extraction ( fixes #2145 )
2021-12-28 21:59:39 +01:00
Mike Fährmann
b5b4f5a168
use 'build_extractor_filter' in test_results.py
2021-12-28 17:25:07 +01:00
Mike Fährmann
6d190834ee
[instagram] fix error when PostPage data is not in GraphQL format
...
(#2037 )
2021-12-28 00:27:59 +01:00
Mike Fährmann
4edf43891c
add -d/--directory and -f/--filename command-line arguments
2021-12-27 23:31:54 +01:00
Mike Fährmann
dd67e24aa9
[lolisafe] include file ID in filenames
...
More precisely, it now splits the full 'filename' into 'name' and 'id'
instead of overwriting 'filename'. The format string stays the same as
before. Use '{name}.{extension}' to restore the old behavior.
before:
- filename: foobar
- id : 12345
now:
- filename: foobar-12345
- name : foobar
- id : 12345
2021-12-25 17:16:45 +01:00
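The 'filename' split described in the lolisafe commit above could look roughly like the following. `split_filename` is a hypothetical helper written for illustration; it assumes the ID is whatever follows the last '-' in the name and is not taken from the extractor's code.

```python
def split_filename(filename):
    """Split 'foobar-12345' into name and id, keeping the full filename.

    Hypothetical helper: assumes the ID follows the last '-'.
    """
    name, sep, file_id = filename.rpartition("-")
    if not sep:
        # No '-' found: treat the whole string as the name.
        return {"filename": filename, "name": filename, "id": ""}
    return {"filename": filename, "name": name, "id": file_id}


meta = split_filename("foobar-12345")

# The default format string keeps using the full 'filename';
# '{name}.{extension}' reproduces the old output.
new_style = "{filename}.{extension}".format(extension="jpg", **meta)
old_style = "{name}.{extension}".format(extension="jpg", **meta)
print(new_style, old_style)
```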
Mike Fährmann
f3d61de18d
[artstation] create directories per asset ( closes #2136 )
2021-12-25 17:16:45 +01:00
Mike Fährmann
49a50fb2eb
[500px] create directories per photo
2021-12-25 17:16:45 +01:00
Mike Fährmann
89bebe1bef
[500px] add 'favorite' extractor ( closes #1927 )
2021-12-25 17:16:45 +01:00
Mike Fährmann
22b0433985
[fanbox] support pixiv redirects ( closes #2122 )
2021-12-25 17:15:39 +01:00
Mike Fährmann
281828b58b
[tumblrgallery] improve search pagination ( fixes #2132 )
2021-12-24 03:42:28 +01:00
Mike Fährmann
9b67e63a89
[ytdl] update to latest yt-dlp changes ( fixes #2124 )
2021-12-24 01:50:47 +01:00
Mike Fährmann
4bec34fc94
[pixiv] allow setting a date range for search results ( #2133 )
...
with the 'scd' and 'ecd' query parameters
2021-12-23 23:03:39 +01:00
Mike Fährmann
882c614281
add album extractor for lolisafe/chibisafe instances
...
- support bunkr.is (closes #2038 )
- support zz.ht (closes #2105 )
2021-12-21 19:24:17 +01:00
Mike Fährmann
7bf1d3fd32
rename --write-infojson to --write-info-json
...
to be consistent with the name used in youtube-dl/yt-dlp
(the old --write-infojson still works)
2021-12-21 00:21:39 +01:00
Mike Fährmann
d441888bfb
[deviantart] adjust API endpoints
...
Start all endpoints with a forward slash '/'
to be consistent with other API interfaces.
2021-12-21 00:18:06 +01:00
Mike Fährmann
8f0cf0bf71
[deviantart] use '/browse/newest' for most-recent searches
...
(#2096 )
2021-12-20 22:40:03 +01:00
Mike Fährmann
0bd7607da5
[tumblrgallery] improve 'id' extraction ( #2115 )
2021-12-19 05:46:02 +01:00
Mike Fährmann
ac80474371
handle UNC paths ( #2113 )
2021-12-19 04:52:00 +01:00
Mike Fährmann
47df50a2ad
add --sleep-request and --sleep-extractor command-line options
2021-12-19 03:18:50 +01:00
Mike Fährmann
64cf26eaf4
allow specifying sleep-* options as string
...
either as single value or as range: "3.5", "2.1 - 5.0"
2021-12-18 23:28:56 +01:00
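A minimal sketch of how string values such as "3.5" or "2.1 - 5.0" from the commit above might be interpreted. `parse_sleep` and `pick_delay` are hypothetical names, and the sketch assumes non-negative values separated by a single '-'; the actual parsing code may differ.

```python
import random


def parse_sleep(value):
    """Return a (low, high) tuple for "3.5" or "2.1 - 5.0".

    Hypothetical helper; assumes non-negative numbers.
    """
    low, sep, high = value.partition("-")
    if not sep:
        # Single value: both bounds are the same.
        number = float(low)
        return (number, number)
    # float() ignores the surrounding whitespace.
    return (float(low), float(high))


def pick_delay(value):
    """Pick a delay in seconds, uniformly from the parsed range."""
    low, high = parse_sleep(value)
    return random.uniform(low, high)


print(parse_sleep("3.5"))
print(parse_sleep("2.1 - 5.0"))
print(pick_delay("2.1 - 5.0"))
```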
Mike Fährmann
0d02a7861e
[tumblrgallery] fix extraction ( closes #2112 )
2021-12-17 19:55:53 +01:00
Mike Fährmann
62692c6842
[exhentai] add 'source' option
...
setting it to "hitomi" downloads the corresponding gallery from
hitomi.la; might be extended to other sources in the future
2021-12-16 23:16:19 +01:00
Mike Fährmann
099ed72de7
[hitomi] disable extra 'metadata' by default
...
saves one HTTP request that is not needed with the default filename settings
2021-12-16 22:21:07 +01:00
Mike Fährmann
9a25534490
use Extractor._check_cookies() for all cookie checks
2021-12-16 02:21:16 +01:00
Mike Fährmann
63c6bc26b5
[rule34us] extract tags per category ( #1527 )
...
like for other boorus with 'tags': true
2021-12-16 00:06:52 +01:00
Mike Fährmann
f587458a3c
[twitter] include '4096x4096' as a default image fallback
...
(closes #2107 , closes #1881 )
2021-12-15 23:19:30 +01:00
Mike Fährmann
8ed282f7f2
[kemonoparty] support coomer.party URLs ( #2100 )
2021-12-15 16:21:05 +01:00
Mike Fährmann
87ce3fa669
[furaffinity] warn when no session cookies were found
2021-12-15 16:21:05 +01:00
Mike Fährmann
159631c808
[philomena] use a default 'filter_id' if none is given
2021-12-15 16:20:53 +01:00
Mike Fährmann
ad30653b17
allow running a BaseExtractor for any URL
...
by prefixing it with '<base-category>:'
For example:
shopify:https://partakefoods.com/products/crunchy-cookie-variety-pack
gelbooru_v01:https://5naf.booru.org/index.php?page=post&s=view&id=46963
Available base categories are:
mastodon, shopify, moebooru, gelbooru_v01, gelbooru_v02,
reactor, foolslide, foolfuuka, philomena
2021-12-15 00:32:17 +01:00
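The '<base-category>:' prefix from the commit above can be separated from the URL along these lines. `split_base_category` is a hypothetical helper for illustration only; the category names come from the commit message, but the lookup logic is an assumption and not the project's implementation.

```python
# Base categories as listed in the commit message.
BASE_CATEGORIES = {
    "mastodon", "shopify", "moebooru", "gelbooru_v01", "gelbooru_v02",
    "reactor", "foolslide", "foolfuuka", "philomena",
}


def split_base_category(url):
    """Return (category, url) if url starts with a known base category.

    Hypothetical helper; returns (None, url) for everything else.
    """
    prefix, sep, rest = url.partition(":")
    if sep and prefix in BASE_CATEGORIES:
        return prefix, rest
    # "https:" and other schemes are not base categories.
    return None, url


print(split_base_category(
    "shopify:https://partakefoods.com/products/crunchy-cookie-variety-pack"))
print(split_base_category("https://example.org/"))
```

Splitting on the first ':' only keeps the scheme and any later colons inside the remaining URL intact.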
Mike Fährmann
299bd2f1f5
[rule34us] add 'tag' and 'post' extractors ( #1527 )
2021-12-14 00:27:46 +01:00
Mike Fährmann
3cf1075d86
[inkbunny] add 'search' extractor ( closes #2094 )
2021-12-12 03:08:14 +01:00
Mike Fährmann
c6a23c26d7
[instagram] allow downloading specific stories ( closes #2088 )
...
https://instagram.com/stories/<USER>/<ID> now only downloads the one
story specified by <ID> and not all stories from that user.
2021-12-11 21:34:25 +01:00