Commit Graph

3196 Commits

Author SHA1 Message Date
Mike Fährmann
dfe1f490e9 [mangadex] use custom User-Agent header (#1535) 2021-07-15 16:39:32 +02:00
Mike Fährmann
36a2aff363 [vk] improve metadata extraction and URL pattern (fixes #1691)
- always fetch all user metadata
- use 'user[name]' for directory names if available
2021-07-15 00:43:42 +02:00
Mike Fährmann
b9783403d9 add 'url-metadata' option (#1659, #1073) 2021-07-14 03:08:49 +02:00
Mike Fährmann
e622e004f0 [ytdl] improve module imports (#1680)
Apply 'extractor.ytdl.module' for every URL, not just the first.
2021-07-14 03:08:00 +02:00
Mike Fährmann
e95f99882f extend 'parent-metadata' functionality (#1687, #1651, #1364) 2021-07-14 02:53:41 +02:00
Mike Fährmann
193401ce3b [ytdl] "fix" cookie transfer between session and ytdl (#1680)
requests' CookieJar class is not quite compatible with the standard
http.cookiejar.CookieJar used by youtube_dl
2021-07-12 18:50:25 +02:00
Mike Fährmann
9a849cdf61 [ytdl] allow setting 'module' for subcategories (#1680) 2021-07-12 18:47:12 +02:00
Mike Fährmann
dff0da60f9 [ytdl] add 'generic' option (#1680) 2021-07-11 23:48:18 +02:00
Mike Fährmann
d3da96142a [ytdl] support cookies + username&password (#1680) 2021-07-11 22:51:57 +02:00
Mike Fährmann
36ac2197db [ytdl] add extractor for sites supported by youtube-dl
(#1680, #878)

Can be used by prefixing any URL with 'ytdl:',
or by setting 'extractor.ytdl.enabled' to 'true'.
2021-07-10 20:55:47 +02:00
Mike Fährmann
64240c8d42 [imagevenue] fix extraction
(closes #1677)
2021-07-09 20:13:18 +02:00
Mike Fährmann
d287d2eb88 [kemonoparty] parse 'o' query parameters (#1674) 2021-07-09 18:29:50 +02:00
Mike Fährmann
8b036778e3 [kemonoparty] add 'max-posts' option (#1674) 2021-07-09 18:19:02 +02:00
Mike Fährmann
5612ca31c2 [hitomi] fix image URLs (closes #1679) 2021-07-09 18:01:49 +02:00
Mike Fährmann
8ecca3af58 [pixiv] add extractor for 'pixivision' articles (#1672) 2021-07-07 16:27:16 +02:00
Mike Fährmann
312a28e78a [mastodon] add 'replies' option (#1669) 2021-07-07 00:59:02 +02:00
Mike Fährmann
513c491cea [mastodon] reset 'params' after first pagination iteration
Otherwise, query parameters in 'params' get specified twice the second
time around: once from the 'links["next"]' URL and once from 'params'
itself.
2021-07-07 00:07:18 +02:00
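The pagination pattern behind this fix can be sketched as follows. This is a minimal illustration, not gallery-dl's code: `paginate` and the `fetch(url, params)` callable (assumed to return one page of items plus the next URL, or `None` on the last page) are hypothetical stand-ins for the Mastodon API wrapper.

```python
def paginate(fetch, url, params):
    """Yield items from an API that hands out a 'next' URL for each page.

    'fetch' is a hypothetical callable: fetch(url, params) -> (items, next_url),
    where next_url is None on the last page.
    """
    while url:
        items, next_url = fetch(url, params)
        yield from items
        # The 'next' URL already encodes every query parameter, so sending
        # 'params' again would duplicate them: use it for the first request only.
        params = None
        url = next_url
```

Resetting `params` inside the loop keeps the first request identical to before while letting every later request rely solely on the server-provided URL.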
Mike Fährmann
a1f5b78039 [mastodon] add 'reblogs' option (#1669) 2021-07-06 23:27:32 +02:00
Mike Fährmann
317ecc8180 use HTML tables in docs/formatting.md 2021-07-05 23:29:03 +02:00
Mike Fährmann
5f1b13d1a5 release version 1.18.1 2021-07-04 22:37:19 +02:00
Mike Fährmann
21c2da454f update extractor test results 2021-07-04 22:00:32 +02:00
Mike Fährmann
7f591c78cb [mangafox] cleanup 2021-07-04 03:21:02 +02:00
FollieHiyuki
4763bc1e4e Add MangaExtractor for mangafox (#1633) 2021-07-03 22:53:21 +02:00
Mike Fährmann
b519bf567c [hiperdex] use domain from input URL 2021-07-02 23:23:42 +02:00
Mike Fährmann
93d356712c [mastodon] implement 'text-posts' option (#1569)
similar to Twitter's 'text-tweets'
2021-07-02 22:12:41 +02:00
Mike Fährmann
414bdc95a3 [twitter] set 'retweet_id' for original retweets (#1481) 2021-07-02 21:50:37 +02:00
Mike Fährmann
5323c1c73a [twitter] ensure guest tokens are returned as string (#1665) 2021-07-01 14:35:53 +02:00
Mike Fährmann
963d177a68 document format string syntax
or at least attempt to
2021-06-29 19:35:07 +02:00
Mike Fährmann
9ee45f3617 [kemonoparty] warn about missing DDoS-GUARD cookies 2021-06-28 23:34:58 +02:00
Mike Fährmann
344aab3fb7 [seisoparty] warn about missing DDoS-GUARD cookies 2021-06-28 23:33:21 +02:00
Mike Fährmann
035562bd11 [twitter] remove old-style URLs from image fallback lists 2021-06-28 16:25:24 +02:00
Mike Fährmann
daf821b8b6 [seisoparty] use user names instead of IDs by default (#1635) 2021-06-27 22:57:20 +02:00
Mike Fährmann
e4db1bad14 [seisoparty] also extract files hosted on 'cdn-2' servers (#1635) 2021-06-27 22:55:09 +02:00
Mike Fährmann
267bbf5996 [mangasee] add 'chapter' and 'manga' extractors 2021-06-27 02:03:03 +02:00
Mike Fährmann
fad4918208 [deviantart] use UUIDs in internal folder/collection URLs 2021-06-26 00:56:57 +02:00
Mike Fährmann
64986f9435 fix depth counter in UrlJob
regression from adf4d661

It would either stop at the first level (-g) or go infinitely deep (-G).
Going down to, for example, level 3 with -ggg didn't work.
2021-06-26 00:30:03 +02:00
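The intended depth semantics can be sketched as a depth-limited expansion, assuming each -g adds one level and -G means no limit. `resolve` and the `children` mapping are hypothetical; gallery-dl's UrlJob spawns child extractors rather than looking URLs up in a dict.

```python
import math


def resolve(url, children, max_depth, depth=1):
    """Yield the URLs reachable from 'url', descending at most 'max_depth' levels.

    'children' maps a URL to the URLs it expands into (hypothetical stand-in
    for child extractors); URLs without an entry are final file URLs.
    """
    for child in children.get(url, ()):
        if depth < max_depth and child in children:
            # still allowed to go deeper: expand this child as well
            yield from resolve(child, children, max_depth, depth + 1)
        else:
            # depth limit reached, or nothing left to expand: emit as-is
            yield child


tree = {"root": ["a", "b"], "a": ["a1", "a2"], "a1": ["x"], "x": ["y"]}
print(list(resolve("root", tree, 1)))         # like -g:   ['a', 'b']
print(list(resolve("root", tree, 3)))         # like -ggg: ['x', 'a2', 'b']
print(list(resolve("root", tree, math.inf)))  # like -G:   ['y', 'a2', 'b']
```

Passing `math.inf` as the limit lets the "unlimited" case share the same comparison as the counted one.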
Mike Fährmann
0179581340 add 'T' format string conversion (#1646)
to convert 'date'/datetime values to timestamps
2021-06-25 22:35:45 +02:00
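One way to support such a conversion with only the standard library is a `string.Formatter` subclass that overrides `convert_field()`. This sketch assumes 'date' values are naive datetimes representing UTC; `TimestampFormatter` is a hypothetical name, and gallery-dl's own Formatter is implemented differently.

```python
import calendar
import datetime
import string


class TimestampFormatter(string.Formatter):
    """Formatter with an extra '!T' conversion: datetime -> Unix timestamp."""

    def convert_field(self, value, conversion):
        if conversion == "T":
            # naive datetimes are treated as UTC (assumption)
            return calendar.timegm(value.utctimetuple())
        # fall back to the built-in 's', 'r', and 'a' conversions
        return super().convert_field(value, conversion)


fmt = TimestampFormatter()
print(fmt.format("{date!T}", date=datetime.datetime(2000, 1, 1)))  # 946684800
```

`calendar.timegm()` is used instead of `time.mktime()` because the latter would interpret the tuple in the local time zone.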
Mike Fährmann
f74cf52e2b [seisoparty] add 'user' and 'post' extractors (#1635) 2021-06-25 18:40:11 +02:00
Mike Fährmann
759735fb02 [kemonoparty] fix 'username' extraction (fixes #1652)
The site's <title> content changed from

<title>NAME | Kemono</title>

to

<title>
    NAME | Kemono
</title>
2021-06-25 15:35:20 +02:00
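A whitespace-tolerant pattern handles both layouts. This is a sketch using `re`; `parse_username` and `TITLE_RE` are hypothetical names, and the real extractor may rely on gallery-dl's own text helpers instead.

```python
import re

# \s* around the name absorbs the newlines and indentation of the new layout;
# re.DOTALL lets '.' match across lines in case the name itself wraps.
TITLE_RE = re.compile(r"<title>\s*(.*?)\s*\|\s*Kemono\s*</title>", re.DOTALL)


def parse_username(page):
    """Return NAME from '<title>NAME | Kemono</title>', or None."""
    match = TITLE_RE.search(page)
    return match.group(1) if match else None


old = "<title>NAME | Kemono</title>"
new = "<title>\n    NAME | Kemono\n</title>"
print(parse_username(old), parse_username(new))  # NAME NAME
```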
Mike Fährmann
befe635022 cache parsed Formatter functions 2021-06-22 19:46:04 +02:00
Mike Fährmann
a416e54765 [directlink] manually encode Referer URLs (fixes #1647)
Trying to send a non-latin-1-encodable header raises an exception,
so we encode the Referer value ourselves with 'errors=ignore'.
2021-06-21 20:28:19 +02:00
Mike Fährmann
8bdeb2a6dd [webtoons] match arbitrary language codes (closes #1643) 2021-06-21 19:25:28 +02:00
Mike Fährmann
79b7ee2712 use 'functools.partial' in '_build_cleanfunc' when possible
makes calls to the returned function slightly faster (~10%)
2021-06-20 23:34:41 +02:00
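The idea can be sketched like this: when several characters must be replaced, bind the replacement string to a compiled pattern's `sub` method with `functools.partial` instead of wrapping the call in a lambda, which saves one Python-level call per invocation. `build_cleanfunc` is a hypothetical stand-in with an assumed `(chars, repl)` signature, not gallery-dl's `_build_cleanfunc`.

```python
import functools
import re


def build_cleanfunc(chars, repl):
    """Return a str -> str function replacing each char in 'chars' with 'repl'."""
    if not chars:
        return lambda s: s  # nothing to clean
    if len(chars) == 1:
        # single character: str.replace is fastest, but needs a wrapper
        return lambda s: s.replace(chars, repl)
    # several characters: partial(pattern.sub, repl)(s) == pattern.sub(repl, s),
    # without the extra frame a lambda wrapper would add
    pattern = re.compile("[" + re.escape(chars) + "]")
    # escape backslashes so re.sub inserts 'repl' literally
    return functools.partial(pattern.sub, repl.replace("\\", "\\\\"))
```

`partial` works here because `repl` is the first positional argument of `Pattern.sub`; `str.replace` takes the string to modify as its receiver, so the single-character case still needs a lambda.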
Mike Fährmann
e661607e8b [mangadex] document 'metadata' and 'lang' options (#1535) 2021-06-20 22:44:02 +02:00
Mike Fährmann
ceaf7fd989 optimize 'base-directory' initialization and usage
apply 'clean_path()' only once
2021-06-20 21:35:43 +02:00
Mike Fährmann
2ca011dfa8 add 'kwdict' argument to PathFormat.build_filename() 2021-06-20 20:26:38 +02:00
Mike Fährmann
fd00d47116 implement conditional directories (#1394)
They work the same way as conditional filenames (84d2e640), e.g.

"directory": {
    "score >= 20": ["high score"],
    "score >= 5" : ["mid score"],
    ""           : ["{category}", "default"]
}
2021-06-20 20:09:35 +02:00
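A minimal sketch of how such a mapping could be evaluated, assuming the conditions are Python expressions checked in the order written against a file's metadata, with the empty key acting as the catch-all. `select_directory` is hypothetical and not gallery-dl's implementation.

```python
def select_directory(conditions, kwdict):
    """Return the formatted directory segments of the first matching condition.

    'conditions' maps expression strings to lists of format strings;
    dicts keep insertion order, so earlier entries take precedence.
    """
    for expr, segments in conditions.items():
        # "" is treated as an always-true fallback (assumption)
        if not expr or eval(expr, {"__builtins__": {}}, kwdict):
            return [seg.format_map(kwdict) for seg in segments]
    return []


directory = {
    "score >= 20": ["high score"],
    "score >= 5" : ["mid score"],
    ""           : ["{category}", "default"],
}
print(select_directory(directory, {"score": 7, "category": "test"}))
# ['mid score']
```

`eval()` is tolerable in this sketch only because the expressions come from the user's own configuration; a real implementation would likely compile each expression once instead of re-parsing it for every file.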
Mike Fährmann
ee1064a2b2 release version 1.18.0 2021-06-19 21:26:42 +02:00
Mike Fährmann
4adc44df69 [furaffinity] improve metadata extraction (fixes #1630)
Fetch 'title' and 'artist' metadata from a different location,
since for posts with an empty title the <title> element is
completely empty and does not contain the artist's name.
2021-06-19 03:29:00 +02:00
Mike Fährmann
e98fa01c44 [hitomi] update image URL code (fixes #1637) 2021-06-18 16:44:22 +02:00