161 Commits

Author SHA1 Message Date
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened during the 'extractor.find()' call.
2023-07-25 22:16:16 +02:00
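
The idea in the commit above can be illustrated with a minimal sketch. The class below is a hypothetical stand-in, not the project's actual extractor: the attribute names, the placeholder session, and the example URL are all assumptions made for illustration. It only shows the pattern of keeping the constructor cheap and moving the real setup into a separately callable 'initialize()'.

```python
class Extractor:
    """Hypothetical stand-in used only to illustrate the pattern."""

    def __init__(self, url):
        # The constructor records cheap state and nothing else.
        self.url = url
        self.session = None
        self._initialized = False

    def initialize(self):
        # The actual setup runs here, at most once.
        if self._initialized:
            return
        self.session = {"cookies": {}}  # placeholder for a real session
        self._initialized = True


extractor = Extractor("https://example.org/gallery/1")
# A caller can still adjust configuration at this point,
# because no setup work has happened yet.
extractor.initialize()
```

With this split, creating the object has no side effects, so whatever owns it decides when the setup actually takes place.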
Mike Fährmann
a45a17ddb7 [pixiv] ignore 'limit_sanity_level' images (#4328) 2023-07-22 14:57:38 +02:00
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
0b34a444e0 [pixiv:novel] only detect Pixiv embeds (#4175) 2023-06-13 18:58:35 +02:00
Mike Fährmann
0cf7282fa0 [pixiv] add 'full-series' option for novels (#4111) 2023-06-01 13:07:20 +02:00
Mike Fährmann
ffed7efb6f [pixiv] use BASE_PATTERN 2023-05-28 18:06:47 +02:00
Mike Fährmann
b286efefcc [pixiv] add 'novel-bookmark' extractor (#4111) 2023-05-28 16:30:17 +02:00
Mike Fährmann
3fca455b82 [pixiv] add 'embeds' option (#1241) 2023-05-23 12:14:06 +02:00
Mike Fährmann
56b8b8cd36 [pixiv] support short novel URLs
https://www.pixiv.net/n/<ID>
2023-05-21 14:26:30 +02:00
Mike Fährmann
20dc13f832 [pixiv] initial 'novel' support (#1241, #4044)
supported URLs are
- https://www.pixiv.net/novel/show.php?id=<ID>
- https://www.pixiv.net/novel/series/<ID>
- https://www.pixiv.net/en/users/<ID>/novels
2023-05-12 16:34:08 +02:00
Mike Fährmann
b12dad8df5 [pixiv] fix 'pixivision' extraction 2023-04-30 15:35:32 +02:00
thatfuckingbird
9f76783ac0 [pixiv] allow sorting by popularity (requires pixiv premium) 2023-04-26 22:49:29 +02:00
Mike Fährmann
362cd6991b [pixiv] implement 'metadata-bookmark' option (#3417) 2023-01-07 23:19:43 +01:00
Mike Fährmann
a6d4733e11 [pixiv] extract 'date_url' metadata (#3405)
i.e. the datetime encoded in each file URL.

https://i.pximg.net/img-master/img/2022/12/01/13/44/55/12345678_p0.jpg
->
2022-12-01 13:44:55 +09:00
->
2022-12-01 04:44:55
2022-12-15 11:40:20 +01:00
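
The conversion described in the commit above can be sketched as follows. This is not the project's code: the function name 'date_from_url' and the regular expression are assumptions, and the sketch assumes the timestamp in the URL is Japan Standard Time (+09:00), as the example in the message indicates.

```python
import re
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # offset taken from the commit message


def date_from_url(url):
    """Return the UTC datetime encoded in a file URL, or None."""
    match = re.search(
        r"/img/(\d{4})/(\d{2})/(\d{2})/(\d{2})/(\d{2})/(\d{2})/", url)
    if match is None:
        return None
    # Interpret the six path components as a local JST timestamp ...
    local = datetime(*map(int, match.groups()), tzinfo=JST)
    # ... then convert to UTC and drop the timezone info.
    return local.astimezone(timezone.utc).replace(tzinfo=None)


url = "https://i.pximg.net/img-master/img/2022/12/01/13/44/55/12345678_p0.jpg"
print(date_from_url(url))  # 2022-12-01 04:44:55
```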
Mike Fährmann
5a17e15b76 [pixiv] preserve 'tags' order (#3266)
for '"tags": "translated"'

As it turns out, set() does *not* preserve insertion order.
2022-11-22 19:11:37 +01:00
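
The observation about set() can be illustrated with a short sketch. The helper name 'unique_in_order' and the sample tags are made up for illustration, and the commit may have used a different technique. Since Python 3.7, dict keys keep insertion order, so 'dict.fromkeys()' removes duplicates without reordering.

```python
def unique_in_order(tags):
    """Remove duplicates while keeping the first occurrence of each tag."""
    # dict keys preserve insertion order; set() gives no such guarantee.
    return list(dict.fromkeys(tags))


tags = ["original", "girl", "original", "landscape", "girl"]
print(unique_in_order(tags))  # ['original', 'girl', 'landscape']
```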
Mike Fährmann
eaae4d9b65 [pixiv] stop with error for invalid search/ranking parameters
instead of falling back to defaults
2022-11-15 12:17:53 +01:00
Mike Fährmann
368f156378 [pixiv] rankings: add support for the new daily AI and daily AI R18
(#3214, #3221)

In remembrance of @thatfuckingbird
2022-11-15 11:47:57 +01:00
Mike Fährmann
b0cb4a1b9c replace 'text.extract()' with 'text.extr()' where possible 2022-11-05 01:14:09 +01:00
Mike Fährmann
769e6754dc [pixiv] use 'exact_match_for_tags' as default search mode (#3092) 2022-10-24 16:08:12 +02:00
Mike Fährmann
52d1eb928d [pixiv] extend 'metadata' option (#3057)
make it usable for all 'pixiv' extractors
2022-10-16 15:32:31 +02:00
Mike Fährmann
63e0924927 [pixiv] add 'series' extractor (#2964) 2022-09-27 23:24:03 +02:00
Mike Fährmann
d5ded11aa8 [pixiv] fix default filenames for backgrounds 2022-07-11 12:45:38 +02:00
Mike Fährmann
345199a3ec [pixiv] include '.gif' in background fallback URLs (#2495) 2022-06-03 17:25:23 +02:00
Mike Fährmann
4005171db3 [pixiv] provide more metadata fields when option enabled (#2594) 2022-05-15 14:47:14 +02:00
Mike Fährmann
6ae3a5cdb0 [pixiv] make retrieving ugoira metadata non-fatal (#2562) 2022-05-08 20:05:38 +02:00
Mike Fährmann
9adea93aef [pixiv] updates to avatar/background extractors (#2495)
- add 'date' metadata to avatar/background files when available
  and use that in default filenames / archive ids
- remove deprecation warnings as their option names clash with
  subcategory names
2022-05-04 17:30:54 +02:00
Mike Fährmann
84756982e9 [pixiv] implement 'include' option
- split 'user' extractor and its 'avatar' and 'background' options into
  separate extractors ('artworks', 'avatar', 'background')
- avatars can now be downloaded with
  https://www.pixiv.net/en/users/ID/avatar
  as URL and will use a proper archive key; similar for backgrounds
- options for the 'user' subcategory must be moved to 'artworks' to have
  the same effect as before
2022-05-02 09:03:35 +02:00
Mike Fährmann
82eee72b39 [pixiv] update API interface
- start all endpoints with '/'
- use extractor.wait() for rate limit
- retry with while loop instead of recursion
- in case of error, write entire response to debug log
2022-05-02 09:03:34 +02:00
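
The "retry with while loop instead of recursion" item above might look roughly like the following sketch. Everything here is hypothetical: 'call_with_retry', the 'fetch' callable, the 'rate_limited' key, and the retry cap are assumptions for illustration, and the 'wait' parameter merely stands in for whatever waiting mechanism the extractor provides.

```python
import time


def call_with_retry(fetch, wait=time.sleep, delay=1.0, max_tries=5):
    """Call fetch() until it returns a response that is not rate limited."""
    tries = 0
    while True:
        response = fetch()
        if not response.get("rate_limited"):
            return response
        tries += 1
        if tries >= max_tries:
            raise RuntimeError("rate limit did not clear")
        # Wait, then go around the loop again instead of recursing.
        wait(delay)


# Simulated endpoint: rate limited twice, then successful.
responses = iter([{"rate_limited": True}, {"rate_limited": True}, {"ok": True}])
waits = []
result = call_with_retry(lambda: next(responses), wait=waits.append)
print(result, waits)  # {'ok': True} [1.0, 1.0]
```

A loop keeps the call stack flat no matter how many retries are needed, whereas a recursive retry adds one stack frame per attempt.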
Mike Fährmann
9e6ff42a9d [pixiv] implement 'background' option (#623, #1124, #2495) 2022-04-21 13:53:02 +02:00
Mike Fährmann
4bec34fc94 [pixiv] allow setting a date range for search results (#2133)
with the 'scd' and 'ecd' query parameters
2021-12-23 23:03:39 +01:00
Mike Fährmann
e33125ad39 [pixiv] add 'sketch' extractor (#1497) 2021-10-13 00:02:11 +02:00
Mike Fährmann
eed6ef3de0 [pixiv] fix pixivision title extraction 2021-09-02 22:34:59 +02:00
Mike Fährmann
bd08ee2859 remove most 'yield Message.Version' statements
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
8ecca3af58 [pixiv] add extractor for 'pixivision' articles (#1672) 2021-07-07 16:27:16 +02:00
Mike Fährmann
7273cf8536 [pixiv] support fetching privately followed users (fixes #1628) 2021-06-16 19:56:09 +02:00
thatfuckingbird
e6811c7450 [pixiv] implement 'max-posts' option (#1558)
* implement max-rank for pixiv

* rename to max-posts and make more generic
2021-05-24 17:49:46 +02:00
Mike Fährmann
5eeaaee01d [pixiv] add 'metadata' option (#1551) 2021-05-14 20:30:28 +02:00
Mike Fährmann
36ed1efcfb [pixiv] rename "noop" value for 'tags' option to "original"
(#1507)
2021-05-07 20:41:54 +02:00
Mike Fährmann
fa519f9202 [pixiv] change 'translated-tags' option (#1507)
- rename to 'tags'
- use string-values: "japanese", "translated", "noop"
- remove duplicate entries for "translated" tags
2021-04-29 19:30:43 +02:00
Mike Fährmann
221015e586 [downloader:http] disable filename extension changes for ugoira
(#1507)
2021-04-27 01:29:09 +02:00
thatfuckingbird
141ca4ac0a [pixiv] also save untranslated tags when translated-tags is enabled (#1501) 2021-04-23 23:02:41 +02:00
beesdotjson
5ad615f0db fix PixivFavoriteExtractor regex (#1405)
* fix PixivFavoriteExtractor regex

* do not use lookbehind
2021-03-25 14:59:33 +01:00
Mike Fährmann
7440d1f112 [pixiv] add 'translated-tags' option (closes #1354)
(a lot more straightforward than I thought ...)
2021-03-05 17:18:51 +01:00
Mike Fährmann
8974f0361c [pixiv] update (#1304)
- remove login with username & password
- require a refresh token
- add 'oauth:pixiv' functionality

See also:
- https://github.com/upbit/pixivpy/issues/158
- https://gist.github.com/ZipFile/c9ebedb224406f4f11845ab700124362
2021-02-12 18:07:16 +01:00
Mike Fährmann
193dca2ce1 update extractor test results 2021-01-21 21:35:42 +01:00
Mike Fährmann
c008cb5100 [pixiv] add 'related' option (#1237) 2021-01-17 22:48:32 +01:00
Mike Fährmann
3bd08acc8f [pixiv] output debug message on failed login attempt
(#1192)
2020-12-22 14:59:31 +01:00
Mike Fährmann
b58e605dc7 raise error when required username or password are missing
do not try to log in as 'None' (#1192)
2020-12-22 14:40:18 +01:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
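
The effect of the change above can be shown with a small sketch. It assumes the quoted strings appear inside negated character classes such as '[^/?&#]+', which is not stated in the message; the example.org patterns and the URL are invented for illustration.

```python
import re

# Assumed old style: '&' is treated as a component delimiter.
OLD = re.compile(r"https://example\.org/tags/([^/?&#]+)")
# Assumed new style: only '/', '?' and '#' end a path segment.
NEW = re.compile(r"https://example\.org/tags/([^/?#]+)")

url = "https://example.org/tags/rock&roll?page=2#top"
print(OLD.match(url).group(1))  # rock
print(NEW.match(url).group(1))  # rock&roll
```

Under that assumption, a path segment that contains '&' is captured whole by the new pattern, which matches the RFC 3986 view that only '/', '?' and '#' delimit components.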
Mike Fährmann
844793847c update extractor test results 2020-10-11 18:15:41 +02:00