24 Commits

Author SHA1 Message Date
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened during the 'extractor.find()' call.
2023-07-25 22:16:16 +02:00
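A minimal sketch of the two-phase setup described in a383eca7f6, for illustration only. The class bodies, the 'overrides' parameter, and the 'timeout' option are hypothetical stand-ins, not the project's actual code; only the idea of a cheap __init__() plus a separately callable initialize() comes from the commit message.

```python
class Extractor:
    """Hypothetical stand-in for an extractor with deferred setup."""

    def __init__(self, url):
        # The constructor only records cheap state; nothing is read
        # from the configuration yet.
        self.url = url
        self.config = {}
        self.session = None
        self.timeout = None
        self._initialized = False

    def initialize(self):
        # The actual init (session, cookies, config options) runs here,
        # so callers can still change self.config beforehand.
        if self._initialized:
            return
        self.session = {"cookies": dict(self.config.get("cookies", {}))}
        self.timeout = self.config.get("timeout", 30)
        self._initialized = True


class Job:
    """Hypothetical caller that adjusts config before initialization."""

    def __init__(self, extractor, overrides):
        extractor.config.update(overrides)  # adjust config access first
        extractor.initialize()              # then run the real setup
        self.extractor = extractor


job = Job(Extractor("https://example.org/post/1"), {"timeout": 5})
```

Because the override is applied before initialize() runs, the extractor picks up the adjusted value instead of the default.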
Mike Fährmann
ac97aca99c [realbooru] fix extraction
get file URLs from HTML pages
2023-04-02 20:45:16 +02:00
Mike Fährmann
cd931e1139 update extractor test results 2022-12-08 18:58:29 +01:00
Mike Fährmann
6423f990de [realbooru] fix 'tags' extraction (#2530) 2022-11-10 17:04:02 +01:00
Mike Fährmann
ecad02cf3f [realbooru] fix download URLs (#2530) 2022-11-10 13:29:35 +01:00
Mike Fährmann
b0cb4a1b9c replace 'text.extract()' with 'text.extr()' where possible 2022-11-05 01:14:09 +01:00
Mike Fährmann
4fd3c893fa [booru] adjust/match '_tags' and '_notes' code 2022-11-04 19:49:39 +01:00
Mike Fährmann
88954aa2e4 [gelbooru_v02] implement 'notes' extraction
same code as for 'moebooru' works here as well
2022-11-04 19:49:39 +01:00
Mike Fährmann
775895f44b [booru] refactor 'tags' and 'notes' extraction
- move HTML request for post pages into its own function
- move gelbooru_v02.py notes extraction to gelbooru.py
  since it only works there
- clean up some code
2022-10-31 12:01:19 +01:00
Mike Fährmann
67a2efb885 [rule34] implement 'pool' pagination (#2853) 2022-08-26 17:57:17 +02:00
Mike Fährmann
f225247670 [gelbooru] add support for api_key and user_id (#2767) 2022-07-18 18:46:31 +02:00
Mike Fährmann
c6a9bab019 update extractor test results 2022-07-12 15:49:22 +02:00
Mike Fährmann
ff5e10a86d [hypnohub] move to gelbooru_v02 instances (#2631) 2022-05-28 21:10:05 +02:00
Mike Fährmann
d26da3b9e5 add pre-generated 'pattern' for supported BaseExtractor sites 2022-05-09 22:20:09 +02:00
Mike Fährmann
3e926bd465 [realbooru] fix extraction (fixes #2530) 2022-05-02 09:03:34 +02:00
Mike Fährmann
dee0d22561 update extractor test results 2022-02-06 21:39:24 +01:00
Mike Fährmann
199e7616a7 [rule34] use https://api.rule34.xxx for API requests 2022-01-08 17:14:50 +01:00
Mike Fährmann
93cef78450 [gelbooru] work around pagination limits
Gelbooru only allows retrieving the latest 20k posts for a tag search.
Add 'id:<N' to the search tags to work around that limitation, where N
is the ID of the last retrieved post.

http://gelbooru.me/index.php?page=forum&s=view&id=1467
2021-11-26 18:56:31 +01:00
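The 'id:<N' workaround from 93cef78450 can be sketched roughly as below. This is an illustration under assumptions, not the extractor's real code: 'search', 'page_size', 'page_limit', and the fake backend are hypothetical, and results are assumed to arrive sorted by descending post ID.

```python
def fetch_all(search, tags, page_size, page_limit):
    """Yield every post for 'tags', restarting with 'id:<N' at the limit.

    'search(tags, page)' is a hypothetical callable that returns a list
    of post dicts sorted by descending 'id' and stops returning results
    after 'page_limit' pages.
    """
    current = tags
    while True:
        last_id = None
        for page in range(page_limit):
            posts = search(current, page)
            for post in posts:
                yield post
                last_id = post["id"]
            if len(posts) < page_size:
                return  # short page: nothing left to fetch
        if last_id is None:
            return
        # Window exhausted: continue below the last retrieved post ID.
        current = f"{tags} id:<{last_id}"


def make_fake_search(total, page_size, page_limit):
    """Build an in-memory backend holding post IDs 1..total."""
    def search(tags, page):
        if page >= page_limit:
            return []
        upper = total + 1
        for token in tags.split():
            if token.startswith("id:<"):
                upper = int(token[4:])
        ids = range(upper - 1, 0, -1)
        chunk = ids[page * page_size:(page + 1) * page_size]
        return [{"id": i} for i in chunk]
    return search


search = make_fake_search(total=25, page_size=3, page_limit=2)
ids = [post["id"] for post in fetch_all(search, "cat", 3, 2)]
```

With the fake backend, each query window covers only six posts, so the generator has to restart several times to reach the oldest post.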
Mike Fährmann
7bbb1f92d7 [gelbooru_v02] add 'favorite' extractor (closes #1834) 2021-09-10 20:43:59 +02:00
thatfuckingbird
dff03a6605 [booru] add an option to extract notes (only gelbooru for now) (#1457)
* [booru] add an option to extract notes (currently implemented only for gelbooru)

* appease linter

* [gelbooru] rename "text" to "body" in note extraction

* add a code comment about reusing return value of _extended_tags
2021-04-13 23:40:24 +02:00
thatfuckingbird
918b0441fb [gelbooru] fix tag category extraction (#1455) 2021-04-10 19:05:00 +02:00
Mike Fährmann
3df527ee2c update extractor test results 2021-02-27 21:01:29 +01:00
Mike Fährmann
59fd740b47 [tbib] add support for https://tbib.org/ (#473, closes #1082) 2021-02-17 00:28:25 +01:00
Mike Fährmann
08d7934c6e move extractors from booru.py into their own gelbooru_v02 module 2021-02-17 00:26:24 +01:00