45 Commits

Author SHA1 Message Date
Mike Fährmann
78f78fe64b [gelbooru_v02] support using 'api-key' & 'user-id' (#8077)
- detect API error messages
- add default 1.0s request delay to 'rule34'
2025-08-20 22:48:20 +02:00
Mike Fährmann
74c9356442 [rule34] fix file downloads (#7697)
replace 'api-cdn' subdomain of image files with 'wimg'
2025-06-20 15:07:10 +02:00
Mike Fährmann
4279928d0b [gelbooru_v02] extract 'total' / 'search_count' metadata (#7689) 2025-06-19 19:15:27 +02:00
Mike Fährmann
b0580aba86 update 'match.lastindex' usage 2025-06-18 20:24:13 +02:00
Mike Fährmann
41191bb60a 'match.group(N)' -> 'match[N]' (#7671)
2.5x faster
2025-06-18 13:05:58 +02:00
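The change above relies on the fact that `re.Match` objects support indexing, so `match[N]` returns the same value as `match.group(N)`. A minimal sketch of the two spellings follows; the pattern and path are hypothetical and not taken from the repository, and the sketch makes no attempt to reproduce the speed figure quoted in the commit message.

```python
import re

# Hypothetical pattern, for illustration only
POST_RE = re.compile(r"/post/(\d+)/([a-z_]+)")


def parse_old(path):
    """Old style: access groups through the group() method."""
    m = POST_RE.search(path)
    return m.group(1), m.group(2)


def parse_new(path):
    """New style: index the match object directly."""
    m = POST_RE.search(path)
    return m[1], m[2]


if __name__ == "__main__":
    print(parse_old("/post/123/some_tag"))
    print(parse_new("/post/123/some_tag"))
```

Both functions return the same tuple; only the access syntax differs.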
Mike Fährmann
e08ec7e083 update copyright notices 2025-06-13 00:03:41 +02:00
Mike Fährmann
811b665e33 remove @staticmethod decorators
There might have been a time when calling a static method was faster
than a regular method, but that is no longer the case. According to
micro-benchmarks, it is 70% slower in CPython 3.13 and it also makes
executing the code of a class definition slower.
2025-06-12 22:50:52 +02:00
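As a rough illustration of what removing the decorator looks like, the sketch below shows the same helper written both ways. The class and method names are hypothetical, and the timing claim in the commit message is the author's own measurement; this sketch only shows that behavior is unchanged.

```python
class WithStatic:
    """Before: helper declared as a static method."""

    @staticmethod
    def clean(tag):
        return tag.strip().lower()


class WithoutStatic:
    """After: the same helper as a regular method that ignores 'self'."""

    def clean(self, tag):
        return tag.strip().lower()


if __name__ == "__main__":
    # Both variants produce the same result
    print(WithStatic.clean("  Tag "))
    print(WithoutStatic().clean("  Tag "))
```

One consequence of this style is that the helper must be called on an instance rather than on the class itself.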
Mike Fährmann
b5c88b3d3e replace standard library 're' uses with 'util.re()' 2025-06-06 13:24:52 +02:00
Mike Fährmann
a7bbccbd7b [common] add 'request_xml()' convenience function 2025-06-04 23:10:16 +02:00
Mike Fährmann
ef7ff31117 [realbooru] fix extraction (#6543)
- extract data from HTML pages since API is no longer usable
- move code into its own separate 'realbooru' module
2024-12-07 17:39:25 +01:00
Mike Fährmann
4a5dfc7d76 [rule34] fix 'favorite' extraction (#6573) 2024-12-01 18:17:25 +01:00
Mike Fährmann
9821503226 [misc] 'api_root' -> 'root_api' 2024-11-14 23:44:15 +01:00
Mike Fährmann
2818973981 [gelbooru_v02] unescape categorized tags 2024-10-10 17:30:55 +02:00
Mike Fährmann
51fd14f87d [gelbooru_v02] use total number of posts as end marker (#5830)
… and potentially retry on empty responses
2024-07-12 22:51:46 +02:00
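The idea described above, stopping once the reported total has been reached and retrying when a page comes back empty too early, might look roughly like the following. This is a sketch under assumptions: `fetch_page`, its `(total, posts)` return shape, and `max_retries` are hypothetical names, not the extractor's actual interface.

```python
def iter_posts_until_total(fetch_page, per_page=100, max_retries=3):
    """Yield posts page by page until 'total' posts have been seen.

    fetch_page(page, per_page) is assumed to return (total, posts).
    An empty page before 'total' is reached is retried a few times.
    """
    seen = 0
    page = 0
    retries = 0
    while True:
        total, posts = fetch_page(page, per_page)
        if not posts:
            if seen < total and retries < max_retries:
                retries += 1      # empty response, but more posts expected
                continue
            return                # give up, or nothing left to fetch
        retries = 0
        yield from posts
        seen += len(posts)
        if seen >= total:         # total count acts as the end marker
            return
        page += 1
```

Using the total as the end marker avoids treating a single spurious empty response as the end of the results.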
Mike Fährmann
807e2f7094 [realbooru] fix videos and provide fallback URLs (#2530)
revert acc94ac187.
2024-05-31 23:58:40 +02:00
Mike Fährmann
acc94ac187 [realbooru] fix extraction
revert ac97aca99c
2024-01-20 17:56:07 +01:00
Mike Fährmann
93b4120e77 [gelbooru] support 'all' and empty tag (#5076) 2024-01-18 21:49:33 +01:00
Mike Fährmann
89066844f4 add 'config_instance' method
to allow more streamlined access to BaseExtractor instance options
2024-01-18 03:20:36 +01:00
Mike Fährmann
085411f3f1 [rule34] recognize URLs with 'www' subdomain (#4984) 2023-12-30 16:07:56 +01:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
6eca1fab9b [gelbooru_v02] support 'xbooru.com' (#4493) 2023-09-03 15:39:02 +02:00
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
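The two-phase setup described in this commit can be sketched as follows. This is an illustrative stand-in rather than the project's real class: `SketchExtractor`, its attributes, and the config dictionary are all assumptions made for the example.

```python
class SketchExtractor:
    """Hypothetical extractor with construction and initialization split."""

    def __init__(self, url):
        # Cheap step: only record the URL, read no config yet
        self.url = url
        self.options = {}
        self.session = None
        self._initialized = False

    def initialize(self, config=None):
        # Deferred step: config lookup and session setup happen here,
        # so callers can adjust 'config' after construction
        if self._initialized:
            return
        self.options = dict(config or {})
        self.session = object()  # placeholder for a real HTTP session
        self._initialized = True


if __name__ == "__main__":
    ex = SketchExtractor("https://example.org/post/1")
    config = {"sleep-request": 1.0}   # may be adjusted before initialize()
    ex.initialize(config)
    print(ex.options)
```

Because nothing config-dependent runs in `__init__()`, the caller has a window between construction and `initialize()` in which to change how options are resolved.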
Mike Fährmann
ac97aca99c [realbooru] fix extraction
get file URLs from HTML pages
2023-04-02 20:45:16 +02:00
Mike Fährmann
cd931e1139 update extractor test results 2022-12-08 18:58:29 +01:00
Mike Fährmann
6423f990de [realbooru] fix 'tags' extraction (#2530) 2022-11-10 17:04:02 +01:00
Mike Fährmann
ecad02cf3f [realbooru] fix download URLs (#2530) 2022-11-10 13:29:35 +01:00
Mike Fährmann
b0cb4a1b9c replace 'text.extract()' with 'text.extr()' where possible 2022-11-05 01:14:09 +01:00
Mike Fährmann
4fd3c893fa [booru] adjust/match '_tags' and '_notes' code 2022-11-04 19:49:39 +01:00
Mike Fährmann
88954aa2e4 [gelbooru_v02] implement 'notes' extraction
same code as for 'moebooru' works here as well
2022-11-04 19:49:39 +01:00
Mike Fährmann
775895f44b [booru] refactor 'tags' and 'notes' extraction
- move HTML request for post pages into its own function
- move gelbooru_v02.py notes extraction to gelbooru.py
  since it only works there
- clean up some code
2022-10-31 12:01:19 +01:00
Mike Fährmann
67a2efb885 [rule34] implement 'pool' pagination (#2853) 2022-08-26 17:57:17 +02:00
Mike Fährmann
f225247670 [gelbooru] add support for api_key and user_id (#2767) 2022-07-18 18:46:31 +02:00
Mike Fährmann
c6a9bab019 update extractor test results 2022-07-12 15:49:22 +02:00
Mike Fährmann
ff5e10a86d [hypnohub] move to gelbooru_v02 instances (#2631) 2022-05-28 21:10:05 +02:00
Mike Fährmann
d26da3b9e5 add pre-generated 'pattern' for supported BaseExtractor sites 2022-05-09 22:20:09 +02:00
Mike Fährmann
3e926bd465 [realbooru] fix extraction (fixes #2530) 2022-05-02 09:03:34 +02:00
Mike Fährmann
dee0d22561 update extractor test results 2022-02-06 21:39:24 +01:00
Mike Fährmann
199e7616a7 [rule34] use https://api.rule34.xxx for API requests 2022-01-08 17:14:50 +01:00
Mike Fährmann
93cef78450 [gelbooru] workaround pagination limits
Gelbooru only allows retrieving the latest 20k posts for a tag search.
Add 'id:<N' to the search tags to work around that limitation, where N
is the ID of the last retrieved post.

http://gelbooru.me/index.php?page=forum&s=view&id=1467
2021-11-26 18:56:31 +01:00
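The 'id:<N' workaround amounts to restarting the search below the last post already seen. A simplified sketch follows; it narrows the query after every page rather than only when the limit is hit, and `fetch_page` plus the in-memory backend are hypothetical stand-ins for the real API.

```python
def iter_posts_by_id(fetch_page, tags, per_page=100):
    """Yield posts newest-first, narrowing the search with 'id:<N'."""
    query = tags
    while True:
        posts = fetch_page(query, per_page)
        if not posts:
            return
        yield from posts
        last_id = posts[-1]["id"]
        # Restart the search below the last retrieved post
        query = f"{tags} id:<{last_id}"


def make_fake_backend(ids):
    """Hypothetical in-memory backend that understands 'id:<N' terms."""
    ordered = sorted(ids, reverse=True)

    def fetch_page(query, per_page):
        max_id = None
        for term in query.split():
            if term.startswith("id:<"):
                max_id = int(term[4:])
        matching = [i for i in ordered if max_id is None or i < max_id]
        return [{"id": i} for i in matching[:per_page]]

    return fetch_page
```

Since every request asks for the first page of a narrower search, the page offset never grows toward the 20k cap.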
Mike Fährmann
7bbb1f92d7 [gelbooru_v02] add 'favorite' extractor (closes #1834) 2021-09-10 20:43:59 +02:00
thatfuckingbird
dff03a6605 [booru] add an option to extract notes (only gelbooru for now) (#1457)
* [booru] add an option to extract notes (currently implemented only for gelbooru)

* appease linter

* [gelbooru] rename "text" to "body" in note extraction

* add a code comment about reusing return value of _extended_tags
2021-04-13 23:40:24 +02:00
thatfuckingbird
918b0441fb [gelbooru] fix tag category extraction (#1455) 2021-04-10 19:05:00 +02:00
Mike Fährmann
3df527ee2c update extractor test results 2021-02-27 21:01:29 +01:00
Mike Fährmann
59fd740b47 [tbib] add support for https://tbib.org/ (#473, closes #1082) 2021-02-17 00:28:25 +01:00
Mike Fährmann
08d7934c6e move extractors from booru.py into their own gelbooru_v02 module 2021-02-17 00:26:24 +01:00