35 Commits

Author SHA1 Message Date
Mike Fährmann
e006d26c8e Revert "use f-strings when building 'pattern'"
revert d7c97d5a97.
2025-12-20 22:07:37 +01:00
Mike Fährmann
d7c97d5a97 use f-strings when building 'pattern' 2025-10-20 21:23:11 +02:00
Mike Fährmann
c38856bd3f [dt] use 'parse_datetime_iso()' for ISO formats 2025-10-19 21:52:05 +02:00
Mike Fährmann
085616e0a8 [dt] replace 'text.parse_datetime()' & 'text.parse_timestamp()' 2025-10-17 17:43:06 +02:00
Mike Fährmann
a097a373a9 simplify if statements by using walrus operators (#7671) 2025-07-22 20:57:54 +02:00
Mike Fährmann
d8ef1d693f rename 'StopExtraction' to 'AbortExtraction'
for cases where StopExtraction was used to report errors
2025-07-09 21:07:28 +02:00
Mike Fährmann
b0580aba86 update 'match.lastindex' usage 2025-06-18 20:24:13 +02:00
Mike Fährmann
41191bb60a 'match.group(N)' -> 'match[N]' (#7671)
2.5x faster
2025-06-18 13:05:58 +02:00
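As a small illustration of the change in 41191bb60a: indexing a match object and calling `group()` return the same value. The pattern and input string here are made up for the example and are not taken from the repository; the speed claim in the commit message is not reproduced here.

```python
import re

# Hypothetical pattern and input, chosen only for illustration
match = re.match(r"(\w+)/(\d+)", "images/12345")

# Older spelling: explicit method call
old_style = match.group(2)

# Newer spelling: subscript access (supported since Python 3.6)
new_style = match[2]
```

Both spellings refer to the same capture group, so the rewrite should not change behavior.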
Mike Fährmann
e08ec7e083 update copyright notices 2025-06-13 00:03:41 +02:00
Mike Fährmann
811b665e33 remove @staticmethod decorators
There might have been a time when calling a static method was faster
than a regular method, but that is no longer the case. According to
micro-benchmarks, it is 70% slower in CPython 3.13 and it also makes
executing the code of a class definition slower.
2025-06-12 22:50:52 +02:00
Mike Fährmann
6c9b20fe45 [philomena] download 'full' URLs (#6922)
'view_url' URLs sometimes result in 404 errors
2025-02-02 18:23:46 +01:00
Mike Fährmann
4ab9237f1d [philomena] fix 'date' values without UTC offset (#6921)
Some instances do not include a UTC offset or 'Z' in their datetime
values, e.g. 2024-03-14T13:46:46 compared to 2024-03-14T13:46:46Z
2025-02-02 16:32:28 +01:00
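A minimal sketch of the situation described in 4ab9237f1d, not the extractor's actual code. The helper name `parse_iso_utc` is hypothetical, and treating values without an offset as UTC is an assumption made for this example.

```python
from datetime import datetime, timezone


def parse_iso_utc(value):
    """Parse an ISO 8601 datetime string, assuming UTC if no offset is given."""
    # Rewrite a trailing 'Z' so fromisoformat() also accepts it before Python 3.11
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        # Assumption: values without an offset are already in UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)
```

With this approach, `2024-03-14T13:46:46` and `2024-03-14T13:46:46Z` parse to the same point in time.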
Shelvacu
f8e707b92c [philomena] switch default ponybooru filter to get everything by default
The system filter mislabeled "Everything" hides 4 tags https://ponybooru.org/filters/2

There are [many public filters that don't hide anything](https://ponybooru.org/filters?fq=spoilered_count%3A0%2C+hidden_count%3A0), I just picked [the oldest one](https://ponybooru.org/filters/3).
2024-11-07 20:08:42 -08:00
Mike Fährmann
9b99d2c886 [philomena] support downloading SVG files (#5643) 2024-06-05 16:48:51 +02:00
Mike Fährmann
89066844f4 add 'config_instance' method
to allow for a more streamlined access to BaseExtractor instance options
2024-01-18 03:20:36 +01:00
Mike Fährmann
1f9b16a70b replace static 'sleep-request' defaults with dynamic ones 2023-12-18 22:06:26 +01:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
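The idea behind a383eca7f6 can be sketched roughly as follows. The class below is a hypothetical stand-in, not the project's extractor base class; the attribute names and the guard against repeated initialization are assumptions added for illustration.

```python
class SketchExtractor:
    """Hypothetical extractor with construction and setup kept separate."""

    def __init__(self, url):
        # Cheap: only record what identifies this extractor
        self.url = url
        self.session = None
        self._initialized = False

    def initialize(self):
        # Expensive setup (session, cookies, config options) happens here,
        # so callers can adjust configuration between __init__() and this call
        if self._initialized:
            return
        self.session = {"cookies": {}}  # placeholder for a real HTTP session
        self._initialized = True


extr = SketchExtractor("https://example.org/")
# ... a job could change configuration here, before any setup has run ...
extr.initialize()
```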
Mike Fährmann
fceabee433 [philomena] use API interface class
handle 429 errors and retry after 10min (#4288)
2023-07-13 20:46:04 +02:00
Mike Fährmann
a1ffa1ff09 [philomena] fix '--range' (#4288) 2023-07-08 23:17:27 +02:00
Mike Fährmann
09fb212414 [philomena] match URLs with www subdomain 2023-01-24 22:43:24 +01:00
Mike Fährmann
775895f44b [booru] refactor 'tags' and 'notes' extraction
- move HTML request for post pages into its own function
- move gelbooru_v02.py notes extraction to gelbooru.py
  since it only works there
- clean up some code
2022-10-31 12:01:19 +01:00
Mike Fährmann
da11fb32d0 update extractor test results 2022-08-28 00:16:12 +02:00
Mike Fährmann
c6a9bab019 update extractor test results 2022-07-12 15:49:22 +02:00
Mike Fährmann
d26da3b9e5 add pre-generated 'pattern' for supported BaseExtractor sites 2022-05-09 22:20:09 +02:00
Mike Fährmann
7cb29224f0 [philomena] fix search parameter escaping (#2215)
The pluses from search terms in /tags/ URLs need to be
replaced with spaces to get accepted by Philomena.
2022-01-23 01:03:37 +01:00
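A rough sketch of the escaping issue fixed in 7cb29224f0, under the assumption that the search term is taken from the path of a /tags/ URL. The function name `tags_to_query` is hypothetical, and the extractor's real handling of tag escapes is likely more involved than this.

```python
from urllib.parse import unquote


def tags_to_query(path_segment):
    """Turn the tag part of a /tags/ URL into a search string."""
    # Replace '+' with spaces first, then decode percent-escapes,
    # so that an escaped plus sign ('%2B') survives as a literal '+'
    return unquote(path_segment.replace("+", " "))
```

Doing the replacement before decoding keeps percent-encoded plus signs distinct from the pluses that stand in for spaces.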
Mike Fährmann
159631c808 [philomena] use a default 'filter_id' if none is given 2021-12-15 16:20:53 +01:00
Mike Fährmann
cfa4876848 [philomena] support furbooru.org (closes #1995) 2021-11-15 20:57:51 +01:00
Mike Fährmann
211de95dd0 update extractor test results 2021-11-01 02:58:53 +01:00
Mike Fährmann
c3b5c88b04 update extractor test results 2021-07-20 20:21:33 +02:00
Mike Fährmann
e60962f7e5 [philomena] improve tag escapes handling (fixes #1629) 2021-06-16 18:47:08 +02:00
Mike Fährmann
bdfcc9c4b1 update extractor test results 2021-04-18 20:28:15 +02:00
Mike Fährmann
ddd48ceee5 update extractor test results 2021-03-28 23:06:44 +02:00
Mike Fährmann
847e9b0ed7 [philomena] support post URLs without '/images/'
e.g. 'derpibooru.org/1'
2021-03-14 18:26:39 +01:00
Mike Fährmann
c485d0a956 [philomena] add generalized extractors for philomena sites
(closes #1379)
2021-03-14 17:19:57 +01:00