Commit Graph

43 Commits

Author SHA1 Message Date
Mike Fährmann
9dbe33b6de replace old %-formatted and .format(…) strings with f-strings (#7671)
mostly using flynt
https://github.com/ikamensh/flynt
2025-06-29 17:50:19 +02:00
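
As a small illustration of the conversion described in the commit above, the following sketch shows the three formatting styles side by side. The variable names and the message text are made up for this example and are not taken from the repository.

```python
# Hypothetical values, chosen only for illustration.
category = "paheal"
count = 3

# Old styles that the commit replaces.
old_percent = "%s: %d posts" % (category, count)
old_format = "{}: {} posts".format(category, count)

# Equivalent f-string.
new_fstring = f"{category}: {count} posts"

# All three produce the same text.
assert old_percent == old_format == new_fstring
```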
Mike Fährmann
e08ec7e083 update copyright notices 2025-06-13 00:03:41 +02:00
Mike Fährmann
811b665e33 remove @staticmethod decorators
There might have been a time when calling a static method was faster
than calling a regular method, but that is no longer the case. According
to micro-benchmarks, a static method call is 70% slower in CPython 3.13,
and the decorator also makes executing the code of a class definition slower.
2025-06-12 22:50:52 +02:00
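
A micro-benchmark of the kind the commit above mentions could look roughly like the sketch below. The class and function names are hypothetical, and this is not the benchmark the author ran; results depend on the interpreter version and machine, so no numbers are claimed here.

```python
import timeit


class WithStatic:
    @staticmethod
    def helper(x):
        return x + 1

    def run(self):
        # Call the static method through the instance.
        return self.helper(1)


class WithMethod:
    def helper(self, x):
        return x + 1

    def run(self):
        # Call a regular bound method.
        return self.helper(1)


def measure(cls, number=100_000):
    """Return the total time in seconds for `number` calls of cls().run()."""
    obj = cls()
    return timeit.timeit(obj.run, number=number)


def report(number=100_000):
    """Print timings for both variants; compare them on your own interpreter."""
    for cls in (WithStatic, WithMethod):
        print(f"{cls.__name__}: {measure(cls, number):.4f}s")
```

Calling `report()` prints one timing per variant. Running it several times and comparing the smallest values gives a more stable picture than a single run.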
Mike Fährmann
827eeca0bc [paheal] fix '404 Not Found' for tags with URL encoded characters (#7642) 2025-06-08 16:23:11 +02:00
Mike Fährmann
d2dda2bc00 [paheal] implement fast '--range' support (#5905) 2024-07-30 12:55:57 +02:00
Mike Fährmann
c8b591303f [paheal] cleanup 2024-02-27 02:27:20 +01:00
Mike Fährmann
b41d9bf616 [paheal] fix 'source' metadata 2024-01-19 22:24:39 +01:00
Mike Fährmann
f9544194c0 [paheal] restore 'extension' metadata (#4976) 2023-12-26 16:09:26 +01:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
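
The two-step pattern described in the commit above can be sketched as follows. `SketchExtractor`, `SketchJob`, the `options` dict, and the `timeout` key are hypothetical stand-ins invented for this example; they are not the project's actual classes or config keys.

```python
class SketchExtractor:
    # Class-level stand-in for global configuration (assumption).
    options = {"timeout": 30}

    def __init__(self, url):
        # The constructor only stores cheap state; nothing is read
        # from the configuration yet.
        self.url = url
        self.session = None
        self.cookies = None
        self.timeout = None

    def config(self, key, default=None):
        return self.options.get(key, default)

    def initialize(self):
        # The actual setup happens here, separately from __init__().
        self.session = {"headers": {"User-Agent": "sketch"}}
        self.cookies = {}
        self.timeout = self.config("timeout")


class SketchJob:
    def __init__(self, extractor, overrides):
        # Adjust config access before the extractor reads any options ...
        extractor.options = {**extractor.options, **overrides}
        # ... and only then run the real initialization.
        extractor.initialize()
        self.extractor = extractor


extr = SketchExtractor("https://example.org/post/1")
job = SketchJob(extr, {"timeout": 5})
```

Because `initialize()` runs after the job has applied its overrides, the extractor in this sketch sees `timeout == 5`, while the class-level default stays untouched.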
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
f0cb951566 [paheal] unescape 'source' 2023-07-07 20:03:00 +02:00
Mike Fährmann
b480b7076a [paheal] fix a78f8ce5 for enabled 'metadata' (#4262) 2023-07-07 20:00:49 +02:00
Mike Fährmann
a78f8ce5b0 [paheal] fix extraction (#4262)
swap ' and "
2023-07-04 17:36:41 +02:00
Mike Fährmann
7865067d19 [shimmie2] add generic extractors for Shimmie2 sites (#3734)
add support for
- loudbooru.com       (#3734)
- booru.cavemanon.xyz (#3734)
- giantessbooru.com   (#943)
- tentaclerape.net
2023-04-26 19:20:44 +02:00
Mike Fährmann
2ed58029f9 [paheal] add proper support for videos (#2892) 2022-09-04 13:30:48 +02:00
Mike Fährmann
4b78bd423f [paheal] add 'metadata' option (#2641) 2022-06-04 16:05:49 +02:00
Mike Fährmann
61fa9b535a [paheal] improve metadata extraction (#2641)
- unescape 'tags'
- add 'date', 'source', and 'uploader' for single posts
2022-05-30 17:23:08 +02:00
Mike Fährmann
211de95dd0 update extractor test results 2021-11-01 02:58:53 +01:00
Mike Fährmann
4b1cda4cf7 [paheal] fix metadata extraction 2021-02-14 15:43:39 +01:00
Mike Fährmann
43120407cc [paheal] create directory for each post (closes #1147) 2020-12-01 12:14:55 +01:00
Mike Fährmann
1e3dd7330e merge SharedConfigMixin functionality into Extractor 2020-11-17 00:34:07 +01:00
Mike Fährmann
558cde139c [paheal] fix extraction (fixes #1088) 2020-10-28 21:51:31 +01:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
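
To show what the change in the commit above means in practice, here is a minimal sketch. It assumes the affected patterns are negated character classes such as `[^/?&#]+`; the example URL and both patterns are made up for illustration.

```python
import re

# Hypothetical patterns; only the character class differs.
OLD = re.compile(r"https?://example\.org/post/list/([^/?&#]+)")
NEW = re.compile(r"https?://example\.org/post/list/([^/?#]+)")

url = "https://example.org/post/list/a&b/1?page=2#top"

# With '&' excluded, the captured path segment stops at the ampersand.
old_tags = OLD.match(url).group(1)

# With only '/', '?' and '#' as delimiters, the whole segment is captured.
new_tags = NEW.match(url).group(1)
```

Here `old_tags` is `"a"` and `new_tags` is `"a&b"`, since '&' is not one of the component delimiters quoted from RFC 3986 above.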
Mike Fährmann
844793847c update extractor test results 2020-10-11 18:15:41 +02:00
Mike Fährmann
19bf76bcf8 update extractor test results 2020-08-03 21:57:00 +02:00
Mike Fährmann
1d4a369ea2 update extractor test results 2020-02-27 22:15:40 +01:00
Mike Fährmann
e6cd49e78b update extractor test results 2020-02-16 21:48:46 +01:00
Mike Fährmann
2852691d78 [paheal] replace test URL
searching for 'k-on' doesn't yield any results anymore
2020-01-27 22:19:41 +01:00
Mike Fährmann
62335b9015 [paheal] adjust test results 2019-06-05 11:42:01 +02:00
Mike Fährmann
6a34f4b0c1 skip tests on read timeouts; print list of skipped tests 2019-06-01 20:47:31 +02:00
Mike Fährmann
d6ddb74cde update test results
- deviantart: 'index' is now an integer
- flickr: image file with lower quality
- paheal: image server name changed
- rule34: post got deleted
2019-04-12 09:59:48 +02:00
Mike Fährmann
f8782c05f2 [paheal] rename "tags" to "search_tags"
to better match field names of other booru extractors
2019-02-17 18:18:09 +01:00
Mike Fährmann
5530871b5a change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)

Example: "https://example.org/path/filename.ext"

before:
- filename : filename.ext
- name     : filename
- extension: ext

now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
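
The "now" behaviour listed in the commit above can be approximated with the sketch below. `nameext_sketch` is a hypothetical helper written for this example; it is not the project's `text.nameext_from_url()` and may differ from it in edge cases.

```python
from urllib.parse import unquote, urlsplit


def nameext_sketch(url):
    """Split the last path segment of `url` into 'filename' and 'extension'."""
    # Take the last path segment, ignoring query string and fragment.
    name = unquote(urlsplit(url).path.rpartition("/")[2])
    filename, dot, extension = name.rpartition(".")
    if not dot:
        # No dot in the name: everything is the filename.
        return {"filename": name, "extension": ""}
    return {"filename": filename, "extension": extension}


result = nameext_sketch("https://example.org/path/filename.ext")
```

For the example URL from the commit message, `result` is `{"filename": "filename", "extension": "ext"}`, matching the two fields listed under "now".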
Mike Fährmann
4b1880fa5e propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
4d656a81ca replace SharedConfigExtractor class with a Mixin 2019-02-04 13:46:02 +01:00
Mike Fährmann
4d73cc785d update test results 2018-12-14 16:07:32 +01:00
Mike Fährmann
c9f70e0a19 [paheal] use HTTPS 2018-07-17 21:25:03 +02:00
Mike Fährmann
7a58151566 fix util.parse_bytes invocations
(should be text.parse_bytes)
2018-05-10 22:07:55 +02:00
Mike Fährmann
cc36f88586 rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
34873dbd90 set 'archive_fmt' values
These are going to be used to create a unique ID for each image.
2018-02-01 15:30:49 +01:00
Mike Fährmann
40d35c87bc [paheal] add tag- and post-extractors (closes #69) 2018-01-15 16:39:05 +01:00