26 Commits

Author SHA1 Message Date
Mike Fährmann
968597a302 yield 3-tuples for Message.Directory
adapt tuples to the same length and semantics as other messages
2025-12-05 21:39:52 +01:00
Mike Fährmann
085616e0a8 [dt] replace 'text.parse_datetime()' & 'text.parse_timestamp()' 2025-10-17 17:43:06 +02:00
Mike Fährmann
a097a373a9 simplify if statements by using walrus operators (#7671) 2025-07-22 20:57:54 +02:00
NecRaul
9dde853fc5 [warosu] HTML attribute fix 2025-07-04 03:17:20 +04:00
Mike Fährmann
9dbe33b6de replace old %-formatted and .format(…) strings with f-strings (#7671)
mostly using flynt
https://github.com/ikamensh/flynt
2025-06-29 17:50:19 +02:00
NecRaul
5ba7c98bc2 [warosu] Simpler/less costly hostname check 2025-06-20 14:27:21 +04:00
NecRaul
3c85032b9b [warosu] Handle missing images in the extractor by checking hostname 2025-06-20 09:34:49 +04:00
Mike Fährmann
d9432ee297 [warosu] restore correct 'now' values 2025-06-16 12:00:59 +02:00
NecRaul
3bc6bc7c77 [warosu] Single quotes when string has a quotation mark 2025-06-16 12:50:20 +04:00
NecRaul
f56e810f42 [warosu] Attribute fix 2025-06-16 12:33:16 +04:00
Mike Fährmann
0fcd603498 [warosu] fix extraction 2024-07-26 21:09:07 +02:00
Mike Fährmann
296f20e630 [warosu] fix 'board_name' metadata 2024-03-06 01:28:47 +01:00
Mike Fährmann
24873c2724 [warosu] fix crash for threads with deleted posts (#5289) 2024-03-06 01:27:45 +01:00
Mike Fährmann
f9dac43be9 [warosu] fix file URLs 2023-11-24 02:44:55 +01:00
Mike Fährmann
13ce3a9acb [warosu] fix extraction (#4634) 2023-10-13 23:03:39 +02:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
1c25cc7a3e [warosu] fix and update 2022-12-07 21:23:45 +01:00
Mike Fährmann
b0cb4a1b9c replace 'text.extract()' with 'text.extr()' where possible 2022-11-05 01:14:09 +01:00
Mike Fährmann
bd08ee2859 remove most 'yield Message.Version' statements
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
4b1880fa5e propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
1acaed73e0 [warosu] improve extraction and metadata
- convert values to int
- unquote original filenames
- don't parse posts twice
2018-09-28 13:03:12 +02:00
Mike Fährmann
34873dbd90 set 'archive_fmt' values
These are going to be used to create a unique ID for each image.
2018-02-01 15:30:49 +01:00
Mike Fährmann
4ad903b797 [warosu] fix extraction 2017-09-14 14:57:40 +02:00
Mike Fährmann
6f30cf4c64 change keyword names to valid Python identifiers
This commit mostly replaces all minus-signs ('-') in keyword names with
underscores ('_') to allow them to be used in filter-expressions. For
example 'gallery-id' got renamed to 'gallery_id'.

(It is theoretically possible to access any variable, regardless of its
name, with 'locals()["NAME"]', but that seems a bit too convoluted if
just 'NAME' could be enough)
2017-09-10 22:20:47 +02:00
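
The rename described in 6f30cf4c64 can be illustrated with a minimal sketch. It assumes filter expressions are evaluated as Python expressions with the keywords supplied as local variables. The helper `evaluate_filter` and the sample values are hypothetical and are not taken from the project's code.

```python
# Hypothetical keyword dictionaries; the key names come from the commit message.
old_keywords = {"gallery-id": 123}
new_keywords = {"gallery_id": 123}


def evaluate_filter(expression, keywords):
    """Evaluate a Python expression with the keywords as local variables.

    Illustrative helper only; it is an assumption, not the project's
    actual filter implementation.
    """
    return eval(expression, {}, dict(keywords))


# An underscore name is a valid identifier and can be used directly.
result_new = evaluate_filter("gallery_id > 100", new_keywords)

# A hyphenated name is parsed as the subtraction 'gallery - id',
# so looking up 'gallery' raises NameError.
try:
    evaluate_filter("gallery-id > 100", old_keywords)
    hyphen_error = None
except NameError as exc:
    hyphen_error = exc

# The workaround mentioned in the commit message still reaches the value,
# at the cost of a more convoluted expression.
result_workaround = evaluate_filter('locals()["gallery-id"] > 100', old_keywords)
```

Under these assumptions, only the underscore form works as a bare name, which matches the motivation given in the commit message.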
Mike Fährmann
4ea82ea556 [warosu] add thread extractor 2017-08-18 19:54:07 +02:00