
33 Commits

Author SHA1 Message Date
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This allows, for example, a Job to adjust config access before it
happens, instead of most of it having already happened when calling
'extractor.find()'.
2023-07-25 22:16:16 +02:00
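A minimal sketch of the two-phase pattern described above, assuming a simplified Extractor class (the attribute names `options` and `initialized` are hypothetical, not the project's actual code). The constructor only records the matched URL; the expensive setup is deferred to `initialize()`, so the caller can still change configuration in between.

```python
import re


class Extractor:
    """Hypothetical sketch of two-phase extractor initialization."""

    def __init__(self, match):
        # Cheap construction: only remember the matched URL.
        self.url = match.string
        self.options = None
        self.initialized = False

    def initialize(self, config):
        # Expensive setup (session, cookies, config options) happens here,
        # so a caller can adjust `config` between construction and init.
        self.options = dict(config)
        self.initialized = True


extr = Extractor(re.match(r"https?://\S+", "https://example.org/a"))
config = {"timeout": 30}
config["timeout"] = 60  # e.g. a Job overriding an option after find()
extr.initialize(config)
```

With this split, whatever the caller changes in `config` before `initialize()` runs is what the extractor sees.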
Mike Fährmann
a08fdfac6e [foolfuuka] add 'archive.palanq.win' 2023-05-02 19:58:55 +02:00
Mike Fährmann
1870df8b23 [foolfuuka] remove 'tokyochronos.net' 2023-05-02 19:25:50 +02:00
Mike Fährmann
ef4e2d8178 [foolfuuka] remove 'archive.alice.al' 2023-05-02 19:23:26 +02:00
Mike Fährmann
b0cb4a1b9c replace 'text.extract()' with 'text.extr()' where possible 2022-11-05 01:14:09 +01:00
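To illustrate why the swap helps, here are two simplified stand-ins, assumed to behave like the project's `text.extract()` and `text.extr()` helpers (this is not their actual implementation). `extract()` returns a `(value, position)` pair for chained parsing, while `extr()` returns only the value, which is all most call sites need.

```python
def extract(txt, begin, end, pos=0):
    """Return (substring between begin and end, index after end)."""
    try:
        first = txt.index(begin, pos) + len(begin)
        last = txt.index(end, first)
        return txt[first:last], last + len(end)
    except ValueError:
        # Nothing found: no value, position unchanged.
        return None, pos


def extr(txt, begin, end, default=""):
    """Return only the substring between begin and end, or `default`."""
    try:
        first = txt.index(begin) + len(begin)
        return txt[first:txt.index(end, first)]
    except ValueError:
        return default


html = '<a href="/thread/123">link</a>'
value, _ = extract(html, 'href="', '"')  # old style: position is discarded
```

Where the returned position was thrown away anyway, the single-value form is shorter and avoids tuple unpacking.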
Mike Fährmann
7e385ed63e [foolfuuka] update domains
- remove nyafuu
- add rozenarcana (https://archive.alice.al/)
- add tokyochronos (https://www.tokyochronos.net)
2022-08-26 17:57:17 +02:00
Mike Fährmann
2dc57637cf [foolfuuka] remove archive.wakarimasen.moe 2022-07-10 23:13:49 +02:00
Mike Fährmann
bd6ec5c352 [foolfuuka] match 4chan filenames (#2577)
introduce two new metadata fields:
- filename_media: original filename of file uploaded to 4chan
- timestamp_ms  : timestamp with millisecond precision (tim)
2022-05-15 14:39:54 +02:00
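A sketch of how the two new fields could be derived, assuming the 4chan-side file name has the form `<tim>.<ext>` with `tim` being the upload time in milliseconds since the Unix epoch. The helper `media_fields()` and its input format are hypothetical; the real extractor may read these values from different API fields.

```python
from datetime import datetime, timezone


def media_fields(stored_name):
    """Hypothetical helper: derive 4chan-style fields from a stored name.

    Assumes `stored_name` looks like "<tim>.<ext>", where <tim> is the
    upload time in milliseconds since the Unix epoch.
    """
    stem, _, extension = stored_name.rpartition(".")
    timestamp_ms = int(stem)
    return {
        "filename_media": stem,
        "extension": extension,
        "timestamp_ms": timestamp_ms,
        # Convert milliseconds to a timezone-aware datetime.
        "date": datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc),
    }
```

For example, `media_fields("1652612345678.jpg")` yields a `timestamp_ms` of `1652612345678` and a date on 2022-05-15 (UTC).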
Mike Fährmann
d26da3b9e5 add pre-generated 'pattern' for supported BaseExtractor sites 2022-05-09 22:20:09 +02:00
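One way such a pattern can be built is by joining the escaped domains of all supported instances into a single alternation. The sketch below computes it once at import time. The instance dict, `build_pattern()`, and the `archive.example.org` entry are hypothetical; the project's actual pattern layout may differ.

```python
import re


def build_pattern(instances):
    """Hypothetical sketch: combine instance root URLs into one regex.

    `instances` maps a category name to its root URL.
    """
    domains = []
    for root in instances.values():
        # Strip the scheme and an optional "www." so both forms match.
        domain = re.sub(r"^https?://(?:www\.)?", "", root.rstrip("/"))
        domains.append(re.escape(domain))
    return r"(?:https?://)?(?:www\.)?(" + "|".join(domains) + ")"


INSTANCES = {
    "palanq": "https://archive.palanq.win",
    "example": "https://archive.example.org",
}
# Generated once at import time rather than on every URL lookup.
BASE_PATTERN = build_pattern(INSTANCES)
```

Subclass patterns can then append their own path part, e.g. `BASE_PATTERN + r"/(\w+)/thread/(\d+)"`.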
Mike Fährmann
dee0d22561 update extractor test results 2022-02-06 21:39:24 +01:00
Mike Fährmann
275543b2d2 update extractor test results 2021-11-27 19:26:44 +01:00
Mike Fährmann
211de95dd0 update extractor test results 2021-11-01 02:58:53 +01:00
Mike Fährmann
c04f7ab139 [foolfuuka] add 'gallery' extractor (#1785) 2021-08-21 22:46:23 +02:00
Mike Fährmann
21c2da454f update extractor test results 2021-07-04 22:00:32 +02:00
Mike Fährmann
407627ec86 [foolfuuka] support 'archive.wakarimasen.moe' (closes #1595) 2021-06-02 15:45:43 +02:00
Mike Fährmann
532ac79fb0 update extractor test results 2021-05-21 02:28:53 +02:00
Mike Fährmann
671a95cae5 [foolfuuka] use BaseExtractor 2021-01-26 18:48:37 +01:00
Mike Fährmann
e9a75e27d9 [foolfuuka] stop search when results are exhausted (#1174) 2021-01-17 22:48:21 +01:00
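A minimal sketch of a paginated search that stops once a page comes back empty, instead of requesting pages indefinitely. `fetch_page` is a hypothetical callable standing in for the actual search API request.

```python
def paginate(fetch_page, start=1):
    """Yield results page by page until a page comes back empty.

    `fetch_page` is a hypothetical callable returning a list of results
    for a given page number.
    """
    page = start
    while True:
        results = fetch_page(page)
        if not results:
            return  # results exhausted: stop instead of requesting more
        yield from results
        page += 1
```

Checking for an empty page costs one extra request at the end, but it does not require knowing the total result count in advance.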
Mike Fährmann
56b460dcea [foolfuuka] add 'search' extractors (#1174) 2021-01-02 02:34:06 +01:00
Mike Fährmann
fb64183d53 [foolfuuka] add 'board' extractors (closes #1044) 2021-01-01 19:33:35 +01:00
Mike Fährmann
1e3dd7330e merge SharedConfigMixin functionality into Extractor 2020-11-17 00:34:07 +01:00
Mike Fährmann
f5b7ae01c1 update extractor test results 2020-09-15 18:07:08 +02:00
Mike Fährmann
82f7f4172a update test results 2020-01-01 16:05:38 +01:00
Mike Fährmann
41a3169c67 [foolfuuka] use '{extension}' in default filename format 2019-11-28 23:12:48 +01:00
Mike Fährmann
2a3bd4e3c7 rename extractor classes starting with a digit 2019-11-02 20:42:09 +01:00
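The rename is needed because a Python identifier cannot start with a digit, so a category such as "4plebs" cannot be used verbatim as the start of a class name. The sketch below uses a hypothetical rule (prefixing an underscore); the naming scheme the project actually adopted may differ.

```python
def class_name(category, suffix):
    """Hypothetical naming rule for generated extractor classes.

    Python identifiers cannot start with a digit, so a category such as
    "4plebs" gets a leading underscore before it is used as a class name.
    """
    name = category.capitalize() + suffix
    if name[0].isdigit():
        name = "_" + name
    if not name.isidentifier():
        raise ValueError(f"cannot build a class name from {category!r}")
    return name
```

For example, `class_name("4plebs", "ThreadExtractor")` returns `"_4plebsThreadExtractor"`.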
Mike Fährmann
8de5866fd2 [twitter] replace unit test URLs
https://twitter.com/PicturesEarth was deleted
2019-05-09 10:17:55 +02:00
Mike Fährmann
591a07f20c small code changes and cleanups 2019-03-13 22:03:02 +01:00
Mike Fährmann
09d872a2b1 generalize extractor creation code 2019-03-07 22:55:26 +01:00
Mike Fährmann
4b1880fa5e propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
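A minimal sketch of passing the regex match object up to the base class, so shared state such as the original URL is set in one place. Both classes and the URL pattern are hypothetical simplifications.

```python
import re


class Extractor:
    """Hypothetical base class that receives the regex match object."""

    def __init__(self, match):
        self.match = match
        self.url = match.string


class ThreadExtractor(Extractor):
    pattern = r"(?:https?://)?[^/]+/(\w+)/thread/(\d+)"

    def __init__(self, match):
        # Pass the match on so the base class can record the URL.
        super().__init__(match)
        self.board, self.thread = match.groups()
```

Subclasses then only parse their own capture groups, while the base class handles everything common to all extractors.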
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
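A sketch of an extractor class written in the simplified style listed above. The class, its URL, and the keys of the expected-results dict are hypothetical and serve only to show the shape of each constant.

```python
class ExampleThreadExtractor:
    """Hypothetical extractor showing the simplified constant layout."""

    # A single string instead of a list of URL patterns.
    pattern = r"(?:https?://)?archive\.example\.org/(\w+)/thread/(\d+)"
    # A tuple instead of a list for the directory format.
    directory_fmt = ("{category}", "{board}", "{thread}")
    # A single (url, expected-results) tuple instead of a list of tests.
    test = ("https://archive.example.org/a/thread/123", {"count": 1})
```

Tuples are immutable, which suits class-level constants shared by every instance.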
Mike Fährmann
4d656a81ca replace SharedConfigExtractor class with a Mixin 2019-02-04 13:46:02 +01:00
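A minimal sketch of the mixin approach, assuming a simplified config layout of `{category: {key: value}}`. Config lookups first check the site's own category and then fall back to a shared base category. Class names and the lookup order are hypothetical simplifications, not the project's code.

```python
class Extractor:
    """Hypothetical base class with a flat per-category config lookup."""

    category = ""
    basecategory = ""

    def __init__(self, config):
        self._config = config  # {category: {key: value}}

    def config(self, key, default=None):
        return self._config.get(self.category, {}).get(key, default)


class SharedConfigMixin:
    """Fall back to the shared base category when a key is not set."""

    def config(self, key, default=None):
        sentinel = object()
        value = super().config(key, sentinel)
        if value is sentinel:
            value = self._config.get(self.basecategory, {}).get(key, default)
        return value


class PalanqThreadExtractor(SharedConfigMixin, Extractor):
    category = "palanq"
    basecategory = "foolfuuka"
```

Because the mixin is listed before `Extractor`, its `config()` runs first and delegates to `Extractor.config()` through `super()`.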
Mike Fährmann
12ff750111 [foolfuuka] smaller code changes and updates 2019-02-04 12:55:33 +01:00
Mike Fährmann
58a9eede38 [foolfuuka] dynamically generate extractor classes 2019-02-03 17:09:45 +01:00
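A sketch of generating one extractor subclass per site at import time with the built-in `type()`. The base class, the site table (including `archive.example.org`), and the thread URL format are hypothetical.

```python
class FoolfuukaThreadExtractor:
    """Hypothetical generic extractor; subclasses only set site constants."""

    category = "foolfuuka"
    root = ""

    def thread_url(self, board, thread):
        return f"{self.root}/{board}/thread/{thread}/"


SITES = {
    "palanq": "https://archive.palanq.win",
    "example": "https://archive.example.org",
}


def generate_extractors(base, sites, suffix="ThreadExtractor"):
    """Create one subclass of `base` per site with type()."""
    classes = {}
    for category, root in sites.items():
        name = category.capitalize() + suffix
        # type(name, bases, namespace) builds a new class object.
        classes[name] = type(name, (base,), {"category": category, "root": root})
    return classes


EXTRACTORS = generate_extractors(FoolfuukaThreadExtractor, SITES)
```

Adding a site then means adding one entry to the table, with no new class definition required.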