Mike Fährmann
41191bb60a
'match.group(N)' -> 'match[N]' (#7671)
...
2.5x faster
2025-06-18 13:05:58 +02:00
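For reference, a minimal sketch of the equivalence behind the 'match.group(N)' -> 'match[N]' change above. Match objects have supported index access since Python 3.6. The pattern, function name, and URL below are made up for illustration, and no timing is measured here.

```python
import re

# Hypothetical pattern and input, used only for illustration.
PATTERN = re.compile(r"/(\w+)/thread/(\d+)")


def board_and_thread(url):
    """Return (board, thread) using index access on the match object."""
    match = PATTERN.search(url)
    if match is None:
        return None
    # match[N] returns the same value as match.group(N)
    assert match[1] == match.group(1)
    assert match[2] == match.group(2)
    return match[1], match[2]


print(board_and_thread("https://example.org/a/thread/12345"))
```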
Mike Fährmann
0a3fac2dfe
merge #7664: [archivedmoe] redirect URL fixes (#7652)
2025-06-15 10:03:34 +02:00
Mike Fährmann
b245218c1d
[archivedmoe] reword some comments and variable names
2025-06-15 10:00:45 +02:00
NecRaul
6668acf91e
[archivedmoe] Sort boards alphabetically
2025-06-13 19:29:47 +04:00
NecRaul
3a4e19d284
[archivedmoe] Simplify board extraction from url
2025-06-13 18:44:02 +04:00
NecRaul
a7aa18a8c1
[archivedmoe] remove unnecessary logging
2025-06-13 18:28:21 +04:00
NecRaul
8b2adeb41e
[archivedmoe] simplify board URL redirection logic
2025-06-13 18:26:39 +04:00
NecRaul
05081dea2e
Lint with flake8
2025-06-13 17:56:43 +04:00
NecRaul
223fe960a0
[archivedmoe] redirect URL changes (again)
...
Redirects to warosu.org instead of 4chan's cdn for certain boards
Redirects to archive.4plebs.org instead of 4chan's cdn for /tg/
Slices the filename only if it's redirecting to certain archives
2025-06-13 17:43:16 +04:00
Mike Fährmann
e08ec7e083
update copyright notices
2025-06-13 00:03:41 +02:00
Mike Fährmann
811b665e33
remove @staticmethod decorators
...
There might have been a time when calling a static method was faster
than a regular method, but that is no longer the case. According to
micro-benchmarks, it is 70% slower in CPython 3.13 and it also makes
executing the code of a class definition slower.
2025-06-12 22:50:52 +02:00
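A small sketch of what dropping a '@staticmethod' decorator can look like, assuming the helper is turned into a regular method that ignores 'self'. The class and method names are hypothetical and not taken from the code base. The sketch only shows that behavior is unchanged and does not reproduce the benchmark mentioned in the commit above.

```python
class WithStatic:
    @staticmethod
    def clean(name):
        return name.strip().lower()

    def run(self, name):
        return self.clean(name)


class WithoutStatic:
    # Same helper as a regular method; 'self' is simply unused.
    def clean(self, name):
        return name.strip().lower()

    def run(self, name):
        return self.clean(name)


print(WithStatic().run("  Foo "), WithoutStatic().run("  Foo "))
```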
NecRaul
e3df99dbb9
Apply mikf's diff regarding Archived.moe
...
Moved (and refactored) code into remote()
Added a check for fixup_timestamp
2025-06-11 21:51:03 +04:00
NecRaul
4370654532
Simplify remote_media_link assignment
2025-06-11 04:49:21 +04:00
NecRaul
cb74d0f2f3
Lint with flake8
2025-06-11 04:46:18 +04:00
NecRaul
96bb2b1630
Fix Archived.moe redirection issue
...
Unless the board is /b/ (in which case redirection works fine),
remove characters from the filename portion of the URL until
it is 13 characters long (epoch millis).
2025-06-11 04:42:03 +04:00
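The filename trimming described in the commit above could be sketched roughly as follows. This is an illustration, not the committed code: the function name, its parameters, and the example URL are hypothetical, and it assumes the surplus characters are dropped from the end of the filename stem.

```python
def trim_media_url(url, board, length=13):
    """Shorten the filename stem of 'url' to 'length' characters."""
    if board == "b":
        # /b/ redirects already work, so leave the URL untouched.
        return url
    head, _, name = url.rpartition("/")
    stem, dot, ext = name.partition(".")
    # Assumption: keep the leading characters (epoch milliseconds).
    return f"{head}/{stem[:length]}{dot}{ext}"


print(trim_media_url("https://example.org/tg/image/1234567890123456.jpg", "tg"))
```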
Mike Fährmann
23c4bc8ac5
[b4k] keep support for previous 'arch.b4k.co' domain
2025-02-09 11:11:38 +01:00
NecRaul
dae82f1519
[b4k] update domain to arch.b4k.dev
2025-02-09 01:28:23 +04:00
Mike Fährmann
36883e458e
use 'v[0] == "c"' instead of 'v.startswith("c")'
2024-10-15 08:24:06 +02:00
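One caveat with the indexing form in the commit above: the two checks agree for non-empty strings but differ for an empty one. A short sketch, with hypothetical helper names:

```python
def is_c_index(v):
    # Raises IndexError when 'v' is an empty string.
    return v[0] == "c"


def is_c_startswith(v):
    # Returns False when 'v' is an empty string.
    return v.startswith("c")


for value in ("cat", "dog"):
    print(value, is_c_index(value), is_c_startswith(value))
```

The indexing form is therefore only a drop-in replacement where the value is known to be non-empty.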
Mike Fährmann
64948f2c09
[foolfuuka] improve 'board' pattern & support pages (#5408)
2024-04-01 22:31:25 +02:00
Mike Fährmann
1f7101d606
[archivedmoe] fix thebarchive webm URLs (#5116)
2024-01-27 00:24:41 +01:00
Mike Fährmann
1f9b16a70b
replace static 'sleep-request' defaults with dynamic ones
2023-12-18 22:06:26 +01:00
Mike Fährmann
3ecb512722
send Referer headers by default
2023-09-19 00:02:04 +02:00
Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6
decouple extractor initialization
...
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This allows, for example, adjusting config access inside a Job,
whereas previously most of it had already happened when calling
'extractor.find()'.
2023-07-25 22:16:16 +02:00
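The idea of the commit above, separating cheap construction from the actual setup, can be sketched as follows. The class, its attributes, and the 'timeout' option are hypothetical and only illustrate the pattern; they are not the project's actual extractor API.

```python
class SketchExtractor:
    """Hypothetical extractor used only to illustrate the pattern."""

    def __init__(self, url):
        # Cheap: only store what is needed to identify the target.
        self.url = url
        self.config = {}
        self.session = None
        self.initialized = False

    def initialize(self):
        # The actual setup runs here, after callers have had a
        # chance to adjust 'config'.
        self.session = {"timeout": self.config.get("timeout", 30)}
        self.initialized = True


ex = SketchExtractor("https://example.org/a/thread/1")
ex.config["timeout"] = 5  # adjusted before initialization
ex.initialize()
print(ex.session)
```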
Mike Fährmann
a08fdfac6e
[foolfuuka] add 'archive.palanq.win'
2023-05-02 19:58:55 +02:00
Mike Fährmann
1870df8b23
[foolfuuka] remove 'tokyochronos.net'
2023-05-02 19:25:50 +02:00
Mike Fährmann
ef4e2d8178
[foolfuuka] remove 'archive.alice.al'
2023-05-02 19:23:26 +02:00
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible
2022-11-05 01:14:09 +01:00
Mike Fährmann
7e385ed63e
[foolfuuka] update domains
...
- remove nyafuu
- add rozenarcana (https://archive.alice.al/)
- add tokyochronos (https://www.tokyochronos.net)
2022-08-26 17:57:17 +02:00
Mike Fährmann
2dc57637cf
[foolfuuka] remove archive.wakarimasen.moe
2022-07-10 23:13:49 +02:00
Mike Fährmann
bd6ec5c352
[foolfuuka] match 4chan filenames (#2577)
...
introduce two new metadata fields:
- filename_media: original filename of file uploaded to 4chan
- timestamp_ms : timestamp with millisecond precision (tim)
2022-05-15 14:39:54 +02:00
Mike Fährmann
d26da3b9e5
add pre-generated 'pattern' for supported BaseExtractor sites
2022-05-09 22:20:09 +02:00
Mike Fährmann
dee0d22561
update extractor test results
2022-02-06 21:39:24 +01:00
Mike Fährmann
275543b2d2
update extractor test results
2021-11-27 19:26:44 +01:00
Mike Fährmann
211de95dd0
update extractor test results
2021-11-01 02:58:53 +01:00
Mike Fährmann
c04f7ab139
[foolfuuka] add 'gallery' extractor (#1785)
2021-08-21 22:46:23 +02:00
Mike Fährmann
21c2da454f
update extractor test results
2021-07-04 22:00:32 +02:00
Mike Fährmann
407627ec86
[foolfuuka] support 'archive.wakarimasen.moe' (closes #1595)
2021-06-02 15:45:43 +02:00
Mike Fährmann
532ac79fb0
update extractor test results
2021-05-21 02:28:53 +02:00
Mike Fährmann
671a95cae5
[foolfuuka] use BaseExtractor
2021-01-26 18:48:37 +01:00
Mike Fährmann
e9a75e27d9
[foolfuuka] stop search when results are exhausted (#1174)
2021-01-17 22:48:21 +01:00
Mike Fährmann
56b460dcea
[foolfuuka] add 'search' extractors (#1174)
2021-01-02 02:34:06 +01:00
Mike Fährmann
fb64183d53
[foolfuuka] add 'board' extractors (closes #1044)
2021-01-01 19:33:35 +01:00
Mike Fährmann
1e3dd7330e
merge SharedConfigMixin functionality into Extractor
2020-11-17 00:34:07 +01:00
Mike Fährmann
f5b7ae01c1
update extractor test results
2020-09-15 18:07:08 +02:00
Mike Fährmann
82f7f4172a
update test results
2020-01-01 16:05:38 +01:00
Mike Fährmann
41a3169c67
[foolfuuka] use '{extension}' in default filename format
2019-11-28 23:12:48 +01:00
Mike Fährmann
2a3bd4e3c7
rename extractor classes starting with a digit
2019-11-02 20:42:09 +01:00
Mike Fährmann
8de5866fd2
[twitter] replace unit test URLs
...
https://twitter.com/PicturesEarth was deleted
2019-05-09 10:17:55 +02:00
Mike Fährmann
591a07f20c
small code changes and cleanups
2019-03-13 22:03:02 +01:00