Author | Commit | Message | Date
Mike Fährmann | 3965f5d6a5 | [zerochan] expect 500 status codes during login (#8097 #8114); continuation of 4303b3ba9d | 2025-08-24 18:16:51 +02:00
Mike Fährmann | 4303b3ba9d | [zerochan] expect 500 status code for HTML requests (#8097) | 2025-08-22 16:10:43 +02:00
Mike Fährmann | a097a373a9 | simplify if statements by using walrus operators (#7671) | 2025-07-22 20:57:54 +02:00
Mike Fährmann | d8ef1d693f | rename 'StopExtraction' to 'AbortExtraction' for cases where StopExtraction was used to report errors | 2025-07-09 21:07:28 +02:00
Mike Fährmann | 9dbe33b6de | replace old %-formatted and .format(…) strings with f-strings (#7671); mostly using flynt (https://github.com/ikamensh/flynt) | 2025-06-29 17:50:19 +02:00
Mike Fährmann | 41191bb60a | 'match.group(N)' -> 'match[N]' (#7671); 2.5x faster | 2025-06-18 13:05:58 +02:00
Mike Fährmann | e08ec7e083 | update copyright notices | 2025-06-13 00:03:41 +02:00
Mike Fährmann | b5c88b3d3e | replace standard library 're' uses with 'util.re()' | 2025-06-06 13:24:52 +02:00
Mike Fährmann | 492ea46c25 | [zerochan] fix "KeyError: 'author'" (#7282); fixes regression introduced in d746e025a0 | 2025-04-01 10:02:55 +02:00
Mike Fährmann | d746e025a0 | [zerochan] parse JSON-LD data (#7178) | 2025-03-17 19:59:44 +01:00
Mike Fährmann | 5e13235aca | [zerochan] fix parsing regular JSON; i.e. remove debug remains ... | 2024-12-14 20:41:08 +01:00
Mike Fährmann | a33065be86 | [zerochan] parse API response manually when json.loads() fails (#6632) | 2024-12-12 19:57:37 +01:00
Mike Fährmann | d2c66ac34d | [zerochan] fix 'source' extraction when not logged in | 2024-12-12 18:16:11 +01:00
Mike Fährmann | 34e157e166 | [zerochan] download webp and gif files, add 'extensions' option (#6576) | 2024-12-05 21:25:44 +01:00
Mike Fährmann | 87a14a50e7 | [zerochan] improve redirect handling, add 'redirects' option (#5891) | 2024-08-10 11:32:30 +02:00
Mike Fährmann | 8a6e208605 | [zerochan] fix 'Invalid control character' errors (#5892) | 2024-07-29 11:24:17 +02:00
Mike Fährmann | 70f18b7a78 | [zerochan] fix tag redirections (#5891) | 2024-07-26 20:41:34 +02:00
Mike Fährmann | 5207a0c2e0 | [zerochan] implement 'tags' option (#5874); allow splitting tags into separate lists by category | 2024-07-23 10:21:33 +02:00
Mike Fährmann | 1aadc29c5b | [zerochan] fix 'source' extraction | 2024-07-23 09:34:44 +02:00
Mike Fährmann | ae40c61c21 | [zerochan] fix tag category extraction (#5874) | 2024-07-23 09:16:32 +02:00
Mike Fährmann | fef80a2f55 | [zerochan] fetch metadata for each post separately (#5869) instead of processing all posts at once before returning any of them | 2024-07-20 02:11:27 +02:00
Mike Fährmann | b376fa814e | [zerochan] handle "KeyError - 'items'" (#5826). Zerochan sometimes sends an empty response when there are no more accessible posts to be had. | 2024-07-05 21:34:33 +02:00
Mike Fährmann | cc6b9e4c18 | [zerochan] use API by default (#3669); add 'pagination' option | 2024-02-25 00:36:14 +01:00
Mike Fährmann | 42335ea880 | [zerochan] fix skipping every other post | 2024-02-15 02:51:01 +01:00
Mike Fährmann | adc3aa0b77 | [zerochan] fix metadata extraction (author, path, tags) | 2023-11-24 21:21:14 +01:00
Mike Fährmann | a453335a9f | remove test results in extractor modules and add generic example URLs | 2023-09-11 16:30:55 +02:00
Mike Fährmann | d97b8c2fba | consistent cookie-related names: rename every cookie variable or method to 'cookies_*'; simplify '.session.cookies' to just '.cookies'; more consistent 'login()' structure | 2023-07-22 01:20:50 +02:00
enduser420 | d52ed2bc5a | [zerochan] fix 'tags' extraction | 2023-07-18 16:38:04 +05:30
Mike Fährmann | ed2d715019 | fix 'keywords' in extractor tests (#3491) | 2023-01-03 15:14:23 +01:00
Mike Fährmann | 4063563cd7 | [zerochan] update for layout v3: remove cookie disabling v3; fix and improve metadata extraction | 2022-12-17 12:51:51 +01:00
Mike Fährmann | b0cb4a1b9c | replace 'text.extract()' with 'text.extr()' where possible | 2022-11-05 01:14:09 +01:00
Mike Fährmann | 3cb8327c60 | [zerochan] add 'metadata' option (#2861) | 2022-09-02 23:25:19 +02:00
Mike Fährmann | 21ff77fea0 | [zerochan] extract more metadata for single posts. Neither HTML pages nor RSS feed entries have *all* metadata. It might be necessary to do 1-2 extra HTTP requests to grab everything. | 2022-08-14 17:26:29 +02:00
Mike Fährmann | 98af5a0409 | [zerochan] implement login with username & password (#1434) | 2022-07-29 12:56:20 +02:00
Mike Fährmann | 3a8addfe45 | [zerochan] add 'tag' and 'image' extractors (#1434) | 2022-07-27 22:58:23 +02:00