83 Commits

Author SHA1 Message Date
Mike Fährmann
53cdfaac37 [common] add reference to 'exception' module to Extractor class
- remove 'exception' imports
- replace with 'self.exc'
2026-02-15 10:57:22 +01:00
Mike Fährmann
0c24955507 [mangapark] export GraphQL queries 2026-02-01 19:18:10 +01:00
Mike Fährmann
8e855bd810 replace '// 1000' with '/ 1000' for timestamp conversions
regular division is slightly faster than floor division
and a float timestamp value is treated the same as an integer one
2026-01-04 16:51:31 +01:00
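Illustration for 8e855bd810: a minimal sketch, assuming a millisecond timestamp (the value of `ms` is made up for this example), of why the float result of '/ 1000' can stand in for the integer result of '// 1000'. Python's datetime.fromtimestamp() accepts both; the float merely keeps the sub-second part.

```python
from datetime import datetime, timezone

# Hypothetical millisecond timestamp, as an API might return it.
ms = 1700000000123

floor_seconds = ms // 1000  # floor division, yields an int
true_seconds = ms / 1000    # regular division, yields a float

# fromtimestamp() takes either an int or a float number of seconds.
dt_floor = datetime.fromtimestamp(floor_seconds, timezone.utc)
dt_true = datetime.fromtimestamp(true_seconds, timezone.utc)

# Apart from the fractional second, both describe the same moment.
same_second = dt_true.replace(microsecond=0) == dt_floor
```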
Mike Fährmann
e006d26c8e Revert "use f-strings when building 'pattern'"
revert d7c97d5a97.
2025-12-20 22:07:37 +01:00
Mike Fährmann
d7c97d5a97 use f-strings when building 'pattern' 2025-10-20 21:23:11 +02:00
Mike Fährmann
9bf76c1352 replace 'util.re()' with 'text.re()'
remove unnecessary 'util' imports
2025-10-20 17:44:58 +02:00
Mike Fährmann
085616e0a8 [dt] replace 'text.parse_datetime()' & 'text.parse_timestamp()' 2025-10-17 17:43:06 +02:00
Mike Fährmann
a097a373a9 simplify if statements by using walrus operators (#7671) 2025-07-22 20:57:54 +02:00
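Illustration for a097a373a9: a small sketch of the kind of simplification the commit title describes. The function names and the pattern are hypothetical and not taken from the project; the walrus operator needs Python 3.8 or later.

```python
import re


def find_id_before(text):
    # Assignment and test are separate statements.
    match = re.search(r"id=(\d+)", text)
    if match:
        return int(match[1])
    return None


def find_id_after(text):
    # The walrus operator assigns and tests in one expression.
    if match := re.search(r"id=(\d+)", text):
        return int(match[1])
    return None
```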
Mike Fährmann
d8ef1d693f rename 'StopExtraction' to 'AbortExtraction'
for cases where StopExtraction was used to report errors
2025-07-09 21:07:28 +02:00
Mike Fährmann
f2a72d8d1e replace 'request(…).json()' with 'request_json(…)' 2025-06-29 17:50:19 +02:00
Mike Fährmann
41191bb60a 'match.group(N)' -> 'match[N]' (#7671)
2.5x faster
2025-06-18 13:05:58 +02:00
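Illustration for 41191bb60a: a minimal sketch showing that subscripting a match object returns the same value as calling group(). The pattern and URL are made up for this example; no timing claim is reproduced here.

```python
import re

# Hypothetical URL pattern, not the extractor's actual one.
pattern = re.compile(r"/title/(\d+)-([^/?#]+)")
match = pattern.search("https://example.org/title/12345-some-manga")

old_style = match.group(1)  # method call
new_style = match[1]        # subscript access, available since Python 3.6
```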
Mike Fährmann
e08ec7e083 update copyright notices 2025-06-13 00:03:41 +02:00
Mike Fährmann
b5c88b3d3e replace standard library 're' uses with 'util.re()' 2025-06-06 13:24:52 +02:00
Mike Fährmann
32a06961ba [mangapark] support v3 URLs (#2072) 2025-03-25 20:01:45 +01:00
Mike Fährmann
ebf05e53fe [mangapark] support mirror domains 2025-03-25 19:37:26 +01:00
vonProteus
58e7808bbb [mangapark] utilizing more graphql and adjust functionality for new site (#4999)
- undo formatting changes
- simplify code
- update and fix tests
2025-03-24 20:34:23 +01:00
Mike Fährmann
1ae3ac5e39 [common] add '_extract_nextdata' method 2025-01-12 11:48:36 +01:00
Mike Fährmann
27ec653991 fix bug in test_init and update example URLs 2023-09-14 13:27:03 +02:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
3d8de383bf [mangapark] extract 'source_id' for manga
forgot to add this to 6ae3101f
2023-07-02 15:17:10 +02:00
Mike Fährmann
6ae3101fd0 [mangapark] add 'source' option (#3969) 2023-07-02 15:07:22 +02:00
Mike Fährmann
3479646f65 [mangapark] update and fix 'manga' extractor (#3969)
TODO:
- non-English chapters
- 'source' option
2023-06-30 17:17:54 +02:00
Mike Fährmann
10786c657e [mangapark] update and fix 'chapter' extractor (#3969) 2023-06-29 23:44:44 +02:00
Mike Fährmann
dd884b02ee replace json.loads with direct calls to JSONDecoder.decode 2023-02-09 15:22:00 +01:00
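Illustration for dd884b02ee: a sketch, assuming a module-level decoder instance (the name `decode` is chosen for this example), of calling JSONDecoder.decode directly instead of going through json.loads.

```python
import json

# Build one decoder up front and reuse its bound 'decode' method.
decode = json.JSONDecoder().decode

text = '{"id": 1, "title": "example"}'

via_loads = json.loads(text)  # convenience wrapper
via_decoder = decode(text)    # direct call on the decoder
```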
Mike Fährmann
b0cb4a1b9c replace 'text.extract()' with 'text.extr()' where possible 2022-11-05 01:14:09 +01:00
Mike Fährmann
c6a9bab019 update extractor test results 2022-07-12 15:49:22 +02:00
Mike Fährmann
211de95dd0 update extractor test results 2021-11-01 02:58:53 +01:00
Mike Fährmann
21c2da454f update extractor test results 2021-07-04 22:00:32 +02:00
thatfuckingbird
264beb8556 recognize v2.mangapark URLs (#1578)
* recognize v2.mangapark URLs

* update mangapark root url to use the v2 subdomain
2021-05-26 14:58:50 +02:00
Mike Fährmann
8b22d4e667 [mangapark] use '"browser": "firefox"' by default
to get rid of Cloudflare CAPTCHA responses
2021-04-23 23:21:02 +02:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
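Illustration for 968d3e8465: a minimal sketch of what dropping '&' from a negated character class changes. Both patterns and the URL are hypothetical; they only mirror the '/?&#' -> '/?#' rewrite named in the commit message.

```python
import re

# Hypothetical patterns: the same path-segment capture, with and without '&'.
old = re.compile(r"https?://example\.org/manga/([^/?&#]+)")
new = re.compile(r"https?://example\.org/manga/([^/?#]+)")

url = "https://example.org/manga/tom&jerry?page=2#top"

old_segment = old.match(url)[1]  # stops at '&'
new_segment = new.match(url)[1]  # stops only at '/', '?' or '#'
```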
Mike Fährmann
69e4871005 update extractor test results
- sensescans: replace 404d chapters
- mangapark: replace 404d chapters
- subscribestar: update test for attached files
2020-08-28 22:32:32 +02:00
Mike Fährmann
d3b3b30107 update test results 2020-04-26 22:14:28 +02:00
Mike Fährmann
4203dc0bdc [mangapark] fix metadata extraction 2020-03-28 03:00:26 +01:00
Mike Fährmann
5530871b5a change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)

Example: "https://example.org/path/filename.ext"

before:
- filename : filename.ext
- name     : filename
- extension: ext

now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
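Illustration for 5530871b5a: a sketch of the "now" behavior described in the commit message. The helper `nameext_sketch` is hypothetical and is not the project's text.nameext_from_url(); it only reproduces the before/after example given above.

```python
from urllib.parse import unquote, urlsplit


def nameext_sketch(url):
    """Split the last path segment of 'url' into filename and extension."""
    name = unquote(urlsplit(url).path.rpartition("/")[2])
    filename, _, extension = name.rpartition(".")
    if not filename:
        # No dot in the name (or only a leading dot): no extension.
        return {"filename": name, "extension": ""}
    return {"filename": filename, "extension": extension}


result = nameext_sketch("https://example.org/path/filename.ext")
```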
Mike Fährmann
32edf4fc7b add '_extractor' info to manga extractor results 2019-02-13 13:23:36 +01:00
Mike Fährmann
580baef72c change Chapter and MangaExtractor classes
- unify and simplify constructors
- rename get_metadata and get_images to just metadata() and images()
- rename self.url to chapter_url and manga_url
2019-02-11 18:38:47 +01:00
Mike Fährmann
4b1880fa5e propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
0e46db6f45 rename some base classes
They shouldn't be called …Extractor if they don't have 'Extractor' as
their base class.
2019-02-08 11:43:40 +01:00
Mike Fährmann
8095f5f81a [mangapark] fix manga title extraction 2019-01-28 18:04:42 +01:00
Mike Fährmann
217a0687ef [behance] add 'collection' extractor (closes #157) 2019-01-19 18:11:20 +01:00
Mike Fährmann
66460337f1 [mangapark] fix extraction 2019-01-17 21:24:53 +01:00
Mike Fährmann
fa7fa2f8ff [deviantart] update tests 2019-01-01 15:39:34 +01:00
Mike Fährmann
b7b5456a32 [kissmanga] use HTTPS 2018-12-30 14:04:46 +01:00
Mike Fährmann
98314aa04c [mangapark] detect non-existent chapters 2018-12-27 21:41:50 +01:00
Mike Fährmann
f9ace0f4a3 [mangapark] fix manga extraction ... again 2018-12-26 18:56:57 +01:00
Mike Fährmann
0c9762f00e [mangapark] fix extraction 2018-12-22 13:52:48 +01:00
Mike Fährmann
4d73cc785d update test results 2018-12-14 16:07:32 +01:00
Mike Fährmann
1c6b9ba322 [readcomiconline] use HTTPS 2018-12-09 14:54:55 +01:00