Mike Fährmann
d7c97d5a97
use f-strings when building 'pattern'
2025-10-20 21:23:11 +02:00
Mike Fährmann
cefdde65ba
[readcomiconline] use 'text.re()'
2025-07-24 15:58:21 +02:00
Mike Fährmann
aae85fef61
[readcomiconline] force 'One page' Reading mode (#7890)
2025-07-24 15:53:39 +02:00
Mike Fährmann
d8ef1d693f
rename 'StopExtraction' to 'AbortExtraction'
...
for cases where StopExtraction was used to report errors
2025-07-09 21:07:28 +02:00
enduser420
8c1628ea4e
[readcomiconline] fix extraction
2025-07-07 02:32:23 +05:30
Mike Fährmann
9dbe33b6de
replace old %-formatted and .format(…) strings with f-strings (#7671)
...
mostly using flynt
https://github.com/ikamensh/flynt
2025-06-29 17:50:19 +02:00
Mike Fährmann
26e81e4162
[common] rename 'gallery_url'/'manga_url' to 'page_url'
2025-06-26 22:06:57 +02:00
Mike Fährmann
41191bb60a
'match.group(N)' -> 'match[N]' (#7671)
...
2.5x faster
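A minimal sketch of the equivalence this change relies on: since Python 3.6, a match object can be indexed directly, and 'match[N]' returns the same value as 'match.group(N)'. The pattern and URL below are hypothetical and are not taken from the module; the speed figure above is the commit's own and is not reproduced here.

```python
import re

# Hypothetical pattern and URL, used only to illustrate the two spellings.
pattern = re.compile(r"/Comic/([^/?#]+)/([^/?#]+)")
match = pattern.search("https://example.org/Comic/Title/Issue-1")

# Old spelling: method call.
old_style = (match.group(1), match.group(2))

# New spelling: subscript access (Match.__getitem__, Python 3.6+).
new_style = (match[1], match[2])
```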
2025-06-18 13:05:58 +02:00
Mike Fährmann
e08ec7e083
update copyright notices
2025-06-13 00:03:41 +02:00
Mike Fährmann
becdfbd806
[readcomiconline] fix 'issue' extractor (#7269)
...
- pth -> pht
- spaces -> tabs
2025-03-30 18:42:04 +02:00
Mike Fährmann
26163db69d
[readcomiconline] fix chapter extractor (#6070, #6335)
2024-12-03 10:54:58 +01:00
Mike Fährmann
7b445ec255
[readcomiconline] update (#5866)
2024-07-23 18:56:49 +02:00
Mike Fährmann
b38a917355
[common] add Extractor.input() method
2024-04-16 00:02:48 +02:00
Mike Fährmann
1f9b16a70b
replace static 'sleep-request' defaults with dynamic ones
2023-12-18 22:06:26 +01:00
Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6
decouple extractor initialization
...
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This makes it possible, for example, to adjust config access inside a
Job; previously most of it had already happened when calling
'extractor.find()'.
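The split described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the 'Extractor' class here is a hypothetical stand-in, and the 'config' dict and 'user-agent' key are simplified placeholders.

```python
class Extractor:
    """Hypothetical stand-in, not the project's actual base class."""

    def __init__(self, url):
        # The constructor only records its input; no setup work happens yet.
        self.url = url
        self.config = {}
        self.session = None
        self._initialized = False

    def initialize(self):
        """Do the actual setup (session, cookies, config options) once."""
        if self._initialized:
            return
        self.session = {
            "cookies": {},
            "user-agent": self.config.get("user-agent", "default-agent"),
        }
        self._initialized = True


extractor = Extractor("https://example.org/Comic/Title")
# A caller (a Job, in the commit's terms) can still adjust config here ...
extractor.config["user-agent"] = "custom-agent"
# ... because nothing has read it before initialize() runs.
extractor.initialize()
```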
2023-07-25 22:16:16 +02:00
Mike Fährmann
c6a9bab019
update extractor test results
2022-07-12 15:49:22 +02:00
Mike Fährmann
6c0fa2f258
[readcomiconline] update
2022-06-05 21:40:08 +02:00
Mike Fährmann
310fee99d5
[readcomiconline] remove automatic 'browser' setting (#2625)
2022-05-27 13:44:28 +02:00
Mike Fährmann
82c1cc130b
[readcomiconline] update deobfuscation code (#2481)
2022-05-17 10:52:45 +02:00
Mike Fährmann
12bd9ba33a
[readcomiconline] add 'quality' option (#2467)
2022-04-15 18:10:37 +02:00
Mike Fährmann
60ad46ddcc
[readcomiconline] unobfuscate image URLs (#2481)
2022-04-15 18:04:09 +02:00
Mike Fährmann
2133f1d77f
[readcomiconline] change domain to 'readcomiconline.li'
...
(closes #1517)
2021-05-01 16:41:16 +02:00
Mike Fährmann
fc15930266
[readcomiconline] download high quality image versions
...
(fixes #1347)
2021-02-28 01:11:32 +01:00
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
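A small sketch of what dropping '&' from such a character class changes in practice. The two patterns and the URL are hypothetical examples, not the module's real patterns: with '&' excluded, a path segment is cut at the first '&'; without it, only '/', '?' and '#' end the segment.

```python
import re

# Hypothetical patterns: same structure, differing only in the '&'.
old_pattern = re.compile(r"/Comic/([^/?&#]+)")
new_pattern = re.compile(r"/Comic/([^/?#]+)")

url = "https://example.org/Comic/A&B?id=1"

old_segment = old_pattern.search(url)[1]  # stops at '&'
new_segment = new_pattern.search(url)[1]  # stops at '?'
```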
2020-10-22 23:31:25 +02:00
Mike Fährmann
c874071f5a
[kissmanga] remove module
2020-10-04 22:46:41 +02:00
Mike Fährmann
4465a3ea68
[kissmanga][readcomiconline] add 'captcha' option (#279)
...
to configure how to handle CAPTCHA page redirects:
- either interactively wait for the user to solve the CAPTCHA
- or raise StopExtraction like before
2019-05-27 22:24:48 +02:00
Mike Fährmann
48233f00c0
[readcomiconline] detect 'AreYouHuman' redirects (#279)
2019-05-26 15:58:37 +02:00
Mike Fährmann
6dae6bee37
automatically detect and bypass cloudflare challenge pages
...
TODO: cache and re-apply cfclearance cookies
2019-03-10 15:31:33 +01:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
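The "now" behaviour listed above can be sketched as follows. This is a hypothetical re-implementation written only to illustrate the resulting keys; it is not the actual body of text.nameext_from_url(), and its handling of query strings and names without a dot is an assumption.

```python
from urllib.parse import unquote


def nameext_from_url(url, data=None):
    """Illustrative sketch: fill 'filename' and 'extension' from a URL."""
    if data is None:
        data = {}
    # Drop query string and fragment, then keep the last path segment.
    path = url.split("?", 1)[0].split("#", 1)[0]
    name = unquote(path.rstrip("/").rpartition("/")[2])
    # Split on the last dot; names without a dot get an empty extension.
    stem, dot, ext = name.rpartition(".")
    if dot:
        data["filename"], data["extension"] = stem, ext
    else:
        data["filename"], data["extension"] = name, ""
    return data


result = nameext_from_url("https://example.org/path/filename.ext")
# result == {"filename": "filename", "extension": "ext"}
```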
2019-02-14 16:07:17 +01:00
Mike Fährmann
32edf4fc7b
add '_extractor' info to manga extractor results
2019-02-13 13:23:36 +01:00
Mike Fährmann
580baef72c
change Chapter and MangaExtractor classes
...
- unify and simplify constructors
- rename get_metadata and get_images to just metadata() and images()
- rename self.url to chapter_url and manga_url
2019-02-11 18:38:47 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
6126615698
update URLs for supportedsites.rst
2019-01-30 16:18:22 +01:00
Mike Fährmann
259123732f
[readcomiconline] improve comic-page parsing
2018-12-30 13:19:23 +01:00
Mike Fährmann
1c6b9ba322
[readcomiconline] use HTTPS
2018-12-09 14:54:55 +01:00
Mike Fährmann
1d43cbbf52
[gelbooru] tag-splitting for non-api mode
2018-07-06 15:24:19 +02:00
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
2018-04-20 14:53:21 +02:00
Mike Fährmann
d11fcf4804
smaller changes and fixes
...
- fix the cloudflare challenge result if the last decimal places
are zero (JS's toFixed() removes trailing zeroes)
- fix downloading of kissmanga chapter-pages hosted on blogspot
(accessing blogspot with "kissmanga.com" as referrer yields a 401)
- disable certificate validation for 'mangahere' tests
- update flickr test result
2018-04-06 15:30:09 +02:00
Mike Fährmann
179bcdd349
adjust archive-ids
2018-02-13 04:50:45 +01:00
Mike Fährmann
3cec533c28
Merge branch 'archive'
2018-02-12 18:07:58 +01:00
Mike Fährmann
5b3c34aa96
use generic chapter-extractor in more modules
2018-02-07 12:36:39 +01:00
Mike Fährmann
34873dbd90
set 'archive_fmt' values
...
These are going to be used to create a unique ID for each image.
2018-02-01 15:30:49 +01:00
Mike Fährmann
e6814aebe2
add 'extractor.*.user-agent' config option
2017-11-15 14:01:33 +01:00
Mike Fährmann
68a0a7579c
fix/improve some regular expressions
2017-10-09 22:37:50 +02:00
Mike Fährmann
885bd4cbe2
[readcomiconline] extract comic metadata
2017-09-18 19:18:24 +02:00
Mike Fährmann
92a11528d1
smaller changes
2017-06-28 09:42:49 +02:00
Mike Fährmann
f226417420
simplify code by using a MangaExtractor base class
2017-05-20 11:27:43 +02:00
Mike Fährmann
f537ad5f2f
[kissmanga] re-enable module
2017-04-05 12:16:23 +02:00