Commit Graph

41 Commits

Author SHA1 Message Date
Mike Fährmann
d7c97d5a97 use f-strings when building 'pattern' 2025-10-20 21:23:11 +02:00
Mike Fährmann
43448f7089 [webtoons] fix 'thumbnail' extraction (#8413) 2025-10-15 10:06:10 +02:00
Mike Fährmann
b137d8e5d3 [webtoons] fix 'episode' metadata extraction (#2591)
https://github.com/mikf/gallery-dl/issues/2591#issuecomment-3386463898
2025-10-09 18:34:59 +02:00
Mike Fährmann
d8ef1d693f rename 'StopExtraction' to 'AbortExtraction'
for cases where StopExtraction was used to report errors
2025-07-09 21:07:28 +02:00
Mike Fährmann
26e81e4162 [common] rename 'gallery_url'/'manga_url' to 'page_url' 2025-06-26 22:06:57 +02:00
Mike Fährmann
8a93616a2d [webtoons] add 'banners' option (#6468) 2025-06-26 19:29:52 +02:00
Mike Fährmann
3c6a5657ea [webtoons] update code 2025-06-26 15:24:37 +02:00
Mike Fährmann
41191bb60a 'match.group(N)' -> 'match[N]' (#7671)
2.5x faster
2025-06-18 13:05:58 +02:00
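A minimal sketch of the two spellings named in the commit above, for illustration only. The pattern and input string are hypothetical and are not taken from the extractor. Subscript access on match objects requires Python 3.6 or later.

```python
import re

# Hypothetical pattern and input, chosen only to illustrate the syntax change.
pattern = re.compile(r"/(\w+)/episode-(\d+)")
match = pattern.search("/en/episode-12")

# Old spelling: explicit method call.
lang_old, num_old = match.group(1), match.group(2)

# New spelling: subscript access via Match.__getitem__.
lang_new, num_new = match[1], match[2]
```

Both spellings return the same captured strings; the speed figure quoted in the commit message is the author's measurement and is not reproduced here.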
pocketinternet
3ea244eebb [webtoons] add 'thumbnails' option (#6468 #7441)
* Update webtoons.py
    Added thumbnail download capability which defaults to false
* Update configuration.rst
    Added documentation for webtoon thumbnail option
* extract thumbnails in GalleryExtractor.assets()
* simplify & fix flake8
* include 'type' in default filenames
* add test
* update docs

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2025-06-17 19:34:58 +02:00
Mike Fährmann
e08ec7e083 update copyright notices 2025-06-13 00:03:41 +02:00
Mike Fährmann
4916b4fd1f [webtoons] download JPEG files in better quality
add 'quality' option
2025-04-10 22:04:43 +02:00
Mike Fährmann
09d42b8e89 [webtoons] use a default delay of 0.5-1.5s between requests (#7329) 2025-04-09 20:41:22 +02:00
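One way a randomized delay in the 0.5-1.5s range could be implemented is sketched below. The helper name `sleep_between_requests` is hypothetical; how gallery-dl actually configures or applies this delay is not shown in the commit message and is not assumed here.

```python
import random
import time


def sleep_between_requests(low=0.5, high=1.5):
    """Sleep for a random duration in [low, high] seconds and return it.

    Hypothetical helper for illustration; the defaults mirror the
    0.5-1.5s range named in the commit message.
    """
    delay = random.uniform(low, high)
    time.sleep(delay)
    return delay
```

Drawing the delay from a range rather than using a fixed value spreads requests out less predictably.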
Mike Fährmann
015ba76c9c [webtoons] add 'artist' extractor (#7274) 2025-04-01 10:06:56 +02:00
Mike Fährmann
fb6afb1ee1 [webtoons] update & simplify code 2025-03-31 11:49:02 +02:00
Mike Fährmann
65863239a0 [webtoons] fix 'username' and 'author_name' extraction 2025-01-27 12:05:40 +01:00
Mike Fährmann
b6cf348658 [webtoons] extract 'episode_no' for comic results (#6439) 2024-11-08 14:19:17 +01:00
blankie
df718887c2 [webtoons] fix extracting comic and episode name with commas 2024-01-21 09:50:27 +11:00
Mike Fährmann
8ffa0cd3c8 [webtoons] small optimization
don't extract the entire 'author_area' and
avoid creating a second 'text.extract_from()' object
2024-01-15 18:24:47 +01:00
blankie
bb446b1598 [webtoons] extract more metadata 2024-01-14 19:26:49 +11:00
Mike Fährmann
c8c744a7c0 [webtoons] fix pagination when receiving an HTTP redirect 2023-11-24 22:17:34 +01:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
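The idea described in the commit above can be sketched roughly as follows. This is a simplified stand-in, not gallery-dl's actual class: the attribute names `_initialized`, `session`, and `config` and their contents are assumptions made for illustration.

```python
class Extractor:
    """Simplified stand-in showing construction decoupled from setup."""

    def __init__(self, url):
        # The constructor only records its input; nothing expensive
        # (session, cookies, config lookups) happens here.
        self.url = url
        self.session = None
        self.config = {}
        self._initialized = False

    def initialize(self):
        # The actual setup runs here and can be triggered later by the
        # caller. Repeated calls are harmless.
        if self._initialized:
            return
        self.session = {"cookies": {}}  # placeholder for a real session
        self._initialized = True


extractor = Extractor("https://example.org/gallery")
extractor.config["example-option"] = True  # adjust before setup runs
extractor.initialize()
```

Because setup is deferred, a caller gets a window between construction and `initialize()` in which it can still change what the setup will see.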
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
cd931e1139 update extractor test results 2022-12-08 18:58:29 +01:00
Mike Fährmann
b0cb4a1b9c replace 'text.extract()' with 'text.extr()' where possible 2022-11-05 01:14:09 +01:00
Mike Fährmann
86cbf485ab [webtoons] extract real episode number (#2591)
The number from the 'episode_no' query parameter
got renamed to 'episode_no'.
2022-05-17 22:33:29 +02:00
Kyle Anthony Williams
a14b72be21 [webtoons] Use swebtoon-phinf.pstatic.net instead of webtoon-phinf.pstatic.net (#2005)
This trick to avoid having to set a Referer header comes from
Webtoon's RSS feeds. The two URLs below are equivalent in content:

https://webtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90
https://swebtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90

The URL with the domain "webtoon-phinf.pstatic.net" needs a Referer
header, and the URL with the domain "swebtoon-phinf.pstatic.net" does not.
This is because "swebtoon" images are meant to be loaded in environments
without explicit network control, such as RSS feeds on sites like Feedly.
This change should make it easier for gallery-dl developers to embed
Webtoon comics without worrying about headers.
2021-11-11 20:03:34 +01:00
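The domain swap described in the commit above amounts to a single string replacement. The sketch below uses the example URL from the commit message; the function name `to_referer_free` is hypothetical, and whether the extractor performs the swap this way is an assumption.

```python
def to_referer_free(url):
    """Rewrite a webtoon-phinf image URL to its swebtoon-phinf twin.

    Hypothetical helper; only the first occurrence of the host prefix
    is replaced, so the path and query string stay untouched.
    """
    return url.replace("://webtoon-phinf.", "://swebtoon-phinf.", 1)


original = (
    "https://webtoon-phinf.pstatic.net/20210929_153/"
    "1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90"
)
rewritten = to_referer_free(original)
```

Matching on `://webtoon-phinf.` rather than the bare host name keeps an already rewritten URL unchanged.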
Mike Fährmann
8bdeb2a6dd [webtoons] match arbitrary language codes (closes #1643) 2021-06-21 19:25:28 +02:00
Mike Fährmann
d88e34f17e [webtoons] use GalleryExtractor 2021-04-18 20:28:31 +02:00
Mike Fährmann
c4210b5371 [webtoons] update agegate/GDPR cookies 2021-04-18 20:28:31 +02:00
Christian Paul
41fbc20020 [webtoons]: Add cookie rstagGDPR_DE=true (#1431) 2021-04-07 21:42:55 +02:00
Mike Fährmann
2919d78bfc update extractor test results 2021-02-14 15:37:39 +01:00
Mike Fährmann
193dca2ce1 update extractor test results 2021-01-21 21:35:42 +01:00
Mike Fährmann
912eea29bc update extractor test results 2020-12-27 17:41:08 +01:00
Mike Fährmann
47114339a2 [webtoons] update 'ageGate' cookie 2020-12-07 14:56:32 +01:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
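The effect of dropping '&' from a negated character class can be illustrated with a small sketch. The patterns and URL below are hypothetical and are not the extractor's real patterns; they only contrast the two character classes named in the commit.

```python
import re

# Hypothetical patterns: the same path-segment capture with the old
# and the new character class.
old_pattern = re.compile(r"https?://example\.org/([^/?&#]+)")
new_pattern = re.compile(r"https?://example\.org/([^/?#]+)")

url = "https://example.org/a&b?x=1#frag"

# The old class stops at '&', the new one only at '/', '?' or '#'.
old_segment = old_pattern.match(url)[1]
new_segment = new_pattern.match(url)[1]
```

With the old class the captured segment ends at the ampersand, while the new class treats '&' as an ordinary character and stops only at the delimiters that RFC 3986 names.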
Mike Fährmann
bb882b8cdb improve output of '-K' for parent extractors (#825) 2020-06-14 21:39:21 +02:00
Mike Fährmann
998d1d3a5c [webtoons] generalize and improve comic extraction (fixes #820) 2020-06-10 21:44:42 +02:00
Leonardo Taccari
bcac31b7c7 [webtoons] make archive_fmt unique (#779)
close #778
2020-05-25 21:23:54 +02:00
Mike Fährmann
0378d079a5 [webtoons] fixes and simplifications (#593, #761)
- fix episode listings for French comics
- allow input URLs without explicit scheme
- add 'lang'/'language' metadata
- use str.format() instead of '+' to assemble URLs
2020-05-18 20:20:03 +02:00
Leonardo Taccari
39cd389679 [webtoons] Add a new extractor for webtoons.com (#761)
The webtoons extractor can extract a single episode or an entire comic
(all episodes) from webtoons.com.

All the logic of the extractors should be trivial except for a couple
of kludges needed:

 - The `ageGatePass' cookie is always set to avoid a possible redirect that
    would stop extraction, especially in the comic extractor
 - The image URLs returned by the episode extractor cannot be fetched
   directly; the `Referer:' HTTP header must be passed to fetch them

Close #593.
2020-05-18 19:04:20 +02:00
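The two kludges listed in the commit above can be sketched with the standard library alone. This is an illustration, not the extractor's code: the function name `build_image_request`, the example URLs, and the cookie value "true" are assumptions. The request object is only constructed, never sent.

```python
import urllib.request


def build_image_request(image_url, episode_url, age_gate_value="true"):
    """Build (but do not send) a request carrying both workarounds.

    Hypothetical helper; the cookie value is an assumption, as the
    commit message names only the cookie itself.
    """
    headers = {
        # Image hosts reject requests that lack the episode page as Referer.
        "Referer": episode_url,
        # Pre-set age-gate cookie to avoid being redirected.
        "Cookie": "ageGatePass=" + age_gate_value,
    }
    return urllib.request.Request(image_url, headers=headers)


request = build_image_request(
    "https://example.org/image.jpg",
    "https://example.org/episode/1",
)
```

The commit message associates the cookie mainly with the comic listing pages and the Referer header with image downloads; they are combined in one request here only to keep the sketch short.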