20 Commits

Author SHA1 Message Date
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside
a Job before initialization runs. Previously, most of it had already
happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
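
The decoupling described in a383eca7f6 can be illustrated with a minimal sketch. The class below is hypothetical and is not gallery-dl's actual implementation: the attribute names, the dict-based "session", and the 'user-agent' option are assumptions chosen only to show a cheap constructor paired with a separate initialize() step.

```python
class Extractor:
    """Hypothetical extractor; not gallery-dl's real class."""

    def __init__(self, url):
        # Cheap constructor: only record the input.
        # No session, cookies, or config lookups happen here.
        self.url = url
        self.config = {}
        self.session = None
        self.cookies = None
        self._initialized = False

    def initialize(self):
        # The actual init work, callable separately from __init__().
        if self._initialized:
            return
        user_agent = self.config.get("user-agent", "default-agent")
        self.session = {"headers": {"User-Agent": user_agent}}
        self.cookies = dict(self.config.get("cookies", {}))
        self._initialized = True


extr = Extractor("https://example.org/comic")
# A caller (for example a Job) can still adjust config at this point ...
extr.config["user-agent"] = "custom-agent"
# ... because session setup only happens once initialize() is called.
extr.initialize()
```

With this split, config changes made between construction and initialize() are reflected in the resulting session.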
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
cd931e1139 update extractor test results 2022-12-08 18:58:29 +01:00
Mike Fährmann
b0cb4a1b9c replace 'text.extract()' with 'text.extr()' where possible 2022-11-05 01:14:09 +01:00
Mike Fährmann
86cbf485ab [webtoons] extract real episode number (#2591)
The metadata field holding the number from the 'episode_no' query
parameter got renamed to 'episode_no'.
2022-05-17 22:33:29 +02:00
Kyle Anthony Williams
a14b72be21 [webtoons] Use swebtoon-phinf.pstatic.net instead of webtoon-phinf.pstatic.net (#2005)
This trick to avoid having to set a Referer header comes from
Webtoon's RSS feeds. The two URLs below are equivalent in content:

https://webtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90
https://swebtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90

The URL on the domain "webtoon-phinf.pstatic.net" needs a Referer
header; the one on "swebtoon-phinf.pstatic.net" does not. This is
because "swebtoon" images are served to an environment without
explicit network control: RSS feeds on sites such as Feedly. This
change should make it easier for gallery-dl developers to embed
Webtoon comics without worrying about headers.
2021-11-11 20:03:34 +01:00
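
The host swap described in a14b72be21 could be sketched as follows. This is an illustration only: the function name use_swebtoon_host is hypothetical, and how the extractor actually performs the replacement is not shown in the commit message.

```python
from urllib.parse import urlsplit, urlunsplit

# Host names taken from the commit message above.
OLD_HOST = "webtoon-phinf.pstatic.net"
NEW_HOST = "swebtoon-phinf.pstatic.net"


def use_swebtoon_host(url):
    """Return 'url' with the image host swapped; other URLs pass through."""
    parts = urlsplit(url)
    if parts.netloc == OLD_HOST:
        # SplitResult is a namedtuple, so _replace() returns a modified copy.
        parts = parts._replace(netloc=NEW_HOST)
    return urlunsplit(parts)


print(use_swebtoon_host(
    "https://webtoon-phinf.pstatic.net/20210929_153/"
    "1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90"
))
```

Only the host changes; the path and the 'type=q90' query are kept as they are.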
Mike Fährmann
8bdeb2a6dd [webtoons] match arbitrary language codes (closes #1643) 2021-06-21 19:25:28 +02:00
Mike Fährmann
d88e34f17e [webtoons] use GalleryExtractor 2021-04-18 20:28:31 +02:00
Mike Fährmann
c4210b5371 [webtoons] update agegate/GDPR cookies 2021-04-18 20:28:31 +02:00
Christian Paul
41fbc20020 [webtoons]: Add cookie rstagGDPR_DE=true (#1431) 2021-04-07 21:42:55 +02:00
Mike Fährmann
2919d78bfc update extractor test results 2021-02-14 15:37:39 +01:00
Mike Fährmann
193dca2ce1 update extractor test results 2021-01-21 21:35:42 +01:00
Mike Fährmann
912eea29bc update extractor test results 2020-12-27 17:41:08 +01:00
Mike Fährmann
47114339a2 [webtoons] update 'ageGate' cookie 2020-12-07 14:56:32 +01:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
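
As an illustration of 968d3e8465, the sketch below assumes that '/?#' denotes a regular-expression character class closing a URL pattern. The pattern, the example.org host, and the gallery_id helper are all hypothetical. Since '&' only separates parameters inside a query that has already been opened by '?', it is not needed as a delimiter directly after a path component.

```python
import re

# Hypothetical URL pattern: the ID must be followed by one of the
# RFC 3986 component delimiters ('/', '?', '#') or by the end of input.
PATTERN = re.compile(r"(?:https?://)?example\.org/gallery/(\d+)(?:[/?#]|$)")


def gallery_id(url):
    """Return the numeric gallery ID, or None if the URL does not match."""
    match = PATTERN.match(url)
    return match.group(1) if match else None


print(gallery_id("https://example.org/gallery/123?page=2&sort=new"))
print(gallery_id("example.org/gallery/123#top"))
print(gallery_id("https://example.org/gallery/123abc"))
```

The first two URLs match because the ID is followed by '?' or '#'; the third does not, because 'a' is not a component delimiter.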
Mike Fährmann
bb882b8cdb improve output of '-K' for parent extractors (#825) 2020-06-14 21:39:21 +02:00
Mike Fährmann
998d1d3a5c [webtoons] generalize and improve comic extraction (fixes #820) 2020-06-10 21:44:42 +02:00
Leonardo Taccari
bcac31b7c7 [webtoons] make archive_fmt unique (#779)
close #778
2020-05-25 21:23:54 +02:00
Mike Fährmann
0378d079a5 [webtoons] fixes and simplifications (#593, #761)
- fix episode listings for French comics
- allow input URLs without explicit scheme
- add 'lang'/'language' metadata
- use str.format() instead of '+' to assemble URLs
2020-05-18 20:20:03 +02:00
Leonardo Taccari
39cd389679 [webtoons] Add a new extractor for webtoons.com (#761)
The webtoons extractor can extract single episodes and entire comics
(all episodes) from webtoons.com.

All the logic of the extractors should be trivial except for a couple
of kludges needed:

 - The `ageGatePass' cookie is always set to avoid a possible redirect
   that would stop extraction, especially in the comic extractor
 - The image URLs returned by the episode extractor cannot be fetched
   directly; the `Referer:' HTTP header needs to be passed to fetch them

Close #593.
2020-05-18 19:04:20 +02:00
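
The two kludges listed in 39cd389679 might look roughly like the sketch below, which only builds requests and sends nothing. Everything beyond the cookie name and the Referer header is an assumption: the cookie value "true", the use of the episode page as the Referer value, the helper names, and the placeholder URLs are hypothetical.

```python
from urllib.request import Request


def page_request(page_url):
    # Assumption: the cookie value "true" is enough to skip the age gate.
    return Request(page_url, headers={"Cookie": "ageGatePass=true"})


def image_request(image_url, episode_url):
    # Assumption: the episode page URL serves as the Referer value.
    return Request(image_url, headers={"Referer": episode_url})


# Placeholder URLs, not real endpoints.
page = page_request("https://example.org/comic/episode-1")
image = image_request("https://img.example.org/001.jpg", page.full_url)
print(page.get_header("Cookie"))
print(image.get_header("Referer"))
```

Passing either object to urllib.request.urlopen() would send the request with the corresponding header attached.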