47 Commits

Author SHA1 Message Date
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a
Job before most of it has already happened in 'extractor.find()'.
2023-07-25 22:16:16 +02:00
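
The decoupling described in a383eca7f6 could look roughly like the sketch below. The class name, attributes, and the dict standing in for a session are hypothetical and only illustrate the pattern; they are not the project's actual code.

```python
class SketchExtractor:
    """Hypothetical extractor showing constructor/initialize() separation."""

    def __init__(self, url):
        # The constructor only records cheap state; nothing is read
        # from the config yet.
        self.url = url
        self.config = {}
        self.session = None
        self._initialized = False

    def initialize(self):
        """Perform the actual setup; calling it again is a no-op."""
        if self._initialized:
            return
        # Config values are read here, so changes made between
        # construction and initialize() still take effect.
        agent = self.config.get("user-agent", "default-agent")
        self.session = {"headers": {"User-Agent": agent}}
        self._initialized = True


extractor = SketchExtractor("https://example.org/manga/title")
extractor.config["user-agent"] = "custom-agent"  # adjusted before init
extractor.initialize()
print(extractor.session["headers"]["User-Agent"])
```

Because nothing is read from the config inside __init__(), a caller can still change options after the object exists and before initialize() runs.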
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
c6a9bab019 update extractor test results 2022-07-12 15:49:22 +02:00
Mike Fährmann
c8abb16c60 [mangahere] send Referer headers (#2592) 2022-05-15 14:41:16 +02:00
Mike Fährmann
dee0d22561 update extractor test results 2022-02-06 21:39:24 +01:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
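
A small sketch of what the change in 968d3e8465 means for a URL pattern. The host, path layout, and function name are made up for illustration; only the character class '[^/?#]' reflects the commit message.

```python
import re

# Hypothetical pattern: a path component ends at '/', '?' or '#',
# the delimiters quoted from RFC 3986 above. '&' is not one of them.
PATTERN = re.compile(r"https?://example\.org/manga/([^/?#]+)")


def manga_slug(url):
    """Return the first path component after /manga/, or None."""
    match = PATTERN.match(url)
    return match.group(1) if match else None


print(manga_slug("https://example.org/manga/title/c001?page=2#top"))
print(manga_slug("https://example.org/manga/a&b"))
```

With '&' dropped from the character class, the second URL yields the whole component 'a&b' instead of being cut off at the ampersand.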
Mike Fährmann
286718950c [mangahere] ensure download URLs have a scheme (fixes #1070) 2020-10-17 22:43:59 +02:00
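
The commit message for 286718950c does not say how the scheme is ensured. One possible approach, assuming the site returns protocol-relative URLs such as '//host/path', is sketched here; the helper name and the default scheme are assumptions.

```python
def ensure_scheme(url, scheme="https"):
    """Return url with a scheme, adding one only if it is missing."""
    if url.startswith(("http://", "https://")):
        return url
    if url.startswith("//"):
        # Protocol-relative URL: only the scheme name is missing.
        return scheme + ":" + url
    return scheme + "://" + url


print(ensure_scheme("//img.example.org/001.jpg"))
```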
Mike Fährmann
0b4cb8e57a [mangahere] send 'isAdult' cookie (fixes #556) 2020-01-04 21:25:35 +01:00
Mike Fährmann
5530871b5a change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)

Example: "https://example.org/path/filename.ext"

before:
- filename : filename.ext
- name     : filename
- extension: ext

now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
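
The new result shape from 5530871b5a can be reproduced with a short stand-in function. This is a sketch written for this listing, not the real text.nameext_from_url(); the name 'nameext_sketch' and its handling of edge cases are assumptions.

```python
from urllib.parse import unquote, urlsplit


def nameext_sketch(url):
    """Split the last path component into 'filename' and 'extension'."""
    path = urlsplit(url).path
    basename = unquote(path.rpartition("/")[2])
    name, dot, ext = basename.rpartition(".")
    if dot and name:
        return {"filename": name, "extension": ext}
    # No dot, or a leading dot only: treat everything as the filename.
    return {"filename": basename, "extension": ""}


print(nameext_sketch("https://example.org/path/filename.ext"))
```

For the example URL from the commit message this returns 'filename' without the extension and 'ext' as the extension, matching the "now" case above.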
Mike Fährmann
32edf4fc7b add '_extractor' info to manga extractor results 2019-02-13 13:23:36 +01:00
Mike Fährmann
580baef72c change Chapter and MangaExtractor classes
- unify and simplify constructors
- rename get_metadata and get_images to just metadata() and images()
- rename self.url to chapter_url and manga_url
2019-02-11 18:38:47 +01:00
Mike Fährmann
4b1880fa5e propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
1f3422c28b [mangahere] fix extraction 2019-02-10 22:10:53 +01:00
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
6126615698 update URLs for supportedsites.rst 2019-01-30 16:18:22 +01:00
Mike Fährmann
bb89a1e6d7 [mangahere] use http://
invalid SSL cert for quite some time now
2018-07-26 18:11:31 +02:00
Mike Fährmann
6996f5c118 [mangahere] fix and improve chapter extraction 2018-07-09 20:07:40 +02:00
Mike Fährmann
f3d770d4e2 Merge branch '1.4-dev' 2018-05-22 17:24:57 +02:00
Mike Fährmann
f43d446692 [mangahere] extract chapter titles 2018-05-16 16:22:05 +02:00
Mike Fährmann
95392554ee use text.urljoin() 2018-04-26 17:00:26 +02:00
Mike Fährmann
cc36f88586 rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
d11fcf4804 smaller changes and fixes
- fix the cloudflare challenge result if the last decimal places
  are zero (JS's toFixed() removes trailing zeroes)
- fix downloading of kissmanga chapter-pages hosted on blogspot
  (accessing blogspot with "kissmanga.com" as referrer yields a 401)
- disable certificate validation for 'mangahere' tests
- update flickr test result
2018-04-06 15:30:09 +02:00
Mike Fährmann
5b3c34aa96 use generic chapter-extractor in more modules 2018-02-07 12:36:39 +01:00
Mike Fährmann
8102aae311 [mangahere] support ".cc" TLD and mobile URLs 2017-12-20 21:34:25 +01:00
Mike Fährmann
305da540c3 [mangahere] fix metadata extraction 2017-11-03 14:54:46 +01:00
Mike Fährmann
633b376f35 improve/adjust default filename formats for manga sites 2017-10-02 19:06:24 +02:00
Mike Fährmann
1ab4c7986f [mangahere] fix extraction
would switch to HTTPS, but there seem to be certificate issues
2017-09-26 21:05:11 +02:00
Mike Fährmann
9fc1d0c901 implement and use 'util.safe_int()'
same as Python's 'int()', except it doesn't raise any exceptions and
accepts a default value
2017-09-24 15:59:25 +02:00
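
A minimal sketch of a helper with the behaviour described in 9fc1d0c901, written from the commit message alone. The default of 0 and the exact set of exceptions caught are assumptions.

```python
def safe_int(value, default=0):
    """Like int(), but return default instead of raising."""
    try:
        return int(value)
    except (ValueError, TypeError, OverflowError):
        # ValueError: non-numeric string; TypeError: e.g. None;
        # OverflowError: e.g. float("inf").
        return default


print(safe_int("12"), safe_int("abc"), safe_int(None, -1))
```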
Mike Fährmann
d39b8779af [mangahere] extract manga metadata 2017-09-22 14:55:37 +02:00
Mike Fährmann
6f30cf4c64 change keyword names to valid Python identifiers
This commit mostly replaces all minus-signs ('-') in keyword names with
underscores ('_') to allow them to be used in filter-expressions. For
example 'gallery-id' got renamed to 'gallery_id'.

(It is theoretically possible to access any variable, regardless of its
name, with 'locals()["NAME"]', but that seems a bit too convoluted if
just 'NAME' could be enough)
2017-09-10 22:20:47 +02:00
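
Why the rename in 6f30cf4c64 matters for filter expressions can be shown with plain eval(). The 'matches' helper is hypothetical and only assumes that a filter is a Python expression evaluated with the keywords as local names.

```python
def matches(expression, keywords):
    """Evaluate a filter expression with the keywords as local names."""
    return bool(eval(expression, {"__builtins__": {}}, keywords))


# With an underscore the keyword is a valid identifier.
print(matches("gallery_id > 100", {"gallery_id": 123}))

# With a minus sign Python parses 'gallery - id', so the lookup fails.
try:
    matches("gallery-id > 100", {"gallery-id": 123})
except NameError as exc:
    print("NameError:", exc)
```

The same distinction is visible through str.isidentifier(): it is true for 'gallery_id' and false for 'gallery-id'.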
Mike Fährmann
7aa9fa796a code cleanup and fixes 2017-07-25 14:59:41 +02:00
Mike Fährmann
f226417420 simplify code by using a MangaExtractor base class 2017-05-20 11:27:43 +02:00
Mike Fährmann
94e10f249a code adjustments according to pep8 nr2 2017-02-01 00:53:19 +01:00
Mike Fährmann
4c55275305 update tests 2016-12-12 14:17:15 +01:00
Mike Fährmann
56d810c896 update keyword hashes for tests 2016-09-25 17:28:46 +02:00
Mike Fährmann
19c2d4ff6f remove explicit (sub)category keywords 2016-09-25 14:22:07 +02:00
Mike Fährmann
d7e168799d consistent extractor naming scheme + docstrings 2016-09-12 10:34:31 +02:00
Mike Fährmann
2faa7393b1 [mangahere] adjust for image domain 2016-08-02 14:35:12 +02:00
Mike Fährmann
0736fe29e2 [mangahere] fix parsing 2016-04-20 08:33:06 +02:00
Mike Fährmann
ba99506c72 more extractor test-cases 2015-12-14 03:00:58 +01:00
Mike Fährmann
f7c47a6018 add subcategories to extractors 2015-11-30 01:11:13 +01:00
Mike Fährmann
1497da07de remove unused format-strings 2015-11-29 23:41:43 +01:00
Mike Fährmann
f48712c9c9 docstrings 2015-11-28 22:21:35 +01:00
Mike Fährmann
914062d172 use text.extract_iter where applicable 2015-11-28 02:06:29 +01:00
Mike Fährmann
332d9e393b [mangahere] support sub-chapters (e.g. ch4.5) 2015-11-28 00:31:04 +01:00
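
One way sub-chapter numbers such as 'ch4.5' could be handled is to keep the integer part and the fractional suffix separately. The URL layout ('/c004.5/'), the regular expression, and the key names in this sketch are assumptions, not taken from the commit.

```python
import re

# Hypothetical chapter segment: 'c' + digits + optional '.digits'
CHAPTER_RE = re.compile(r"/c(\d+)(\.\d+)?")


def parse_chapter(url):
    """Return the chapter number and its optional minor part."""
    match = CHAPTER_RE.search(url)
    if not match:
        return None
    chapter, minor = match.groups()
    return {"chapter": int(chapter), "chapter_minor": minor or ""}


print(parse_chapter("https://example.org/manga/title/c004.5/"))
```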
Mike Fährmann
88739a3564 [mangahere] add manga-extractor 2015-11-28 00:11:28 +01:00
Mike Fährmann
d1673d912a [mangahere] add chapter-extractor 2015-11-26 03:06:08 +01:00