Commit Graph

28 Commits

Author SHA1 Message Date
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
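
The split between construction and initialization described in this commit can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class body, the 'initialized' flag, and the dict used as a session are all assumptions made for the example.

```python
class Extractor:
    """Hypothetical stand-in for an extractor base class."""

    def __init__(self, url):
        # The constructor only records cheap state; nothing is set up yet.
        self.url = url
        self.session = None
        self.initialized = False

    def initialize(self):
        # Session, cookie, and config setup happens here, on demand.
        if self.initialized:
            return
        # Assumption: a plain dict stands in for a real HTTP session.
        self.session = {"cookies": {}, "headers": {}}
        self.initialized = True


extr = Extractor("https://example.org/gallery/1")
# A caller (e.g. a job object) can still adjust configuration at this point,
# because no setup has run yet.
extr.initialize()
```

Because the constructor does no real work, creating an extractor stays cheap, and the caller decides when the expensive setup runs.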
Mike Fährmann
b0cb4a1b9c replace 'text.extract()' with 'text.extr()' where possible 2022-11-05 01:14:09 +01:00
Mike Fährmann
bd08ee2859 remove most 'yield Message.Version' statements
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
d900edfcfb [simplyhentai] fix extraction 2021-04-25 18:51:43 +02:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
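
To illustrate the effect of limiting the delimiter class to '/', '?', and '#', here is a small sketch with a made-up URL pattern. The domain, the pattern, and the helper name 'gallery_id' are assumptions for the example and are not taken from the extractor itself.

```python
import re

# Hypothetical pattern: the numeric ID must be followed by the end of the
# string or by one of the RFC 3986 component delimiters '/', '?', '#'.
PATTERN = re.compile(r"https?://example\.org/gallery/(\d+)(?:$|[/?#])")


def gallery_id(url):
    """Return the gallery ID as a string, or None if the URL does not match."""
    match = PATTERN.match(url)
    return match.group(1) if match else None
```

With this pattern, "https://example.org/gallery/123?page=2" yields "123", while "https://example.org/gallery/123&x" does not match, since '&' only separates parameters inside a query and does not start a URL component.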
Mike Fährmann
f317a57c5e [simplyhentai] fix 'gallery_id' extraction 2020-07-27 16:14:06 +02:00
Mike Fährmann
7499d71d02 [simplyhentai] ignore certificate errors in video test 2020-03-28 21:07:30 +01:00
Mike Fährmann
87a87bff7e [simplyhentai] fix image URLs 2019-10-28 21:11:06 +01:00
Mike Fährmann
ef17d94469 update test results 2019-10-21 21:53:21 +02:00
Mike Fährmann
1693d97bd3 update extractor class hierarchies
- let the GalleryExtractor class inherit directly from Extractor
- make ChapterExtractor a subclass of GalleryExtractor
- change enumeration field names of GalleryExtractors to 'num'
2019-10-16 18:15:29 +02:00
Mike Fährmann
11ea689013 [simplyhentai] fix image and video URLs 2019-09-16 21:37:16 +02:00
Mike Fährmann
b1cddce865 Revert "[simplyhentai] fix extraction; remove image+video extractors"
This reverts commit d1db5180ab.
2019-09-07 14:48:31 +02:00
Mike Fährmann
d1db5180ab [simplyhentai] fix extraction; remove image+video extractors 2019-08-22 23:56:41 +02:00
Mike Fährmann
12da6bd0c9 [simplyhentai] fix/improve extraction 2019-07-06 20:25:53 +02:00
Mike Fährmann
26c4365baa adjust metadata types for GalleryExtractors 2019-03-02 14:53:04 +01:00
Mike Fährmann
3595cd582f use GalleryExtractor as common base class 2019-03-01 14:13:16 +01:00
Mike Fährmann
5530871b5a change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)

Example: "https://example.org/path/filename.ext"

before:
- filename : filename.ext
- name     : filename
- extension: ext

now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
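
The "now" behavior listed in this commit could be approximated as below. This is a sketch under stated assumptions and not the actual text.nameext_from_url() implementation: the function name 'nameext_sketch' and its handling of names without a dot are invented for the example.

```python
from urllib.parse import unquote, urlsplit


def nameext_sketch(url):
    """Split the last path segment of a URL into 'filename' and 'extension'."""
    # Take the last path segment and decode percent-escapes.
    name = unquote(urlsplit(url).path.rpartition("/")[2])
    filename, _, extension = name.rpartition(".")
    if not filename:
        # Assumption: without a usable dot, the whole name is the filename.
        return {"filename": name, "extension": ""}
    return {"filename": filename, "extension": extension}


result = nameext_sketch("https://example.org/path/filename.ext")
# result == {"filename": "filename", "extension": "ext"}
```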
Mike Fährmann
2e516a1e3e store the full original URL in Extractor.url 2019-02-12 18:46:48 +01:00
Mike Fährmann
580baef72c change Chapter and MangaExtractor classes
- unify and simplify constructors
- rename get_metadata and get_images to just metadata() and images()
- rename self.url to chapter_url and manga_url
2019-02-11 18:38:47 +01:00
Mike Fährmann
4b1880fa5e propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
02d733d219 [simplyhentai] fix and improve tag extraction
The "tags" field is now a list instead of a string.
In format strings, use "{tags:J, }" to Join them.
2019-02-10 13:52:09 +01:00
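
The effect of a join specifier such as "{tags:J, }" on a list-valued field can be pictured with the following sketch. The helper 'apply_join' is hypothetical and only mimics the idea that everything after the 'J' is used as the separator; it is not the project's formatter.

```python
def apply_join(value, spec):
    """Apply a 'J<separator>' style specifier to a list of strings."""
    if spec.startswith("J"):
        # Everything after the leading 'J' is the separator.
        return spec[1:].join(value)
    # Assumption: any other specifier falls back to plain string conversion.
    return str(value)


tags = ["tag one", "tag two", "tag three"]
joined = apply_join(tags, "J, ")
# joined == "tag one, tag two, tag three"
```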
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
8e01cf0ef8 [reactor] generalize extractors (#148)
- support *.reactor.cc domains
- combine joyreactor and pornreactor modules
2019-01-07 17:06:47 +01:00
Mike Fährmann
a47c6136cd [simplyhentai] avoid redirects for all-pages.json (#89) 2018-06-01 22:06:34 +02:00
Mike Fährmann
72e66f0aac [simplyhentai] improve URL pattern
[ci skip]
2018-05-30 11:44:43 +02:00
Mike Fährmann
cdcc3427a0 [simplyhentai] add video extractor (#89)
All videos hosted on their own servers seem to be dead,
but myhentai.tv embeds, which are most of the videos, work fine.
2018-05-30 11:25:23 +02:00
Mike Fährmann
f9a6a19658 [simplyhentai] add image extractor (#89) 2018-05-30 10:58:48 +02:00
Mike Fährmann
55b0913412 [simplyhentai] add gallery extractor (#89) 2018-05-27 15:25:04 +02:00