69 Commits

Author SHA1 Message Date
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This allows, for example, a Job to adjust config access before most
of the initialization happens, instead of it already having happened
when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
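The commit above separates construction from initialization. Below is a minimal Python sketch of that pattern. It is not the project's actual code. The Extractor class, its single 'url' argument, the 'initialized' flag, and the plain dicts that stand in for config and cookies are all hypothetical.

```python
class Extractor:
    """Hypothetical extractor with a cheap constructor and deferred setup."""

    def __init__(self, url):
        # Only record what identifies the extractor; no config reads here.
        self.url = url
        self.config = None
        self.cookies = None
        self.initialized = False

    def initialize(self, config=None):
        # Expensive setup (config options, cookies, session) happens here,
        # at most once, whenever the caller decides it is time.
        if self.initialized:
            return
        self.config = dict(config or {})
        self.cookies = dict(self.config.get("cookies", {}))
        self.initialized = True


def find(url):
    # Matching a URL to an extractor no longer triggers initialization.
    return Extractor(url)


extr = find("https://example.org/album/1")
# A caller (e.g. a Job) can still adjust configuration at this point.
extr.initialize({"cookies": {"session": "abc"}})
```

Because 'find()' only constructs the object, anything that runs between 'find()' and 'initialize()' can still influence which config values are read.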
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
da11fb32d0 update extractor test results 2022-08-28 00:16:12 +02:00
Mike Fährmann
dee0d22561 update extractor test results 2022-02-06 21:39:24 +01:00
Mike Fährmann
211de95dd0 update extractor test results 2021-11-01 02:58:53 +01:00
Mike Fährmann
bd08ee2859 remove most 'yield Message.Version' statements
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
ed4b3c48cb fix flake8 and other tests 2021-08-12 16:05:26 +02:00
Nyasume
fa6af46756 Added ability to download GIFs instead of mp4 from Luscious and Reactor (#1701) 2021-08-12 15:12:42 +02:00
Mike Fährmann
bdfcc9c4b1 update extractor test results 2021-04-18 20:28:15 +02:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
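The effect of dropping '&' from the excluded characters can be sketched with Python's 're' module. The domain and patterns below are hypothetical, written in the style of an extractor URL pattern. A negated class such as "[^/?#]+" captures one path segment and stops only at the RFC 3986 delimiters "/", "?", and "#". With the old class, a literal "&" inside a path segment would end the capture early.

```python
import re

# Hypothetical patterns; only the excluded character class differs.
OLD = re.compile(r"(?:https?://)?example\.net/albums/([^/?&#]+)")
NEW = re.compile(r"(?:https?://)?example\.net/albums/([^/?#]+)")


def album_id(pattern, url):
    # Return the captured path segment, or None if the URL does not match.
    match = pattern.match(url)
    return match.group(1) if match else None


print(album_id(NEW, "https://example.net/albums/abc_123/"))    # abc_123
print(album_id(NEW, "https://example.net/albums/abc?page=2"))  # abc
print(album_id(OLD, "https://example.net/albums/a&b"))         # a
print(album_id(NEW, "https://example.net/albums/a&b"))         # a&b
```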
Mike Fährmann
fd438f0d78 update extractor test results 2020-04-11 23:00:42 +02:00
Mike Fährmann
762c758af4 [hiperdex] fix extraction 2020-04-03 21:25:25 +02:00
Mike Fährmann
4e361b3008 add tests for specific datetime values 2020-02-23 16:48:30 +01:00
Mike Fährmann
82f7f4172a update test results 2020-01-01 16:05:38 +01:00
Mike Fährmann
4325695d74 [luscious] expand GraphQL queries 2019-11-04 21:17:22 +01:00
Mike Fährmann
4409d00141 embed error messages in StopExtraction exceptions 2019-10-28 16:39:49 +01:00
Mike Fährmann
6e08ada4fe [luscious] simplify some metadata entries 2019-10-25 13:14:59 +02:00
Mike Fährmann
b23c822b23 [luscious] use GraphQL 2019-10-22 21:17:08 +02:00
Mike Fährmann
d92802fd37 [luscious] fix detection of unavailable galleries 2019-09-15 21:16:25 +02:00
Mike Fährmann
c50d60a53d [reactor] fix image URLs 2019-08-16 14:07:22 +02:00
Mike Fährmann
4a0c98bfc9 miscellaneous fixes and adjustments 2019-08-01 22:09:43 +02:00
Mike Fährmann
40637556fa [ngomik] fix extraction 2019-07-28 10:53:46 +02:00
Mike Fährmann
7a14aaed7d [luscious] fix extraction 2019-05-17 10:48:47 +02:00
Mike Fährmann
aa8e366b90 [luscious] fix tag extraction 2019-05-14 17:35:52 +02:00
Mike Fährmann
f2cf1c1d73 use 'text.extract_from()' in a few places 2019-04-21 15:19:20 +02:00
Mike Fährmann
e25ebc4bff don't disable certificate checks anymore
Executables generated with PyInstaller auto-include the root certificate
file and certificate checks now work out-of-the-box.
2019-04-17 13:27:19 +02:00
Mike Fährmann
2ff043edfa [yaplog] add user- and post-extractors (#190) 2019-04-04 17:56:56 +02:00
Mike Fährmann
00d604cafb [luscious] fix SearchExtractor URL-pattern 2019-03-29 15:58:08 +01:00
Mike Fährmann
1384ebf907 [luscious] fix metadata extraction
- remove 'artist', 'language', and 'lang' fields
- replace 'section' with 'genre'
- provide 'tags' as list
- use GalleryExtractor as base class
2019-03-29 13:06:02 +01:00
Mike Fährmann
d0f88c35be [komikcast] fix extraction 2019-03-18 11:12:19 +01:00
Mike Fährmann
a2af2d2965 adjust cache maxage values 2019-03-14 22:21:49 +01:00
Mike Fährmann
e687a6095e [luscious] raise exception if album is not available 2019-02-19 13:30:39 +01:00
Mike Fährmann
61741d7333 provide type information for Queue messages
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.
2019-02-12 21:32:32 +01:00
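The difference between searching all extractor classes and constructing a known class directly can be sketched as follows. The 'from_url()' and 'find()' functions here are simplified stand-ins modeled on the names in the commit message. Their signatures, the example patterns, and the idea of passing the class alongside the URL are assumptions for illustration, not the project's actual API.

```python
import re


class Extractor:
    """Hypothetical base class; subclasses set a regex 'pattern'."""
    pattern = None

    def __init__(self, match):
        self.match = match

    @classmethod
    def from_url(cls, url):
        # Match against this class's own pattern only.
        match = re.match(cls.pattern, url)
        return cls(match) if match else None


class UserExtractor(Extractor):
    pattern = r"https?://example\.net/users/([^/?#]+)"


class AlbumExtractor(Extractor):
    pattern = r"https?://example\.net/albums/([^/?#]+)"


EXTRACTORS = (UserExtractor, AlbumExtractor)


def find(url):
    # Generic lookup: try every known extractor class in turn.
    for cls in EXTRACTORS:
        extr = cls.from_url(url)
        if extr:
            return extr
    return None


def child_extractor(url, extr_cls=None):
    # When the queue message carries the target class, skip the search.
    if extr_cls is not None:
        return extr_cls.from_url(url)
    return find(url)
```

With the class known beforehand, only one regex is tried instead of one per registered extractor.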
Mike Fährmann
4b1880fa5e propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
00dc37ccbf replace AsynchronousExtractor with a Mixin 2019-02-04 14:21:19 +01:00
Mike Fährmann
dd358b4564 improve cookie handling during logins 2019-01-30 17:09:32 +01:00
Mike Fährmann
0c32dc5858 [hentaifox] add extractor for search results (#160) 2019-01-28 22:38:32 +01:00
Mike Fährmann
e4171d6baf [luscious] add login capabilities (closes #159) 2019-01-28 17:14:15 +01:00
Mike Fährmann
c9ef5ed364 [luscious] ensure URLs have a scheme 2018-12-21 17:56:51 +01:00
Mike Fährmann
a4263fb253 [luscious] add extractor for search results (closes #127) 2018-11-25 18:57:51 +01:00
Mike Fährmann
e1d306cc48 update unit test results 2018-10-13 16:54:30 +02:00
Mike Fährmann
38d4f43cc0 [komikcast] skip ads 2018-08-14 11:17:59 +02:00
Mike Fährmann
df7e18399e [luscious] fix image order 2018-04-17 17:32:21 +02:00
Mike Fährmann
759ba26fb0 [luscious] proper image order for picture albums
... and (try) to start with the first image instead of somewhere
in the middle of an album.
2018-04-05 18:12:01 +02:00
Mike Fährmann
557cb94f81 [deviantart] use proper exponential backoff on API errors
... and use separate API credentials for unit tests.
2018-03-15 16:01:42 +01:00
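Exponential backoff doubles the wait after each consecutive failure. A minimal generic sketch follows; it is not the DeviantArt extractor's actual retry code. The APIError class, the retry count, and the base delay are hypothetical, and the 'sleep' parameter exists only so the waits can be observed without actually sleeping.

```python
import time


class APIError(Exception):
    """Hypothetical error raised by a failed API request."""


def call_with_backoff(func, retries=5, base_delay=1.0, sleep=time.sleep):
    # Call func(); after the n-th failure (n = 0, 1, ...) wait
    # base_delay * 2**n seconds before trying again.
    for attempt in range(retries):
        try:
            return func()
        except APIError:
            if attempt == retries - 1:
                raise  # out of retries: let the caller see the error
            sleep(base_delay * 2 ** attempt)
```

A request that fails three times and then succeeds waits 1, 2, and 4 seconds with the defaults. Production code often also caps the delay and adds random jitter; both are omitted here.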
Mike Fährmann
3cec533c28 Merge branch 'archive' 2018-02-12 18:07:58 +01:00
Mike Fährmann
34873dbd90 set 'archive_fmt' values
These are going to be used to create a unique id for each image.
2018-02-01 15:30:49 +01:00
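The commit does not spell out how an 'archive_fmt' value becomes a unique id. A minimal sketch follows. It assumes the id is the extractor category followed by the format string filled in from the image's metadata, and it uses a set to skip images already seen. The format string, the field names, and the SeenImages class are all hypothetical.

```python
# Hypothetical archive format: fields that together identify one image.
ARCHIVE_FMT = "{gallery_id}_{num}"


def archive_key(category, kwdict, fmt=ARCHIVE_FMT):
    # Prefix the category so ids from different sites cannot collide.
    return category + fmt.format_map(kwdict)


class SeenImages:
    """In-memory stand-in for a persistent download archive."""

    def __init__(self):
        self.keys = set()

    def check_and_add(self, category, kwdict):
        # True if this image is new (and records it), False if seen before.
        key = archive_key(category, kwdict)
        if key in self.keys:
            return False
        self.keys.add(key)
        return True
```

Metadata fields that are not named in the format string do not affect the id, so the same image is recognized even when unrelated metadata changes.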
Mike Fährmann
a34cebc253 [luscious] jump to first image if cover does not link to it 2018-01-30 22:39:01 +01:00
Mike Fährmann
263741d243 [luscious] update URL pattern (closes #55) 2017-12-14 14:15:01 +01:00