73 Commits

Author SHA1 Message Date
Mike Fährmann
9dbe33b6de replace old %-formatted and .format(…) strings with f-strings (#7671)
mostly using flynt
https://github.com/ikamensh/flynt
2025-06-29 17:50:19 +02:00
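
As a rough illustration of what this kind of conversion looks like, the sketch below formats the same message three ways. The variable names and the message text are made up for the example and are not taken from the repository.

```python
# Hypothetical values, used only to compare the three formatting styles.
name = "album"
count = 3

old_percent = "%s has %d images" % (name, count)     # %-formatting
old_format = "{} has {} images".format(name, count)  # str.format()
new_fstring = f"{name} has {count} images"           # f-string

# All three spellings produce the same string.
same_result = old_percent == old_format == new_fstring
```
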
Mike Fährmann
41191bb60a 'match.group(N)' -> 'match[N]' (#7671)
2.5x faster
2025-06-18 13:05:58 +02:00
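
A minimal sketch of the equivalence behind this change: since Python 3.6, match objects support subscripting, so `match[N]` returns the same value as `match.group(N)`. The pattern and URL below are hypothetical. The "2.5x faster" figure is the commit's own claim and is not measured here.

```python
import re

# Hypothetical pattern and URL, chosen only for illustration.
pattern = re.compile(r"/albums/[^/?#]+_(\d+)")
match = pattern.search("https://example.org/albums/some-title_12345/")

by_method = match.group(1)  # older spelling
by_index = match[1]         # equivalent subscript form (Python 3.6+)
```
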
Mike Fährmann
62d6f5f8d2 [luscious] fix IndexError for files without thumbnail (#5122) 2024-01-28 01:43:29 +01:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a
Job before the extractor performs it; previously, most of it had
already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
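
The idea of splitting construction from initialization can be sketched roughly as follows. The class, attribute names, and config dictionary are hypothetical stand-ins, not the project's actual code; a real extractor would also set up its session and cookies in the deferred step.

```python
class Extractor:
    """Hypothetical stand-in that separates __init__() from initialize()."""

    def __init__(self, url):
        # Cheap step: only remember what identifies the target.
        self.url = url
        self.options = None
        self.initialized = False

    def initialize(self, config):
        # Deferred step: read config options only when asked to.
        self.options = dict(config)
        self.initialized = True


extractor = Extractor("https://example.org/albums/1")

# The caller can still adjust the config after construction ...
config = {"sleep": 0}
config["sleep"] = 2

# ... because options are only read once initialize() runs.
extractor.initialize(config)
```
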
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
da11fb32d0 update extractor test results 2022-08-28 00:16:12 +02:00
Mike Fährmann
dee0d22561 update extractor test results 2022-02-06 21:39:24 +01:00
Mike Fährmann
211de95dd0 update extractor test results 2021-11-01 02:58:53 +01:00
Mike Fährmann
bd08ee2859 remove most 'yield Message.Version' statements
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
ed4b3c48cb fix flake8 and other tests 2021-08-12 16:05:26 +02:00
Nyasume
fa6af46756 Added ability to download GIFs instead of mp4 from Luscious and Reactor (#1701) 2021-08-12 15:12:42 +02:00
Mike Fährmann
bdfcc9c4b1 update extractor test results 2021-04-18 20:28:15 +02:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
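
To see why '&' is not needed as a component delimiter, the sketch below splits a hypothetical URL with the standard library and then matches a path segment with a character class that stops at '/', '?', or '#'. That the changed strings were regex character classes is an assumption; the pattern shown is not copied from the repository.

```python
import re
from urllib.parse import urlsplit

# Hypothetical URL; '&' only separates parameters inside the query.
url = "https://example.org/albums/title_123?page=2&sort=new#top"
parts = urlsplit(url)
path, query, fragment = parts.path, parts.query, parts.fragment

# Assumed pattern style: a path segment ends at '/', '?', or '#'.
segment = re.search(r"/albums/([^/?#]+)", url)[1]
```
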
Mike Fährmann
fd438f0d78 update extractor test results 2020-04-11 23:00:42 +02:00
Mike Fährmann
762c758af4 [hiperdex] fix extraction 2020-04-03 21:25:25 +02:00
Mike Fährmann
4e361b3008 add tests for specific datetime values 2020-02-23 16:48:30 +01:00
Mike Fährmann
82f7f4172a update test results 2020-01-01 16:05:38 +01:00
Mike Fährmann
4325695d74 [luscious] expand GraphQL queries 2019-11-04 21:17:22 +01:00
Mike Fährmann
4409d00141 embed error messages in StopExtraction exceptions 2019-10-28 16:39:49 +01:00
Mike Fährmann
6e08ada4fe [luscious] simplify some metadata entries 2019-10-25 13:14:59 +02:00
Mike Fährmann
b23c822b23 [luscious] use GraphQL 2019-10-22 21:17:08 +02:00
Mike Fährmann
d92802fd37 [luscious] fix detection of unavailable galleries 2019-09-15 21:16:25 +02:00
Mike Fährmann
c50d60a53d [reactor] fix image URLs 2019-08-16 14:07:22 +02:00
Mike Fährmann
4a0c98bfc9 miscellaneous fixes and adjustments 2019-08-01 22:09:43 +02:00
Mike Fährmann
40637556fa [ngomik] fix extraction 2019-07-28 10:53:46 +02:00
Mike Fährmann
7a14aaed7d [luscious] fix extraction 2019-05-17 10:48:47 +02:00
Mike Fährmann
aa8e366b90 [luscious] fix tag extraction 2019-05-14 17:35:52 +02:00
Mike Fährmann
f2cf1c1d73 use 'text.extract_from()' in a few places 2019-04-21 15:19:20 +02:00
Mike Fährmann
e25ebc4bff don't disable certificate checks anymore
Executables generated with PyInstaller auto-include the root certificate
file and certificate checks now work out-of-the-box.
2019-04-17 13:27:19 +02:00
Mike Fährmann
2ff043edfa [yaplog] add user- and post-extractors (#190) 2019-04-04 17:56:56 +02:00
Mike Fährmann
00d604cafb [luscious] fix SearchExtractor URL-pattern 2019-03-29 15:58:08 +01:00
Mike Fährmann
1384ebf907 [luscious] fix metadata extraction
- remove 'artist', 'language', and 'lang' fields
- replace 'section' with 'genre'
- provide 'tags' as list
- use GalleryExtractor as base class
2019-03-29 13:06:02 +01:00
Mike Fährmann
d0f88c35be [komikcast] fix extraction 2019-03-18 11:12:19 +01:00
Mike Fährmann
a2af2d2965 adjust cache maxage values 2019-03-14 22:21:49 +01:00
Mike Fährmann
e687a6095e [luscious] raise exception if album is not available 2019-02-19 13:30:39 +01:00
Mike Fährmann
61741d7333 provide type information for Queue messages
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.
2019-02-12 21:32:32 +01:00
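
The difference between the two construction paths might look roughly like this. Every class, pattern, and the `find()` helper below is a simplified, hypothetical stand-in that only borrows the names from the commit message.

```python
import re


class Extractor:
    """Hypothetical base class with a per-class URL pattern."""
    pattern = r""

    def __init__(self, match):
        self.match = match

    @classmethod
    def from_url(cls, url):
        # Test only this class's own pattern.
        match = re.match(cls.pattern, url)
        return cls(match) if match else None


class AlbumExtractor(Extractor):
    pattern = r"https://example\.org/albums/(\d+)"


class SearchExtractor(Extractor):
    pattern = r"https://example\.org/search/(\w+)"


def find(url, classes=(SearchExtractor, AlbumExtractor)):
    # Generic lookup: try every known class until one matches.
    for cls in classes:
        extractor = cls.from_url(url)
        if extractor is not None:
            return extractor
    return None


url = "https://example.org/albums/42"
direct = AlbumExtractor.from_url(url)  # class known beforehand
searched = find(url)                   # class discovered by searching
```

Both calls end up with the same kind of object; the direct form simply skips the loop over unrelated classes.
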
Mike Fährmann
4b1880fa5e propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
00dc37ccbf replace AsynchronousMixin Extractor with a Mixin 2019-02-04 14:21:19 +01:00
Mike Fährmann
dd358b4564 improve cookie handling during logins 2019-01-30 17:09:32 +01:00
Mike Fährmann
0c32dc5858 [hentaifox] add extractor for search results (#160) 2019-01-28 22:38:32 +01:00
Mike Fährmann
e4171d6baf [luscious] add login capabilities (closes #159) 2019-01-28 17:14:15 +01:00
Mike Fährmann
c9ef5ed364 [luscious] ensure URLs have a scheme 2018-12-21 17:56:51 +01:00
Mike Fährmann
a4263fb253 [luscious] add extractor for search results (closes #127) 2018-11-25 18:57:51 +01:00
Mike Fährmann
e1d306cc48 update unit test results 2018-10-13 16:54:30 +02:00
Mike Fährmann
38d4f43cc0 [komikcast] skip ads 2018-08-14 11:17:59 +02:00
Mike Fährmann
df7e18399e [luscious] fix image order 2018-04-17 17:32:21 +02:00
Mike Fährmann
759ba26fb0 [luscious] proper image order for picture albums
... and (try) to start with the first image instead of somewhere
in the middle of an album.
2018-04-05 18:12:01 +02:00
Mike Fährmann
557cb94f81 [deviantart] use proper exponential backoff on API errors
... and use separate API credentials for unit tests.
2018-03-15 16:01:42 +01:00
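
For readers unfamiliar with the term, exponential backoff can be sketched generically as below. This is not the extractor's actual code: the function names, default values, and the use of RuntimeError as the retryable error are all assumptions made for the example.

```python
def backoff_delays(base=1.0, factor=2.0, maximum=60.0, retries=6):
    """Yield 'retries' delays that grow geometrically, capped at 'maximum'."""
    delay = base
    for _ in range(retries):
        yield min(delay, maximum)
        delay *= factor


def call_with_backoff(func, sleep, **kwargs):
    """Call func(); on RuntimeError, wait via sleep(delay) and try again."""
    last_error = RuntimeError("no attempts were made")
    for delay in backoff_delays(**kwargs):
        try:
            return func()
        except RuntimeError as exc:
            last_error = exc
            sleep(delay)
    raise last_error


# Usage with a fake API call that fails twice before succeeding.
attempts = []
slept = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("API error")
    return "ok"


delays = list(backoff_delays())
# Record delays in a list instead of actually sleeping.
result = call_with_backoff(flaky, slept.append)
```

In real use, `time.sleep` would be passed as the `sleep` argument.
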