Mike Fährmann
9dbe33b6de
replace old %-formatted and .format(…) strings with f-strings ( #7671 )
...
mostly using flynt
https://github.com/ikamensh/flynt
2025-06-29 17:50:19 +02:00
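As an illustration of the conversion described in the commit above, here is a minimal sketch of the three formatting styles side by side. The variable names and the message text are made up for this example and are not taken from the repository.

```python
# Hypothetical values, chosen only to demonstrate the three styles.
name, count = "luscious", 3

# Old style 1: %-formatting
old_percent = "%s: %d files" % (name, count)

# Old style 2: str.format()
old_format = "{}: {} files".format(name, count)

# New style: f-string (Python 3.6+)
new_fstring = f"{name}: {count} files"

print(old_percent, old_format, new_fstring, sep="\n")
```

All three expressions build the same string; the f-string keeps the values next to the places where they are used.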
Mike Fährmann
41191bb60a
'match.group(N)' -> 'match[N]' ( #7671 )
...
2.5x faster
2025-06-18 13:05:58 +02:00
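The change above relies on the fact that match objects support index access (Python 3.6+), so 'match[N]' returns the same value as 'match.group(N)'. The sketch below shows the equivalence; the pattern and URL are assumptions made up for the example, not the extractor's real ones.

```python
import re

# Hypothetical pattern and URL, for illustration only.
pattern = re.compile(r"/albums/([^/?#]+)_(\d+)")
match = pattern.search("https://example.org/albums/title_12345/")
assert match is not None

# Both spellings return the same captured group.
via_group = match.group(2)
via_index = match[2]

print(via_group, via_index)
```

To check the speed difference on a given interpreter, the two expressions can be compared with the standard 'timeit' module.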
Mike Fährmann
62d6f5f8d2
[luscious] fix IndexError for files without thumbnail ( #5122 )
2024-01-28 01:43:29 +01:00
Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6
decouple extractor initialization
...
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This makes it possible, for example, to adjust config access inside a Job
before it takes place; previously, most of it had already happened when
calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
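The idea of the commit above, a cheap constructor plus a separate 'initialize()' step, can be sketched roughly as follows. The class name, attributes, and config handling here are hypothetical and greatly simplified; this is not the project's actual Extractor class.

```python
class SketchExtractor:
    """Hypothetical extractor with deferred initialization."""

    def __init__(self, url, config=None):
        # The constructor only stores its arguments; no session, cookies,
        # or config options are set up at this point.
        self.url = url
        self.config = dict(config or {})
        self.session = None
        self._initialized = False

    def initialize(self):
        # The actual setup happens here, so a caller can still adjust
        # self.config between construction and initialization.
        if self._initialized:
            return
        self.session = {"user-agent": self.config.get("user-agent", "default")}
        self._initialized = True


extractor = SketchExtractor("https://example.org/albums/1")
extractor.config["user-agent"] = "adjusted"  # changed before initialization
extractor.initialize()
print(extractor.session)
```

Because nothing reads the configuration during construction, the adjustment made before 'initialize()' is the one that takes effect.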
Mike Fährmann
d97b8c2fba
consistent cookie-related names
...
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
da11fb32d0
update extractor test results
2022-08-28 00:16:12 +02:00
Mike Fährmann
dee0d22561
update extractor test results
2022-02-06 21:39:24 +01:00
Mike Fährmann
211de95dd0
update extractor test results
2021-11-01 02:58:53 +01:00
Mike Fährmann
bd08ee2859
remove most 'yield Message.Version' statements
...
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
ed4b3c48cb
fix flake8 and other tests
2021-08-12 16:05:26 +02:00
Nyasume
fa6af46756
Added ability to download GIFs instead of mp4 from Luscious and Reactor ( #1701 )
2021-08-12 15:12:42 +02:00
Mike Fährmann
bdfcc9c4b1
update extractor test results
2021-04-18 20:28:15 +02:00
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt , URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
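The effect of dropping '&' from a character class such as '/?&#' can be shown with a small sketch. The two patterns and the URL below are assumptions written for this example and are not copied from any extractor module.

```python
import re

# Hypothetical patterns: the only difference is '&' in the character class.
old_pattern = re.compile(r"/albums/([^/?&#]+)")
new_pattern = re.compile(r"/albums/([^/?#]+)")

# Made-up URL whose path segment contains an ampersand.
url = "https://example.org/albums/a&b_1?page=2#top"

old_value = old_pattern.search(url)[1]  # stops at '&'
new_value = new_pattern.search(url)[1]  # stops only at '/', '?' or '#'

print(old_value, new_value)
```

With '&' removed, the capture ends only at the delimiters quoted from RFC 3986 in the commit message.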
Mike Fährmann
fd438f0d78
update extractor test results
2020-04-11 23:00:42 +02:00
Mike Fährmann
762c758af4
[hiperdex] fix extraction
2020-04-03 21:25:25 +02:00
Mike Fährmann
4e361b3008
add tests for specific datetime values
2020-02-23 16:48:30 +01:00
Mike Fährmann
82f7f4172a
update test results
2020-01-01 16:05:38 +01:00
Mike Fährmann
4325695d74
[luscious] expand GraphQL queries
2019-11-04 21:17:22 +01:00
Mike Fährmann
4409d00141
embed error messages in StopExtraction exceptions
2019-10-28 16:39:49 +01:00
Mike Fährmann
6e08ada4fe
[luscious] simplify some metadata entries
2019-10-25 13:14:59 +02:00
Mike Fährmann
b23c822b23
[luscious] use GraphQL
2019-10-22 21:17:08 +02:00
Mike Fährmann
d92802fd37
[luscious] fix detection of unavailable galleries
2019-09-15 21:16:25 +02:00
Mike Fährmann
c50d60a53d
[reactor] fix image URLs
2019-08-16 14:07:22 +02:00
Mike Fährmann
4a0c98bfc9
miscellaneous fixes and adjustments
2019-08-01 22:09:43 +02:00
Mike Fährmann
40637556fa
[ngomik] fix extraction
2019-07-28 10:53:46 +02:00
Mike Fährmann
7a14aaed7d
[luscious] fix extraction
2019-05-17 10:48:47 +02:00
Mike Fährmann
aa8e366b90
[luscious] fix tag extraction
2019-05-14 17:35:52 +02:00
Mike Fährmann
f2cf1c1d73
use 'text.extract_from()' in a few places
2019-04-21 15:19:20 +02:00
Mike Fährmann
e25ebc4bff
don't disable certificate checks anymore
...
Executables generated with PyInstaller auto-include the root certificate
file and certificate checks now work out-of-the-box.
2019-04-17 13:27:19 +02:00
Mike Fährmann
2ff043edfa
[yaplog] add user- and post-extractors ( #190 )
2019-04-04 17:56:56 +02:00
Mike Fährmann
00d604cafb
[luscious] fix SearchExtractor URL-pattern
2019-03-29 15:58:08 +01:00
Mike Fährmann
1384ebf907
[luscious] fix metadata extraction
...
- remove 'artist', 'language', and 'lang' fields
- replace 'section' with 'genre'
- provide 'tags' as list
- use GalleryExtractor as base class
2019-03-29 13:06:02 +01:00
Mike Fährmann
d0f88c35be
[komikcast] fix extraction
2019-03-18 11:12:19 +01:00
Mike Fährmann
a2af2d2965
adjust cache maxage values
2019-03-14 22:21:49 +01:00
Mike Fährmann
e687a6095e
[luscious] raise exception if album is not available
2019-02-19 13:30:39 +01:00
Mike Fährmann
61741d7333
provide type information for Queue messages
...
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.
2019-02-12 21:32:32 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
00dc37ccbf
replace AsynchronousMixin Extractor with a Mixin
2019-02-04 14:21:19 +01:00
Mike Fährmann
dd358b4564
improve cookie handling during logins
2019-01-30 17:09:32 +01:00
Mike Fährmann
0c32dc5858
[hentaifox] add extractor for search results ( #160 )
2019-01-28 22:38:32 +01:00
Mike Fährmann
e4171d6baf
[luscious] add login capabilities ( closes #159 )
2019-01-28 17:14:15 +01:00
Mike Fährmann
c9ef5ed364
[luscious] ensure URLs have a scheme
2018-12-21 17:56:51 +01:00
Mike Fährmann
a4263fb253
[luscious] add extractor for search results ( closes #127 )
2018-11-25 18:57:51 +01:00
Mike Fährmann
e1d306cc48
update unit test results
2018-10-13 16:54:30 +02:00
Mike Fährmann
38d4f43cc0
[komikcast] skip ads
2018-08-14 11:17:59 +02:00
Mike Fährmann
df7e18399e
[luscious] fix image order
2018-04-17 17:32:21 +02:00
Mike Fährmann
759ba26fb0
[luscious] proper image order for picture albums
...
... and (try) to start with the first image instead of somewhere
in the middle of an album.
2018-04-05 18:12:01 +02:00
Mike Fährmann
557cb94f81
[deviantart] use proper exponential backoff on API errors
...
... and use separate API credentials for unit tests.
2018-03-15 16:01:42 +01:00
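For readers unfamiliar with the term used in the commit above, exponential backoff means doubling the wait time after each failed attempt. The following is a generic sketch under stated assumptions; the function name, parameters, and the use of RuntimeError are hypothetical and do not reproduce the deviantart extractor's code.

```python
import time


def call_with_backoff(func, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on RuntimeError wait base_delay * 2**attempt and retry."""
    for attempt in range(retries + 1):
        try:
            return func()
        except RuntimeError:
            if attempt == retries:
                raise  # out of retries: propagate the last error
            sleep(base_delay * 2 ** attempt)


# Usage with a fake API call that fails twice, and a fake sleep function
# that records the delays instead of actually waiting.
delays = []
calls = {"count": 0}


def flaky_api_call():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RuntimeError("simulated API error")
    return "ok"


result = call_with_backoff(flaky_api_call, sleep=delays.append)
print(result, delays)
```

Passing the sleep function as a parameter keeps the sketch testable without real delays.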