75 Commits

Author SHA1 Message Date
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This allows, for example, adjusting config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
Mike Fährmann
d12dd3813c [imgur] fix internal image/album URLs
URLs from "link" attributes of newer images/albums were all returned
as 'https://imgur.com/gallery/...' instead of the expected format,
causing them to be ignored.
2023-05-06 15:13:38 +02:00
Mike Fährmann
8520de57f0 [imgur] add 'favorite-folder' extractor (#4016) 2023-05-06 15:10:13 +02:00
Mike Fährmann
aaf58a1259 [imgur] document 'client-id' option (#3937) 2023-04-21 15:08:50 +02:00
ClosedPort22
bf1649dadb [imgur] add support for imgur.io URLs 2022-12-17 14:33:44 +08:00
Mike Fährmann
4598d32370 [imgur] prevent exception for empty albums (closes #2557) 2022-05-04 17:34:50 +02:00
Mike Fährmann
bd08ee2859 remove most 'yield Message.Version' statements
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
4fc9668922 [imgur] update URL patterns (#1561) 2021-05-19 15:44:10 +02:00
Mike Fährmann
0b55f5ad84 [imgur] fix/improve rate limit handling (#1386)
- also wait-and-retry on 429 status codes
- use infinite loop instead of recursive calls
- 'extractor.sleep()' -> 'extractor.wait()'
2021-03-18 15:45:26 +01:00
Mike Fährmann
3df527ee2c update extractor test results 2021-02-27 21:01:29 +01:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
Mike Fährmann
799ca07fc8 [imgur] update
- fix image/album detection for galleries
- use new API endpoints for image/album data
2020-09-06 21:11:32 +02:00
Mike Fährmann
ab1af66a97 [imgur] add 'search' extractor (#934) 2020-08-27 22:46:17 +02:00
Mike Fährmann
e4bbc1fb5c [imgur] add 'tag' extractor (#934) 2020-08-27 22:46:17 +02:00
Mike Fährmann
ec5870576d [imgur] handle 403 overcapacity responses (closes #910) 2020-07-30 19:26:01 +02:00
Mike Fährmann
27d163afb3 [imgur] support all '/t/...' URLs (closes #880)
… instead of just '/t/unmuted/'
2020-07-09 22:17:01 +02:00
Mike Fährmann
bd0e1ca1a5 [imgur] build directory path for each file (closes #842) 2020-06-21 19:25:52 +02:00
Mike Fährmann
6bcdb264e0 [imgur] treat 't/unmuted' URLs as galleries 2020-05-25 22:21:57 +02:00
Mike Fährmann
b6cee3e45b [imgur] fix extraction of animated images without 'mp4' entry 2020-05-25 22:21:57 +02:00
Mike Fährmann
4e361b3008 add tests for specific datetime values 2020-02-23 16:48:30 +01:00
Mike Fährmann
32d7195d08 [pinterest] improve detection of invalid pin.it links 2020-01-18 21:06:44 +01:00
Mike Fährmann
1f2a69f3c5 add '_extractor' information to redirect results 2019-12-29 23:37:34 +01:00
Mike Fährmann
6e23c0da09 [imgur] add extractor for subreddit links (closes #500) 2019-12-02 23:44:13 +01:00
Mike Fährmann
e9aed62c91 [imgur] unescape image titles 2019-11-28 22:13:24 +01:00
Mike Fährmann
b0197098e6 [imgur] get title from webpage if missing in API response
(closes #467)
2019-11-07 21:10:04 +01:00
Mike Fährmann
8f38a35b91 [imgur] use API with "public" client_id (#446)
Using the API endpoints makes it possible to access NSFW content
without logging in.
2019-10-23 21:43:55 +02:00
Mike Fährmann
7ebd984e8d [imgur] print error message if no JSON data is found (#446) 2019-10-16 17:45:14 +02:00
Mike Fährmann
5882b00f2f [imgur] implement login support (#446) 2019-10-15 22:00:22 +02:00
Mike Fährmann
913460240d [reddit] fix 'extractor.blacklist()' arguments
The second argument must support 'append()'.
2019-09-24 23:01:12 +02:00
Mike Fährmann
4330133114 [imgur] add 'favorite' extractor (closes #420)
… and use a newer site-internal API endpoint for user posts
2019-09-19 15:54:26 +02:00
Mike Fährmann
d780f0357e [imgur] add user extractor 2019-09-17 22:58:18 +02:00
Mike Fährmann
7d6af936c5 [imgur] simplify gallery extraction 2019-08-20 20:00:43 +02:00
Mike Fährmann
829b1ccf04 [imgur] distinguish album and gallery URLs (#380)
A gallery can be either an album or a single image.
2019-08-14 21:40:14 +02:00
Mike Fährmann
fdec59f8e2 replace extractor.request() 'expect' argument
with
- 'fatal': allow 4xx status codes
- 'notfound': raise NotFoundError on 404
2019-07-05 00:42:16 +02:00
Mike Fährmann
4b1880fa5e propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
34bab080ae rewrite URL patterns to use only 1 per extractor 2019-02-08 12:03:10 +01:00
Mike Fährmann
ff436692bf [deviantart] add 'journals' option 2018-07-16 18:14:41 +02:00
Mike Fährmann
017188d268 improve extractor.request()
Replace the 'fatal' parameter with 'expect', which is a list/range
of HTTP status codes >= 400 that should also be accepted.
2018-06-18 16:29:56 +02:00
Mike Fährmann
ad14de19c6 [imgur] support "unmuted" URLs 2018-05-30 16:19:01 +02:00
Mike Fährmann
4cea886177 [imgur] allow longer album hashes 2018-05-13 11:21:51 +02:00
Mike Fährmann
1b80fa82a9 [imgur] update URL pattern and tests 2018-04-08 21:06:21 +02:00
Mike Fährmann
179bcdd349 adjust archive-ids 2018-02-13 04:50:45 +01:00
Mike Fährmann
3cec533c28 Merge branch 'archive' 2018-02-12 18:07:58 +01:00
Mike Fährmann
20af86b2ea add more extractor tests
for mangastream, reddit and imgur
2018-02-12 17:07:18 +01:00
Mike Fährmann
7e0207bcf4 [imgur] strip trailing '?1' from 'ext' 2018-02-10 21:33:40 +01:00
Mike Fährmann
34873dbd90 set 'archive_fmt' values
These are going to be used to create a unique id for each image.
2018-02-01 15:30:49 +01:00
Mike Fährmann
76509a6d3c [imgur] update test results 2018-01-20 18:49:29 +01:00
Mike Fährmann
82ea6c0cd3 adjust format strings with optional titles
... except for anything manga/comic related
2017-09-28 18:00:19 +02:00
H R X N
77bf923c56 Update imgur.py to include 'title' of single image (#40)
Add {title} keyword.
Images on Imgur don't necessarily have a title, but I think most of them do, and this should not break anything else.
2017-09-26 12:48:48 +02:00
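
Note on commit 0b55f5ad84: its message describes waiting and retrying on 429 status codes, using an infinite loop instead of recursive calls. A minimal Python sketch of that pattern follows. It is not the project's actual code: `request_with_retry`, `send`, `wait`, and `default_delay` are hypothetical names chosen for illustration, and the fake responses stand in for a real HTTP client.

```python
import time


def request_with_retry(send, wait=time.sleep, default_delay=1.0):
    """Call send() until it returns a status other than 429.

    `send` is a hypothetical callable returning a tuple
    (status, retry_after, body); it is an assumption of this sketch,
    not an API taken from the commit log.
    """
    while True:  # a loop instead of recursion: no stack growth on long waits
        status, retry_after, body = send()
        if status != 429:
            return status, body
        # Rate limited: wait for the suggested delay, or a default one
        wait(retry_after if retry_after is not None else default_delay)


# Usage with a fake endpoint that is rate limited twice before succeeding
responses = iter([(429, 0.0, None), (429, None, None), (200, None, "ok")])
waits = []  # record requested delays instead of actually sleeping
result = request_with_retry(lambda: next(responses), wait=waits.append)
```

Passing `wait` as a parameter keeps the sketch testable without real sleeping; a real extractor would presumably plug in its own wait helper here.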