93 Commits

Author SHA1 Message Date
Mike Fährmann
53cdfaac37 [common] add reference to 'exception' module to Extractor class
- remove 'exception' imports
- replace with 'self.exc'
2026-02-15 10:57:22 +01:00
Mike Fährmann
00c6821a3f replace 2-element f-strings with simple '+' concatenations
Python's 'ast' module and its 'NodeVisitor' class
were incredibly helpful in identifying these
2025-12-22 11:26:04 +01:00
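The commit above mentions using 'ast' and 'NodeVisitor' to find the affected f-strings. A minimal sketch of how such a search might look follows; the class name, the helper function, and the sample source are hypothetical and not taken from the repository.

```python
import ast


class TwoPartFStringFinder(ast.NodeVisitor):
    """Collect line numbers of f-strings that consist of exactly two parts."""

    def __init__(self):
        self.hits = []

    def visit_JoinedStr(self, node):
        # An f-string is parsed as a JoinedStr; 'values' holds its literal
        # and formatted pieces in order.
        if len(node.values) == 2:
            self.hits.append(node.lineno)
        self.generic_visit(node)


def find_two_part_fstrings(source):
    """Return the line numbers of all 2-element f-strings in 'source'."""
    finder = TwoPartFStringFinder()
    finder.visit(ast.parse(source))
    return finder.hits


# Hypothetical sample: only the first line is a 2-element f-string.
sample = 'a = f"{root}/gallery"\nb = f"{x}-{y}-{z}"\nc = f"plain"\n'
print(find_two_part_fstrings(sample))
```

Each reported line is a candidate for rewriting as a plain concatenation, for example 'root + "/gallery"', assuming the interpolated value is already a string.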
Mike Fährmann
e006d26c8e Revert "use f-strings when building 'pattern'"
revert d7c97d5a97.
2025-12-20 22:07:37 +01:00
Mike Fährmann
968597a302 yield 3-tuples for Message.Directory
adapt tuples to the same length and semantics as other messages
2025-12-05 21:39:52 +01:00
Mike Fährmann
d7c97d5a97 use f-strings when building 'pattern' 2025-10-20 21:23:11 +02:00
Mike Fährmann
6c71b279b6 [dt] update 'parse_datetime' calls with one argument 2025-10-17 22:49:41 +02:00
Mike Fährmann
085616e0a8 [dt] replace 'text.parse_datetime()' & 'text.parse_timestamp()' 2025-10-17 17:43:06 +02:00
Mike Fährmann
f2a72d8d1e replace 'request(…).json()' with 'request_json(…)' 2025-06-29 17:50:19 +02:00
Mike Fährmann
9dbe33b6de replace old %-formatted and .format(…) strings with f-strings (#7671)
mostly using flynt
https://github.com/ikamensh/flynt
2025-06-29 17:50:19 +02:00
Mike Fährmann
41191bb60a 'match.group(N)' -> 'match[N]' (#7671)
2.5x faster
2025-06-18 13:05:58 +02:00
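As an illustration of the change above: match objects support index access, so 'match[N]' returns the same value as 'match.group(N)'. The pattern and input below are made up for the example, and the timing helper is only a sketch whose results depend on the machine and Python version.

```python
import re
import timeit

# Hypothetical pattern and input, chosen only for demonstration.
match = re.match(r"(\w+)/(\w+)", "gallery/abc123")

# Both spellings return the same captured group.
assert match.group(2) == match[2] == "abc123"


def time_both(number=200_000):
    """Return (seconds for .group(2), seconds for [2]) over 'number' calls."""
    by_method = timeit.timeit(lambda: match.group(2), number=number)
    by_index = timeit.timeit(lambda: match[2], number=number)
    return by_method, by_index
```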
Mike Fährmann
e08ec7e083 update copyright notices 2025-06-13 00:03:41 +02:00
Mike Fährmann
046ebb5590 [imgur] replace AuthorizationError exception with logging message 2025-02-15 15:36:39 +01:00
Mike Fährmann
7ae09c6b29 [imgur] add support for (hidden) personal posts (#6990)
https://imgur.com/user/me
https://imgur.com/user/me/hidden
2025-02-14 19:28:55 +01:00
Mike Fährmann
7f1ed909d5 [imgur] match gallery/album/image URLs with title slugs (#5593) 2024-05-17 22:44:37 +02:00
Mike Fährmann
5842e4928d [imgur] fail downloads when redirected to 'removed.png' (#5308) 2024-03-09 23:35:23 +01:00
Mike Fährmann
3ecb512722 send Referer headers by default 2023-09-19 00:02:04 +02:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
1d2b5d0c60 update test comment positions
always put them above the test they're referring to
2023-09-06 18:16:09 +02:00
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
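The idea described in the commit above could be sketched roughly as follows. This is a simplified, hypothetical class, not the project's actual Extractor; the attribute names and the dict standing in for a session are assumptions made for illustration.

```python
class SketchExtractor:
    """Hypothetical extractor whose constructor does no expensive setup."""

    def __init__(self, url):
        # The constructor only records its arguments.
        self.url = url
        self.config = {}
        self.session = None
        self.initialized = False

    def initialize(self, config=None):
        """Perform the actual setup; may be called after construction."""
        if self.initialized:
            return
        self.config = dict(config or {})
        # A plain dict stands in for a real HTTP session here.
        self.session = {"headers": {"User-Agent": self.config.get("user-agent", "sketch")}}
        self.initialized = True


# A caller can create the object first, adjust settings, then initialize.
extr = SketchExtractor("https://imgur.com/user/me")
settings = {"user-agent": "custom"}
extr.initialize(settings)
```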
Mike Fährmann
d12dd3813c [imgur] fix internal image/album URLs
URLs from "link" attributes of newer images/albums were all returned
as 'https://imgur.com/gallery/...' instead of the expected format,
causing them to be ignored.
2023-05-06 15:13:38 +02:00
Mike Fährmann
8520de57f0 [imgur] add 'favorite-folder' extractor (#4016) 2023-05-06 15:10:13 +02:00
Mike Fährmann
aaf58a1259 [imgur] document 'client-id' option (#3937) 2023-04-21 15:08:50 +02:00
ClosedPort22
bf1649dadb [imgur] add support for imgur.io URLs 2022-12-17 14:33:44 +08:00
Mike Fährmann
4598d32370 [imgur] prevent exception for empty albums (closes #2557) 2022-05-04 17:34:50 +02:00
Mike Fährmann
bd08ee2859 remove most 'yield Message.Version' statements
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
4fc9668922 [imgur] update URL patterns (#1561) 2021-05-19 15:44:10 +02:00
Mike Fährmann
0b55f5ad84 [imgur] fix/improve rate limit handling (#1386)
- also wait-and-retry on 429 status codes
- use infinite loop instead of recursive calls
- 'extractor.sleep()' -> 'extractor.wait()'
2021-03-18 15:45:26 +01:00
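The wait-and-retry approach listed in the commit above can be sketched as a loop rather than a recursive call. The function name, its parameters, and the fake responses are hypothetical; the real code performs actual HTTP requests and uses the extractor's own wait logic.

```python
import time


def request_with_retry(send, wait=time.sleep, delay=1.0):
    """Call send() until it returns a status other than 429.

    'send' is any callable returning a (status, body) tuple.
    """
    while True:
        status, body = send()
        if status != 429:
            return status, body
        # Rate limited: pause, then try again in the same loop iteration
        # scheme instead of calling this function recursively.
        wait(delay)


# Simulated server: two 429 responses, then success.
responses = iter([(429, None), (429, None), (200, "ok")])
waits = []
result = request_with_retry(lambda: next(responses), wait=waits.append, delay=0.5)
```

Using a loop keeps the call stack flat no matter how many retries are needed.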
Mike Fährmann
3df527ee2c update extractor test results 2021-02-27 21:01:29 +01:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
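To illustrate the effect of the pattern change above, the sketch below compares a character class that excludes '&' with one that does not. The URL and both patterns are hypothetical examples, not patterns from the project.

```python
import re

# Old style: '&' treated as a delimiter alongside '/', '?' and '#'.
OLD_SEGMENT = re.compile(r"https?://example\.org/a/([^/?&#]+)")
# New style: only the RFC 3986 component delimiters '/', '?' and '#'.
NEW_SEGMENT = re.compile(r"https?://example\.org/a/([^/?#]+)")

url = "https://example.org/a/abc&def?x=1#frag"

old_id = OLD_SEGMENT.match(url)[1]  # stops at '&'
new_id = NEW_SEGMENT.match(url)[1]  # stops at '?'
```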
Mike Fährmann
799ca07fc8 [imgur] update
- fix image/album detection for galleries
- use new API endpoints for image/album data
2020-09-06 21:11:32 +02:00
Mike Fährmann
ab1af66a97 [imgur] add 'search' extractor (#934) 2020-08-27 22:46:17 +02:00
Mike Fährmann
e4bbc1fb5c [imgur] add 'tag' extractor (#934) 2020-08-27 22:46:17 +02:00
Mike Fährmann
ec5870576d [imgur] handle 403 overcapacity responses (closes #910) 2020-07-30 19:26:01 +02:00
Mike Fährmann
27d163afb3 [imgur] support all '/t/...' URLs (closes #880)
… instead of just '/t/unmuted/'
2020-07-09 22:17:01 +02:00
Mike Fährmann
bd0e1ca1a5 [imgur] build directory path for each file (closes #842) 2020-06-21 19:25:52 +02:00
Mike Fährmann
6bcdb264e0 [imgur] treat 't/unmuted' URLs as galleries 2020-05-25 22:21:57 +02:00
Mike Fährmann
b6cee3e45b [imgur] fix extraction of animated images without 'mp4' entry 2020-05-25 22:21:57 +02:00
Mike Fährmann
4e361b3008 add tests for specific datetime values 2020-02-23 16:48:30 +01:00
Mike Fährmann
32d7195d08 [pinterest] improve detection of invalid pin.it links 2020-01-18 21:06:44 +01:00
Mike Fährmann
1f2a69f3c5 add '_extractor' information to redirect results 2019-12-29 23:37:34 +01:00
Mike Fährmann
6e23c0da09 [imgur] add extractor for subreddit links (closes #500) 2019-12-02 23:44:13 +01:00
Mike Fährmann
e9aed62c91 [imgur] unescape image titles 2019-11-28 22:13:24 +01:00
Mike Fährmann
b0197098e6 [imgur] get title from webpage if missing in API response
(closes #467)
2019-11-07 21:10:04 +01:00
Mike Fährmann
8f38a35b91 [imgur] use API with "public" client_id (#446)
Using the API endpoints makes it possible to access NSFW content
without logging in.
2019-10-23 21:43:55 +02:00
Mike Fährmann
7ebd984e8d [imgur] print error message if no JSON data is found (#446) 2019-10-16 17:45:14 +02:00
Mike Fährmann
5882b00f2f [imgur] implement login support (#446) 2019-10-15 22:00:22 +02:00
Mike Fährmann
913460240d [reddit] fix 'extractor.blacklist()' arguments
The second argument must support 'append()'.
2019-09-24 23:01:12 +02:00
Mike Fährmann
4330133114 [imgur] add 'favorite' extractor (closes #420)
… and use a newer site-internal API endpoint for user posts
2019-09-19 15:54:26 +02:00
Mike Fährmann
d780f0357e [imgur] add user extractor 2019-09-17 22:58:18 +02:00
Mike Fährmann
7d6af936c5 [imgur] simplify gallery extraction 2019-08-20 20:00:43 +02:00