Commit Graph

152 Commits

Author SHA1 Message Date
Mike Fährmann
53cdfaac37 [common] add reference to 'exception' module to Extractor class
- remove 'exception' imports
- replace with 'self.exc'
2026-02-15 10:57:22 +01:00
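The change above can be pictured with a small sketch. It assumes a base class that stores a reference to an exception module as a class attribute, so subclasses raise errors through `self.exc` without importing the module themselves. The stand-in module, the `Extractor` body, and `DemoExtractor` are hypothetical and not taken from the project's code.

```python
import types

# Stand-in for the project's 'exception' module (hypothetical contents)
exception = types.ModuleType("exception")


class AbortExtraction(Exception):
    """Error type used here only for illustration."""


exception.AbortExtraction = AbortExtraction


class Extractor:
    # Shared reference; subclasses reach it as 'self.exc'
    exc = exception


class DemoExtractor(Extractor):
    def fail(self, reason):
        # No module-level 'exception' import is needed in this class
        raise self.exc.AbortExtraction(reason)
```

A subclass defined in another module would then only need to import the base class.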
Mike Fährmann
ec2267244f [tumblr:search] prevent KeyError when using 'offset' pagination (#8720) 2025-12-29 16:57:04 +01:00
Mike Fährmann
00c6821a3f replace 2-element f-strings with simple '+' concatenations
Python's 'ast' module and its 'NodeVisitor' class
were incredibly helpful in identifying these
2025-12-22 11:26:04 +01:00
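As an illustration of how `ast.NodeVisitor` can locate such f-strings, here is a minimal sketch. It is not the script used for the commit; the class name, helper function, and sample source are assumptions made for this example.

```python
import ast


class TwoPartFStringFinder(ast.NodeVisitor):
    """Collect line numbers of f-strings with exactly two parts."""

    def __init__(self):
        self.lines = []

    def visit_JoinedStr(self, node):
        # 'values' holds the literal pieces and the '{...}' pieces
        if len(node.values) == 2:
            self.lines.append(node.lineno)
        self.generic_visit(node)


def find_two_part_fstrings(source):
    finder = TwoPartFStringFinder()
    finder.visit(ast.parse(source))
    return finder.lines


# Hypothetical input: only the first assignment is a 2-element f-string
sample = '''
url = f"{root}/api"
name = f"{blog}.tumblr.com/{post_id}"
plain = "no f-string here"
'''
lines = find_two_part_fstrings(sample)
```

Each reported line is a candidate for a plain `+` concatenation such as `root + "/api"`.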
Mike Fährmann
e006d26c8e Revert "use f-strings when building 'pattern'"
revert d7c97d5a97.
2025-12-20 22:07:37 +01:00
Mike Fährmann
968597a302 yield 3-tuples for Message.Directory
adapt tuples to the same length and semantics as other messages
2025-12-05 21:39:52 +01:00
Mike Fährmann
d7c97d5a97 use f-strings when building 'pattern' 2025-10-20 21:23:11 +02:00
Mike Fährmann
9bf76c1352 replace 'util.re()' with 'text.re()'
remove unnecessary 'util' imports
2025-10-20 17:44:58 +02:00
Mike Fährmann
085616e0a8 [dt] replace 'text.parse_datetime()' & 'text.parse_timestamp()' 2025-10-17 17:43:06 +02:00
Mike Fährmann
69f7cfdd0c [dt] replace 'datetime' imports 2025-10-16 11:42:42 +02:00
Mike Fährmann
951bf7c6b9 [tumblr] update
- provide 'search_tags' metadata for tag searches (#8160)
- support '/archive/tagged/' URLs (#8160)
- use self.groups
- remove __init__ constructors & _init functions
- remove "#category" test results
2025-09-02 10:26:53 +02:00
Mike Fährmann
ff147c2a32 [tumblr] fix pagination when using 'date-max' 2025-09-02 10:24:00 +02:00
Mike Fährmann
d9d8172364 [tumblr:search] fix 'ValueError: not enough values to unpack' (#8079)
fixes regression introduced in 21160a8b08
2025-08-20 08:45:19 +02:00
Mike Fährmann
ca22cb1487 [tumblr] add 'following' & 'followers' extractors (#8018) 2025-08-12 22:11:10 +02:00
Mike Fährmann
a097a373a9 simplify if statements by using walrus operators (#7671) 2025-07-22 20:57:54 +02:00
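A before/after sketch of this kind of simplification, using a hypothetical pattern and function names rather than code from the repository:

```python
import re

# Hypothetical pattern, for illustration only
POST_ID = re.compile(r"/post/(\d+)")


def post_id_before(url):
    # Assignment and test on separate lines
    match = POST_ID.search(url)
    if match:
        return int(match[1])
    return None


def post_id_after(url):
    # Walrus operator: assign and test in one expression
    if match := POST_ID.search(url):
        return int(match[1])
    return None
```

Both functions behave identically; the second needs Python 3.8 or later.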
Mike Fährmann
d8ef1d693f rename 'StopExtraction' to 'AbortExtraction'
for cases where StopExtraction was used to report errors
2025-07-09 21:07:28 +02:00
Mike Fährmann
9dbe33b6de replace old %-formatted and .format(…) strings with f-strings (#7671)
mostly using flynt
https://github.com/ikamensh/flynt
2025-06-29 17:50:19 +02:00
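The three string styles named in this commit produce the same result. A short sketch with made-up variable names and a placeholder URL:

```python
blog = "example"
offset = 20

# Old styles
old_percent = "https://%s.tumblr.com/?offset=%d" % (blog, offset)
old_format = "https://{}.tumblr.com/?offset={}".format(blog, offset)

# f-string replacement
new_fstring = f"https://{blog}.tumblr.com/?offset={offset}"
```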
Mike Fährmann
41191bb60a 'match.group(N)' -> 'match[N]' (#7671)
2.5x faster
2025-06-18 13:05:58 +02:00
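The two spellings are interchangeable for numbered groups. The sketch below shows the equivalence and one way to time both with `timeit`; the pattern and the `compare()` helper are assumptions for this example, and measured ratios will vary by interpreter and machine.

```python
import re
import timeit

# Hypothetical pattern and input
match = re.match(r"(\w+)\.tumblr\.com", "example.tumblr.com")

# Both forms return the same captured group
same = match.group(1) == match[1] == "example"


def compare(number=200_000):
    """Return (seconds for .group(1), seconds for [1])."""
    t_group = timeit.timeit(lambda: match.group(1), number=number)
    t_index = timeit.timeit(lambda: match[1], number=number)
    return t_group, t_index
```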
Mike Fährmann
e08ec7e083 update copyright notices 2025-06-13 00:03:41 +02:00
Mike Fährmann
811b665e33 remove @staticmethod decorators
There might have been a time when calling a static method was faster
than a regular method, but that is no longer the case. According to
micro-benchmarks, it is 70% slower in CPython 3.13 and it also makes
executing the code of a class definition slower.
2025-06-12 22:50:52 +02:00
Mike Fährmann
b5c88b3d3e replace standard library 're' uses with 'util.re()' 2025-06-06 13:24:52 +02:00
Mike Fährmann
a1fd329783 [tumblr] improve error message for dashboard-only blogs (#7455) 2025-05-03 11:02:38 +02:00
Mike Fährmann
21160a8b08 [tumblr] support URLs without subdomain (#7358) 2025-04-13 09:33:51 +02:00
Mike Fährmann
7916c8bf77 allow passing cookies to OAuth extractors
partially revert ce54b8c04c
2024-11-09 18:06:27 +01:00
Mike Fährmann
33778d35ba [tumblr] update
- simplify
- fix search pagination
- support custom search mode and post types
2024-11-08 08:15:13 +01:00
Allen
0f94fa9015 [tumblr] search extractor minimal styling changes 2024-10-29 13:06:23 +01:00
Allen
d2ef9a590f [tumblr] add search extractor 2024-09-03 08:18:58 +02:00
Mike Fährmann
785e6f2911 [tumblr] fix 401 Unauthorized for likes when using api-key (#5994)
fixes regression introduced in 540eaa5a
2024-08-12 09:09:59 +02:00
Mike Fährmann
540eaa5add [tumblr] implement 'pagination' option (#5880)
restore pagination behavior from before de670bd7de
2024-07-23 20:31:04 +02:00
Mike Fährmann
141a93c8fd [docs] update docs/configuration links (#5059, #5369, #5423) 2024-04-13 02:18:44 +02:00
Mike Fährmann
da76e13e3b [tumblr] fix exception after waiting for rate limit (#4916)
use a loop instead of recursive function calls
2023-12-12 19:14:06 +01:00
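A minimal sketch of the loop-based approach, assuming a hypothetical `RateLimited` error that carries the wait time. This is not the extractor's actual code; it only shows why a loop avoids the nested call frames that recursion would create.

```python
import time


class RateLimited(Exception):
    """Hypothetical error carrying the number of seconds to wait."""

    def __init__(self, wait):
        super().__init__(f"rate limited, retry in {wait}s")
        self.wait = wait


def call_with_retry(fetch, sleep=time.sleep, max_attempts=5):
    """Call 'fetch' until it succeeds, sleeping after each rate limit."""
    for _ in range(max_attempts):
        try:
            return fetch()
        except RateLimited as exc:
            # Wait, then let the loop issue the request again;
            # there is no nested call, so the stack does not grow
            sleep(exc.wait)
    raise RuntimeError("still rate limited after all attempts")
```

Passing `sleep` as a parameter is a choice made here so the waiting can be replaced in tests.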
Mike Fährmann
d59d4ebff4 [tumblr] support infinite 'fallback-retries' 2023-12-11 23:40:13 +01:00
Mike Fährmann
7608201a44 [tumblr] fix 'day' extractor
another bug caused by a383eca7
2023-11-25 00:51:14 +01:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
1d2b5d0c60 update test comment positions
always put them above the test they're referring to
2023-09-06 18:16:09 +02:00
Mike Fährmann
255d08b79e add test for 'Extractor.initialize()' (#4359) 2023-07-28 16:58:16 +02:00
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
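The idea can be sketched as follows. The class is a hypothetical stand-in, not the project's `Extractor`; the attribute names and the guard against repeated initialization are assumptions made for this example.

```python
class Extractor:
    """Hypothetical stand-in showing two-step construction."""

    def __init__(self, url):
        # Cheap: only remember what was matched
        self.url = url
        self.initialized = False
        self.session = None

    def initialize(self, config=None):
        """Perform the actual setup; a second call does nothing."""
        if self.initialized:
            return
        self.config = dict(config or {})
        # Stand-in for session and cookie setup
        self.session = {"cookies": self.config.get("cookies", {})}
        self.initialized = True


extractor = Extractor("https://example.tumblr.com/")
# A caller can adjust configuration here, before any setup has run
extractor.initialize({"cookies": {"name": "value"}})
```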
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
5297ee0cd9 [tumblr] add 'day' extractor (#3951) 2023-04-24 22:01:47 +02:00
Mike Fährmann
de670bd7de [tumblr] update pagination logic (#2191) 2023-04-24 20:07:10 +02:00
Mike Fährmann
8fb043e8ff [tumblr] raise more detailed errors for dashboard-only blogs (#3628)
2023-02-12 19:38:14 +01:00
Mike Fährmann
b0cb4a1b9c replace 'text.extract()' with 'text.extr()' where possible 2022-11-05 01:14:09 +01:00
ClosedPort22
4e80d3210e [tumblr] Fallback to gifv when possible (#3095) (#3159) 2022-11-04 19:42:36 +01:00
Mike Fährmann
7c6af27eb8 [tumblr] add 'fallback-*' options (#2957)
specifically 'fallback-delay' and 'fallback-retries'
and change default number of retries to 2 (down from 3)
2022-10-26 13:59:09 +02:00
Mike Fährmann
68466a7d61 [tumblr] support 'https://www.tumblr.com/BLOGNAME' URLs (#3034) 2022-10-11 21:09:24 +02:00
Mike Fährmann
f1f89b2436 [tumblr] add 'offset' option 2022-10-11 10:54:23 +02:00
Mike Fährmann
e5d229c524 [tumblr] sleep between fallback retries (#2957) 2022-10-11 10:48:28 +02:00
Mike Fährmann
e1d714943b [tumblr] catch exception when updating image token (#2957) 2022-09-30 15:08:21 +02:00
Mike Fährmann
f728b5ca06 [tumblr] add fallback for failed higher-resolution images (#2957) 2022-09-28 21:36:09 +02:00
Mike Fährmann
32c30754d1 [tumblr] warn when unable to fetch higher-resolution images (#2957)
and download the smaller version
instead of failing with a 404 error
2022-09-26 12:05:34 +02:00
Mike Fährmann
46fe469c53 [tumblr] implement 'ratelimit' option (#2919) 2022-09-17 14:10:33 +02:00