Commit Graph

68 Commits

Author SHA1 Message Date
Mike Fährmann
53cdfaac37 [common] add reference to 'exception' module to Extractor class
- remove 'exception' imports
- replace with 'self.exc'
2026-02-15 10:57:22 +01:00
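The change above can be pictured with a minimal sketch: the base class holds the module as a class attribute, so subclasses reach exception types through 'self.exc' instead of importing the module. The stand-in module, 'ExampleExtractor', and its 'items()' body are hypothetical. Only the names 'exception', 'self.exc', and 'AbortExtraction' come from this log.

```python
import types

# Hypothetical stand-in for gallery-dl's 'exception' module
exception = types.ModuleType("exception")


class AbortExtraction(Exception):
    """Raised to report an error that ends extraction early."""


exception.AbortExtraction = AbortExtraction


class Extractor:
    # Class attribute referencing the module: subclasses can write
    # 'self.exc.AbortExtraction' without importing 'exception' themselves
    exc = exception


class ExampleExtractor(Extractor):
    def items(self):
        raise self.exc.AbortExtraction("no results")
```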
Mike Fährmann
e006d26c8e Revert "use f-strings when building 'pattern'"
revert d7c97d5a97.
2025-12-20 22:07:37 +01:00
Mike Fährmann
d497523461 [mastodon] fix "AttributeError: 'parse_datetime_iso'" (#8709)
fixes regression introduced in c8fc790028
2025-12-15 08:39:20 +01:00
Mike Fährmann
968597a302 yield 3-tuples for Message.Directory
adapt tuples to the same length and semantics as other messages
2025-12-05 21:39:52 +01:00
Mike Fährmann
d7c97d5a97 use f-strings when building 'pattern' 2025-10-20 21:23:11 +02:00
Mike Fährmann
6c71b279b6 [dt] update 'parse_datetime' calls with one argument 2025-10-17 22:49:41 +02:00
Mike Fährmann
085616e0a8 [dt] replace 'text.parse_datetime()' & 'text.parse_timestamp()' 2025-10-17 17:43:06 +02:00
Mike Fährmann
a097a373a9 simplify if statements by using walrus operators (#7671) 2025-07-22 20:57:54 +02:00
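As an illustration of the kind of simplification meant here (not code from this commit), a walrus operator (Python 3.8+) folds an assignment and its truth test into one 'if' statement. The function names and the regex are hypothetical.

```python
import re


def find_id_before(text):
    # Without the walrus operator: assign first, then test
    match = re.search(r"id:(\d+)", text)
    if match:
        return match[1]
    return None


def find_id_after(text):
    # With the walrus operator: assign and test in one expression
    if match := re.search(r"id:(\d+)", text):
        return match[1]
    return None
```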
Mike Fährmann
d8ef1d693f rename 'StopExtraction' to 'AbortExtraction'
for cases where StopExtraction was used to report errors
2025-07-09 21:07:28 +02:00
Mike Fährmann
9dbe33b6de replace old %-formatted and .format(…) strings with f-strings (#7671)
mostly using flynt
https://github.com/ikamensh/flynt
2025-06-29 17:50:19 +02:00
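For reference, the three formatting styles this commit converts between produce identical strings for simple cases like the one below; flynt automates the rewrite to the last form. The variable names and values are illustrative only.

```python
user, count = "VoteChess", 3

old_percent = "%s has %d posts" % (user, count)   # %-formatting
old_format = "{} has {} posts".format(user, count)  # str.format()
new_fstring = f"{user} has {count} posts"          # f-string
```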
Mike Fährmann
b0580aba86 update 'match.lastindex' usage 2025-06-18 20:24:13 +02:00
Mike Fährmann
41191bb60a 'match.group(N)' -> 'match[N]' (#7671)
2.5x faster
2025-06-18 13:05:58 +02:00
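The two spellings are interchangeable: since Python 3.6, 're.Match' objects support indexing, so 'match[N]' returns the same value as 'match.group(N)'. The pattern and input below are illustrative, and the "2.5x faster" figure is the commit's own claim, not something this snippet measures.

```python
import re

match = re.match(r"(\w+)@([\w.]+)", "VoteChess@botsin.space")

# Old and new spellings of the same group access
name_old, host_old = match.group(1), match.group(2)
name_new, host_new = match[1], match[2]
```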
Mike Fährmann
e08ec7e083 update copyright notices 2025-06-13 00:03:41 +02:00
Mike Fährmann
cd01eacd3d [mastodon] support Akkoma/Pleroma '/objects/:uuid' URLs (#7497) 2025-05-10 18:01:45 +02:00
Soblow "Opale" Xaselgio
a94672bede [mastodon] Add support for Akkoma/Pleroma /notice/:status_id URLs
Signed-off-by: Soblow "Opale" Xaselgio <113846014+Soblow@users.noreply.github.com>
2025-05-09 12:03:57 +02:00
Mike Fährmann
7ccf64596e [mastodon] support '/statuses' URLs (#7255)
- /statuses/123456789
- /users/USER/statuses/123456789
2025-03-27 18:32:04 +01:00
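One way to accept both new URL forms alongside the existing '/@USER/ID' form is a single regex with alternatives for the path prefix. This is a hypothetical pattern for illustration, not gallery-dl's actual one. It assumes alphanumeric IDs, since status IDs may be non-numeric on some servers.

```python
import re

# Hypothetical pattern covering '/@USER/ID', '/users/USER/statuses/ID'
# and '/statuses/ID'; group 1 is the host, group 2 the status ID
STATUS_PATTERN = re.compile(
    r"https?://([^/?#]+)"
    r"(?:/@[^/?#]+|/users/[^/?#]+/statuses|/statuses)"
    r"/([0-9A-Za-z]+)")


def status_id(url):
    """Return the status ID in 'url', or None if it is not a status URL."""
    match = STATUS_PATTERN.fullmatch(url)
    return match[2] if match else None
```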
Mike Fährmann
7916c8bf77 allow passing cookies to OAuth extractors
partially revert ce54b8c04c
2024-11-09 18:06:27 +01:00
Mike Fährmann
3cf5366143 [mastodon] add support for card images 2024-05-01 16:00:07 +02:00
Mike Fährmann
9b1995dda3 [mastodon] add 'favorite', 'list', and 'hashtag' extractors (#5529) 2024-05-01 15:59:34 +02:00
cenodis
3ba5fd9efd [mastodon] Use boolean instead of integer keys for accounts/statuses endpoint 2024-04-26 22:51:56 +02:00
blankie
225d849139 [mastodon] fix handling null 'moved' account field 2024-03-12 11:44:25 +11:00
Mike Fährmann
89066844f4 add 'config_instance' method
to allow more streamlined access to BaseExtractor instance options
2024-01-18 03:20:36 +01:00
Mike Fährmann
57fc6fcf83 replace '24*3600' with '86400'
and generalize cache maxage values
2023-12-18 23:57:22 +01:00
Mike Fährmann
3f9c113d78 [mastodon] Support non-numeric status IDs (#4936) 2023-12-16 01:52:31 +01:00
Mike Fährmann
4288cea94a [mastodon] fix reblogs (#4580) 2023-11-11 00:34:49 +01:00
Mike Fährmann
24a1d46391 [mastodon] support '/@USER/following' URLs
Previously, only '/users/USER/following' got matched.
2023-09-13 23:42:51 +02:00
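Matching both prefixes needs only an alternation of '@' and 'users/' before the username. The pattern and helper below are a hypothetical sketch, not the extractor's real pattern.

```python
import re

# Hypothetical pattern: accepts '/@USER/following' and '/users/USER/following'
FOLLOWING_PATTERN = re.compile(
    r"https?://([^/?#]+)/(?:@|users/)([^/?#]+)/following/?")


def following_user(url):
    """Return the username from a 'following' URL, or None."""
    match = FOLLOWING_PATTERN.fullmatch(url)
    return match[2] if match else None
```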
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This allows, for example, adjusting config access inside a Job, since
most of it previously already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
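The two-phase pattern described above can be sketched as follows: the constructor stays cheap, and configuration is read only when 'initialize()' runs, so a caller can still change settings in between. The 'find()' helper, the 'config' argument to 'initialize()', and the option name are hypothetical simplifications, not gallery-dl's real signatures.

```python
class Extractor:
    """Hypothetical extractor with split construction and initialization."""

    def __init__(self, url):
        # Cheap: only store the URL; no session, cookies or config access yet
        self.url = url
        self.options = None

    def initialize(self, config):
        # Deferred: configuration is read only at this point
        self.options = dict(config)


def find(url):
    # Hypothetical analogue of 'extractor.find()': constructs, does not initialize
    return Extractor(url)


config = {"text-posts": False}
extr = find("https://mastodon.social/@user")
config["text-posts"] = True   # a Job may still adjust config here
extr.initialize(config)
```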
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
9c29c904c7 [mastodon] try to get account IDs without access token
Try to query the public '/api/v1/accounts/lookup' endpoint
and fall back to '/api/v1/accounts/search' if it returns an error.

'/api/v1/accounts/lookup' is available since Mastodon v3.4.0.
The version of an instance can be found at '/api/v1/instance'.
2023-04-13 14:03:23 +02:00
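The lookup-then-search fallback might look like the sketch below. It assumes a hypothetical 'call(endpoint, params)' helper that performs a GET request, returns decoded JSON, and raises the hypothetical 'APIError' on failure. The endpoint paths and the 'acct'/'q' parameters follow the Mastodon REST API.

```python
class APIError(Exception):
    """Hypothetical error type for a failed API request."""


def account_id_by_username(call, username):
    """Resolve 'username' to an account ID, or return None."""
    try:
        # Public endpoint, available since Mastodon v3.4.0
        return call("/api/v1/accounts/lookup", {"acct": username})["id"]
    except APIError:
        pass
    # Fallback for older instances; the search endpoint returns a list
    for account in call("/api/v1/accounts/search", {"q": username}):
        if account["acct"] == username:
            return account["id"]
    return None
```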
Mike Fährmann
8993b10751 [mastodon] add 'num' and 'count' metadata fields (#3517) 2023-01-23 13:10:11 +01:00
Mike Fährmann
e30e8aeef7 [mastodon] rename '_check_move' -> '_check_moved' 2023-01-14 14:46:24 +01:00
Allen
9fc142d27b [mastodon] add "remote_instance" field (#3119)
Example Usage:
If the URL is "mastodon:https://mastodon.example.org/@VoteChess@botsin.space", the "remote_instance" will be "botsin.space".
...
"directory": ["mastodon", "{remote_instance|instance}", "{account[username]!l}"]
...
2022-11-02 17:09:38 +01:00
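Deriving "remote_instance" from a cross-instance reference amounts to splitting on the second '@'. The sketch below is hypothetical. 'directory_instance()' only emulates the '{remote_instance|instance}' fallback, assuming the alternative is used when the first field is empty or missing.

```python
def split_account(path_user):
    """Split '@user@remote.host' into (username, remote_instance or None)."""
    username, _, remote = path_user.lstrip("@").partition("@")
    return username, remote or None


def directory_instance(kwdict):
    # Emulates '{remote_instance|instance}': first non-empty value wins
    return kwdict.get("remote_instance") or kwdict["instance"]
```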
Mike Fährmann
5c31791b3c [mastodon] support '/web/' URLs (#3109) 2022-10-28 11:47:00 +02:00
Mike Fährmann
9a2cfd4421 [mastodon] support cross-instance user references (#3109) 2022-10-27 14:26:42 +02:00
Mike Fährmann
58d97188b4 [mastodon] add 'bookmark' extractor (#3109) 2022-10-26 21:28:50 +02:00
Mike Fährmann
2787c8511a [mastodon] warn about moved accounts (#2939) 2022-09-20 17:57:14 +02:00
Marius Kaufmann
0aa8345a13 [mastodon] allow downloading without access token (#2782)
Most Mastodon instances allow accessing /api/v1/accounts/XXXX/statuses and /api/v1/statuses/XXXX without an API access token.
This commit allows users to download at least some links from Mastodon instances that do not already have access tokens hard-coded into the extractor.
The user extractor only works on links that include the user ID, such as https://mastodon.tld/@id:12345. Status links work as-is.
2022-07-27 12:07:06 +02:00
Mike Fährmann
d26da3b9e5 add pre-generated 'pattern' for supported BaseExtractor sites 2022-05-09 22:20:09 +02:00
Mike Fährmann
9377543162 [mastodon] add 'following' extractor (#1891) 2021-09-26 00:12:34 +02:00
Mike Fährmann
2c2932973c [mastodon] support specifying accounts by ID
Same as a3b473bd for Twitter

Instead of just
https://instance.tld/@user

it is now also possible to refer to that account with
https://instance.tld/users/user
https://instance.tld/@id:12345
https://instance.tld/users/id:12345
2021-09-25 20:28:16 +02:00
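Telling the four account forms apart comes down to stripping the '/@' or '/users/' prefix and checking for an 'id:' marker. This parser is a hypothetical illustration, not the extractor's code.

```python
def parse_account(url_path):
    """Return ('id', value) or ('name', value) for a Mastodon account path."""
    if url_path.startswith("/@"):
        ref = url_path[2:]
    elif url_path.startswith("/users/"):
        ref = url_path[7:]
    else:
        raise ValueError(f"unsupported path: {url_path}")
    # 'id:12345' refers to an account ID, anything else to a username
    if ref.startswith("id:"):
        return "id", ref[3:]
    return "name", ref
```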
Mike Fährmann
312a28e78a [mastodon] add 'replies' option (#1669) 2021-07-07 00:59:02 +02:00
Mike Fährmann
513c491cea [mastodon] reset 'params' after first pagination iteration
otherwise query parameters in 'params' get specified twice the second
time around - once from the 'links["next"]' URL and once from 'params'
itself.
2021-07-07 00:07:18 +02:00
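The fix amounts to sending 'params' only with the first request, because each 'next' URL already carries the query string. The sketch assumes a hypothetical 'call(url, params)' helper returning '(items, next_url)', where 'next_url' is None on the last page.

```python
def paginate(call, endpoint, params):
    """Yield items from all pages of 'endpoint'."""
    url = endpoint
    while url:
        items, url = call(url, params)
        yield from items
        # The 'next' URL already encodes the query parameters, so passing
        # 'params' again would specify them twice
        params = None
```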
Mike Fährmann
a1f5b78039 [mastodon] add 'reblogs' option (#1669) 2021-07-06 23:27:32 +02:00
Mike Fährmann
93d356712c [mastodon] implement 'text-posts' option (#1569)
similar to Twitter's 'text-tweets'
2021-07-02 22:12:41 +02:00
Mike Fährmann
1d145a6186 [mastodon] use cache for OAuth tokens (#616) 2021-01-31 01:38:23 +01:00
Mike Fährmann
fa33f13453 [mastodon] update
- inherit from BaseExtractor
- remove custom generate_extractors() and config()
- improve layout of MastodonAPI internals
2021-01-27 23:49:01 +01:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
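With only '/', '?' and '#' as delimiters, a character class like '[^/?#]+' captures one path component, and '(?:$|[?#])' marks where the path ends. The pattern below is a hypothetical example of this style, not one taken from the extractor.

```python
import re

# '[^/?#]+' matches one host or path component; '&' only separates
# pairs inside a query, so it is not needed as a delimiter here
USER_PATTERN = re.compile(r"https?://([^/?#]+)/@([^/?#]+)/?(?:$|[?#])")
```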
Mike Fährmann
c51fbd72ba update extractor test results 2020-07-13 22:57:48 +02:00
Mike Fährmann
53cc498d9c improve config lookup when there are multiple possible locations
This specifically applies to all Mastodon extractors and all
extractors with a 'basecategory', i.e. 'booru', 'foolslide', etc.

Values inside those general config locations wouldn't be recognized
when a value with the same name was set on the 'extractor' level.

For example 'extractor.mastodon.directory' should be used over
'extractor.directory' when both are set, but this was impossible
with the previous implementation.

(fixes #843)
2020-06-21 00:07:10 +02:00
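The intended precedence can be sketched as a search over config paths ordered from most to least specific, returning the first location that sets the key. This is a hypothetical helper over a nested dict, not gallery-dl's config module.

```python
def config_lookup(config, key, locations):
    """Return 'key' from the first (most specific) location that sets it.

    'locations' lists paths such as [("extractor", "mastodon"), ("extractor",)].
    """
    for path in locations:
        node = config
        for part in path:
            node = node.get(part) if isinstance(node, dict) else None
        if isinstance(node, dict) and key in node:
            return node[key]
    return None
```

For example, with both 'extractor.mastodon.directory' and 'extractor.directory' set, the Mastodon-specific value wins.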