372 Commits

Author SHA1 Message Date
Mike Fährmann
9bfde2f535 [twitter] simplify URL patterns with USER_PATTERN 2025-08-22 19:41:16 +02:00
Mike Fährmann
ff94f1dec5 [twitter:avatar] fix "KeyError: 'profile_image_url_https'" (#8087)
fixes regression introduced in 5747dbf00c
2025-08-21 05:58:33 +02:00
Mike Fährmann
a8b334e866 [twitter] add 'home' extractor (#7974) 2025-08-19 23:03:24 +02:00
Mike Fährmann
47150f3e8a [twitter] add 'highlights' extractor (#7826) 2025-08-19 09:14:39 +02:00
Mike Fährmann
8252980264 [twitter] extract 'community' metadata (#7424)
update default download directories and archive IDs
for community extractors
2025-08-19 08:56:04 +02:00
Mike Fährmann
5747dbf00c [twitter] update API endpoint query hashes & parameters 2025-08-18 21:50:10 +02:00
Mike Fährmann
c1abcb99de [twitter] handle "KeyError: 'result'" for retweets (#8072) 2025-08-18 10:18:03 +02:00
Mike Fährmann
3b93184997 [twitter] fix potential 'UnboundLocalError' (#7932)
this happens with Tweets containing both images and video
when 'videos' are disabled.
2025-07-30 16:45:48 +02:00
Mike Fährmann
a097a373a9 simplify if statements by using walrus operators (#7671) 2025-07-22 20:57:54 +02:00
Mike Fährmann
d8ef1d693f rename 'StopExtraction' to 'AbortExtraction'
for cases where StopExtraction was used to report errors
2025-07-09 21:07:28 +02:00
Mike Fährmann
cfafbc0675 [twitter] extract 'sensitive_flags' metadata (#2523)
a list of 'sensitive_media_warning' flags per file
and a combination of all file flags per Tweet
2025-07-09 12:39:23 +02:00
Mike Fährmann
9dbe33b6de replace old %-formatted and .format(…) strings with f-strings (#7671)
mostly using flynt
https://github.com/ikamensh/flynt
2025-06-29 17:50:19 +02:00
Mike Fährmann
41191bb60a 'match.group(N)' -> 'match[N]' (#7671)
2.5x faster
2025-06-18 13:05:58 +02:00
Mike Fährmann
e08ec7e083 update copyright notices 2025-06-13 00:03:41 +02:00
Mike Fährmann
e2d104a110 [twitter] extract 'source_id' and 'source_user' metadata (#7470 #7640) 2025-06-12 18:59:22 +02:00
Mike Fährmann
06e2f2cd91 [twitter] restructure media data extraction 2025-06-12 18:53:15 +02:00
Mike Fährmann
8dace96af3 [twitter] simplify 'expand' & 'unique' init code 2025-06-05 15:33:47 +02:00
Mike Fährmann
e199396872 [common] simplify 'user' extractors by using 'Dispatch' mixin 2025-05-24 18:04:53 +02:00
Mike Fährmann
b97dc456b0 [twitter] import 'transaction_id' only when needed 2025-05-04 07:42:44 +02:00
Mike Fährmann
edc67983ed [twitter] update 'x-csrf-token' header after ct init (#7467) 2025-05-03 12:55:31 +02:00
Mike Fährmann
771317b36c [twitter:ctid] cache client transaction keys (#7382)
and 'ondemand.s.…a.js' responses
2025-05-03 12:50:00 +02:00
Mike Fährmann
e0913c95b2 [twitter] generate 'x-client-transaction-id' header values (#7382)
TODO: cache ClientTransaction state on disk
2025-05-02 12:10:05 +02:00
stephanelsmith
f0e7992674 [twitter] added 'followers' extractor
modeled after the 'following' extractor

- cleanup
- add test
2025-04-19 18:24:29 +02:00
Mike Fährmann
2798fb8a80 [twitter] update API endpoint query hashes (#7382 #7386)
and associated 'variables', 'features', and 'fieldToggles' parameters
2025-04-19 16:45:47 +02:00
Mike Fährmann
a859abf6a1 [twitter] prevent exception in '_extract_components()' (#7139) 2025-03-09 10:15:18 +01:00
Mike Fährmann
d2cad599f7 [twitter] support 'grok' cards content (#7040) 2025-02-25 20:47:31 +01:00
Mike Fährmann
64dc655ed6 [twitter] revert generated CSRF token length to 32 characters (#6895)
revert d9c4fcc7fa
2025-01-30 19:16:10 +01:00
Mike Fährmann
cb1a75eefc [twitter] handle errors during file extraction (#6647) 2025-01-21 18:23:54 +01:00
Mike Fährmann
d9c4fcc7fa [twitter] generate longer CSRF token values 2025-01-21 18:19:25 +01:00
Mike Fährmann
cfe24a9e31 [twitter] make 'source' metadata extraction non-fatal (#6472) 2024-11-14 18:59:01 +01:00
Mike Fährmann
e3fbd6825b [twitter] remove cookies migration workaround
revert 141efc2ad3
2024-10-31 17:10:13 +01:00
Mike Fährmann
a120295632 [util] use minimal separators for 'json_dumps()' 2024-10-01 17:03:13 +02:00
Mike Fährmann
bd932b6860 [twitter] add 'info' as a possible 'include' value (#6114) 2024-08-31 17:04:22 +02:00
Mike Fährmann
ef8b1bc56e [twitter] extract 'type' metadata (#6111) 2024-08-31 13:16:51 +02:00
Mike Fährmann
c51938b82b [twitter] fix pinned Tweet extraction (#6102) 2024-08-29 08:53:48 +02:00
Mike Fährmann
c0668f5106 [twitter] allow disabling 'cursor' output (#5990) 2024-08-17 19:24:38 +02:00
Mike Fährmann
21734ab69e [twitter] update 'x-csrf-token' header during login (#5945) 2024-08-07 08:11:44 +02:00
Mike Fährmann
c83c812a1e [instagram][twitter] rename 'profile' to 'info' (#5262, #3623) 2024-07-11 00:22:39 +02:00
Mike Fährmann
8e747b6dee [twitter] send initial 'cursor' only when given via option 2024-07-09 20:42:06 +02:00
Mike Fährmann
ff39d28ef4 [twitter] fix 'user' when providing 'cursor' (#5833) 2024-07-08 23:16:47 +02:00
Mike Fährmann
65385e09cb [twitter] remove break (#5831) 2024-07-07 01:08:30 +02:00
Mike Fährmann
97a50a23d2 [twitter] implement 'cursor' support (#5753) 2024-07-05 00:03:02 +02:00
Mike Fährmann
29eb535e18 merge #5802: [twitter] extract 'bookmark_count' and 'view_count' 2024-06-28 23:21:57 +02:00
rameerez
ee93dfeb8d Add bookmark_count and view_count to the X / Twitter extractor
- Parse `view_count` as integer in Twitter extractor
- Style fix
- optimize 'view_count' extraction
2024-06-28 23:17:11 +02:00
Mike Fährmann
c2bf4ff99c [twitter] add 'profile' extractor (#3623) 2024-06-28 22:55:48 +02:00
Mike Fährmann
0bb8831853 [twitter] fix 'username-alt' option name (#5715) 2024-06-22 17:49:58 +02:00
Mike Fährmann
f58b0e6fc7 [twitter] ignore 'Unavailable' media (#5736)
… including geo-restricted content.

add 'unavailable' option to allow re-enabling them again
2024-06-21 00:15:10 +02:00
Mike Fährmann
c699ce8ebb [twitter] add 'username-alt' option (#5715) 2024-06-13 00:54:43 +02:00
Mike Fährmann
aa9be75d44 [twitter] fix duplicate ArkoseLogin check
forgot to replace this in 9e5d65fbf3
2024-06-06 19:44:42 +02:00
Mike Fährmann
162d4269ec [twitter] extend 'ratelimit' option (#5532)
allow waiting for a set amount of seconds
2024-06-06 01:18:08 +02:00