389 Commits

Author SHA1 Message Date
Mike Fährmann
d7c97d5a97 use f-strings when building 'pattern' 2025-10-20 21:23:11 +02:00
Mike Fährmann
9bf76c1352 replace 'util.re()' with 'text.re()'
remove unnecessary 'util' imports
2025-10-20 17:44:58 +02:00
Mike Fährmann
c8fc790028 merge branch 'dt': move datetime utils into separate module
- use 'datetime.fromisoformat()' when possible (#7671)
- return a datetime-compatible object for invalid datetimes
  (instead of a 'str' value)
2025-10-20 09:30:05 +02:00
Mike Fährmann
238d0973f7 [twitter] fix "KeyError - 'source_id'" with disabled 'transform' (#8429) 2025-10-18 19:39:24 +02:00
Mike Fährmann
085616e0a8 [dt] replace 'text.parse_datetime()' & 'text.parse_timestamp()' 2025-10-17 17:43:06 +02:00
Mike Fährmann
e42030a3a6 [twitter] fix 'KeyError' for "temporarily unavailable" users (#8423) 2025-10-16 15:50:48 +02:00
Mike Fährmann
8c62be343e [output] add 'Logger.traceback()' helper 2025-10-14 18:44:29 +02:00
Mike Fährmann
c1d21e8cb9 [twitter] remove login support (#4202 #6029 #6040 #8362)
broken feature
2025-10-07 08:32:40 +02:00
24xyz
92be341711 [twitter] fix 'quote_id' of individual Tweets (#8284)
Fix 'quoted_by_id_str' to use parent tweet id
2025-09-24 19:50:12 +02:00
Mike Fährmann
5aa2124736 [twitter] fix all quoted Tweets being marked as 'deleted' (#8225)
due to "KeyError: 'screen_name'"
when trying to access the author's name

fixes regression introduced in 5747dbf00c
2025-09-16 10:08:32 +02:00
Mike Fährmann
05128ccf49 [twitter] add 'search-limit' option (#8173)
reduce default limit from 100 to 20
2025-09-13 10:30:58 +02:00
Mike Fährmann
f6fcba4040 [twitter] add 'search-stop' option (#8173)
and rename 'pagination-search' to 'search-pagination'
2025-09-09 10:14:43 +02:00
Mike Fährmann
d182749f45 [twitter] implement 'pagination-search' option (#8173) 2025-09-07 21:03:29 +02:00
Mike Fährmann
f94eedbe1d [twitter] continue searches on empty response (#8173)
stop when receiving more than 3 empty responses in a row
2025-09-07 17:42:56 +02:00
Mike Fährmann
52c932add6 [twitter] prevent "KeyError: 'name'" in '_transform_user()' (#8154)
fixes regression introduced in 5747dbf00c
2025-09-01 20:52:05 +02:00
Mike Fährmann
8650a6bf39 [twitter] fix "KeyError: 'core'" when processing communities (#8141)
fixes regression introduced in 8252980264
2025-08-29 19:42:37 +02:00
Mike Fährmann
d251996d8e [twitter] prevent exceptions in '_transform_community()' (#8134)
fixes regression introduced in 8252980264
2025-08-28 11:24:45 +02:00
Mike Fährmann
9bfde2f535 [twitter] simplify URL patterns with USER_PATTERN 2025-08-22 19:41:16 +02:00
Mike Fährmann
ff94f1dec5 [twitter:avatar] fix "KeyError: 'profile_image_url_https'" (#8087)
fixes regression introduced in 5747dbf00c
2025-08-21 05:58:33 +02:00
Mike Fährmann
a8b334e866 [twitter] add 'home' extractor (#7974) 2025-08-19 23:03:24 +02:00
Mike Fährmann
47150f3e8a [twitter] add 'highlights' extractor (#7826) 2025-08-19 09:14:39 +02:00
Mike Fährmann
8252980264 [twitter] extract 'community' metadata (#7424)
update default download directories and archive IDs
for community extractors
2025-08-19 08:56:04 +02:00
Mike Fährmann
5747dbf00c [twitter] update API endpoint query hashes & parameters 2025-08-18 21:50:10 +02:00
Mike Fährmann
c1abcb99de [twitter] handle "KeyError: 'result'" for retweets (#8072) 2025-08-18 10:18:03 +02:00
Mike Fährmann
3b93184997 [twitter] fix potential 'UnboundLocalError' (#7932)
this happens with Tweets containing both images and video
when 'videos' are disabled.
2025-07-30 16:45:48 +02:00
Mike Fährmann
a097a373a9 simplify if statements by using walrus operators (#7671) 2025-07-22 20:57:54 +02:00
Mike Fährmann
d8ef1d693f rename 'StopExtraction' to 'AbortExtraction'
for cases where StopExtraction was used to report errors
2025-07-09 21:07:28 +02:00
Mike Fährmann
cfafbc0675 [twitter] extract 'sensitive_flags' metadata (#2523)
a list of 'sensitive_media_warning' flags per file
and a combination of all file flags per Tweet
2025-07-09 12:39:23 +02:00
Mike Fährmann
9dbe33b6de replace old %-formatted and .format(…) strings with f-strings (#7671)
mostly using flynt
https://github.com/ikamensh/flynt
2025-06-29 17:50:19 +02:00
Mike Fährmann
41191bb60a 'match.group(N)' -> 'match[N]' (#7671)
2.5x faster
2025-06-18 13:05:58 +02:00
Mike Fährmann
e08ec7e083 update copyright notices 2025-06-13 00:03:41 +02:00
Mike Fährmann
e2d104a110 [twitter] extract 'source_id' and 'source_user' metadata (#7470 #7640) 2025-06-12 18:59:22 +02:00
Mike Fährmann
06e2f2cd91 [twitter] restructure media data extraction 2025-06-12 18:53:15 +02:00
Mike Fährmann
8dace96af3 [twitter] simplify 'expand' & 'unique' init code 2025-06-05 15:33:47 +02:00
Mike Fährmann
e199396872 [common] simplify 'user' extractors by using 'Dispatch' mixin 2025-05-24 18:04:53 +02:00
Mike Fährmann
b97dc456b0 [twitter] import 'transaction_id' only when needed 2025-05-04 07:42:44 +02:00
Mike Fährmann
edc67983ed [twitter] update 'x-csrf-token' header after ct init (#7467) 2025-05-03 12:55:31 +02:00
Mike Fährmann
771317b36c [twitter:ctid] cache client transaction keys (#7382)
and 'ondemand.s.…a.js' responses
2025-05-03 12:50:00 +02:00
Mike Fährmann
e0913c95b2 [twitter] generate 'x-client-transaction-id' header values (#7382)
TODO: cache ClientTransaction state on disk
2025-05-02 12:10:05 +02:00
stephanelsmith
f0e7992674 [twitter] added 'followers' extractor
modeled after the 'following' extractor

- cleanup
- add test
2025-04-19 18:24:29 +02:00
Mike Fährmann
2798fb8a80 [twitter] update API endpoint query hashes (#7382 #7386)
and associated 'variables', 'features', and 'fieldToggles' parameters
2025-04-19 16:45:47 +02:00
Mike Fährmann
a859abf6a1 [twitter] prevent exception in '_extract_components()' (#7139) 2025-03-09 10:15:18 +01:00
Mike Fährmann
d2cad599f7 [twitter] support 'grok' cards content (#7040) 2025-02-25 20:47:31 +01:00
Mike Fährmann
64dc655ed6 [twitter] revert generated CSRF token length to 32 characters (#6895)
revert d9c4fcc7fa
2025-01-30 19:16:10 +01:00
Mike Fährmann
cb1a75eefc [twitter] handle errors during file extraction (#6647) 2025-01-21 18:23:54 +01:00
Mike Fährmann
d9c4fcc7fa [twitter] generate longer CSRF token values 2025-01-21 18:19:25 +01:00
Mike Fährmann
cfe24a9e31 [twitter] make 'source' metadata extraction non-fatal (#6472) 2024-11-14 18:59:01 +01:00
Mike Fährmann
e3fbd6825b [twitter] remove cookies migration workaround
revert 141efc2ad3
2024-10-31 17:10:13 +01:00
Mike Fährmann
a120295632 [util] use minimal separators for 'json_dumps()' 2024-10-01 17:03:13 +02:00
Mike Fährmann
bd932b6860 [twitter] add 'info' as a possible 'include' value (#6114) 2024-08-31 17:04:22 +02:00
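
Several of the commits above that reference #7671 describe small Python idiom changes: 'match.group(N)' -> 'match[N]', walrus operators in if statements, f-strings in place of %-formatting and .format(…), and 'datetime.fromisoformat()'. The sketch below only illustrates those idioms side by side. It is not taken from the repository: the pattern, the URL, and the variable names are hypothetical, and the log's "2.5x faster" figure is not measured here.

```python
import re
from datetime import datetime

# Hypothetical pattern and input, chosen only for illustration;
# this is not the extractor's real URL pattern.
TWEET_URL = re.compile(r"https://x\.com/([^/]+)/status/(\d+)")
url = "https://x.com/example/status/1234567890"

# Walrus operator: bind the match object and test it in one expression,
# instead of assigning first and checking on the next line.
if match := TWEET_URL.match(url):
    # match[N] returns the same value as match.group(N).
    user, tweet_id = match[1], match[2]
    same_as_group = match[1] == match.group(1)
else:
    user = tweet_id = None
    same_as_group = False

# f-string in place of "%s/%s" % (user, tweet_id)
# or "{}/{}".format(user, tweet_id).
label = f"{user}/{tweet_id}"

# datetime.fromisoformat() parses an ISO 8601 string directly,
# with no explicit format string as strptime() would need.
created = datetime.fromisoformat("2025-10-20T21:23:11+02:00")
```

The walrus operator needs Python 3.8 or later, and 'datetime.fromisoformat()' needs 3.7 or later, so a sketch like this assumes a reasonably recent interpreter.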