
286 Commits

Author SHA1 Message Date
Mike Fährmann
efaab4fbfa [twitter] fix crash due to missing 'source' (#4620)
regression caused by 06aaedde
2023-10-04 23:01:04 +02:00
Mike Fährmann
6178177227 [twitter] fix '_extractor' of following results (#4536)
regression from 20ed647f
2023-09-15 23:04:30 +02:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
4c0b3d5dc5 [twitter] fix crash when 'sortIndex' is None (#4499) 2023-09-04 18:28:43 +02:00
Mike Fährmann
06aaedded5 [twitter] extract 'source' metadata (#4459) 2023-08-28 16:31:57 +02:00
Mike Fährmann
e0829ff0fd [twitter] add 'date_original' metadata for retweets (#4337, #4443) 2023-08-23 23:58:11 +02:00
Mike Fährmann
2b88ad19e9 [twitter] accept 'x.com' URLs (#4452) 2023-08-21 19:47:07 +02:00
Mike Fährmann
089d1a4f67 [twitter] fix 'TweetWithVisibilityResults' (#4369) 2023-08-06 22:08:50 +02:00
Mike Fährmann
fb3f0453db [twitter] improve error messages for single Tweets (#4369)
also fixes '"quoted": false' not having any effect
2023-08-03 22:02:07 +02:00
Mike Fährmann
7fbc304ae9 [twitter] fix crash on private user (#4349) 2023-07-26 17:53:51 +02:00
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
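
The split described in a383eca7f6 can be illustrated with a minimal sketch. The class below is hypothetical and not the project's actual extractor: the attribute names, the 'config' argument, and the stand-in session object are assumptions made only to show a constructor that stays cheap and an 'initialize()' step that runs later.

```python
class SketchExtractor:
    """Hypothetical extractor with a deferred initialization step."""

    def __init__(self, url):
        # The constructor only records cheap state; no session, cookies,
        # or config options are touched here.
        self.url = url
        self.config = {}
        self.session = None
        self.initialized = False

    def initialize(self, config=None):
        # The actual setup happens here, so a caller can adjust the
        # configuration after construction but before anything reads it.
        self.config = dict(config or {})
        self.session = object()  # stand-in for a real HTTP session
        self.initialized = True


# A caller creates the extractor first, decides on options, then initializes.
extractor = SketchExtractor("https://example.org/user")
options = {"retweets": False}  # hypothetical option name
extractor.initialize(options)
```

Under these assumptions, nothing reads the configuration until 'initialize()' runs, which is the window the commit message describes.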
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
90231f2d5a [twitter] add 'tweet-endpoint' option (#4307)
use the newer TweetResultByRestId only for guests by default
2023-07-18 17:19:32 +02:00
Mike Fährmann
20ed647f6f [twitter] add 'user' extractor and 'include' option (#4275) 2023-07-18 16:42:55 +02:00
Mike Fährmann
86be197d11 [twitter] remove '/search/adaptive.json' 2023-07-18 15:45:37 +02:00
Mike Fährmann
0b08e2e8a8 merge #4287: [twitter] Fix following extractor not getting all users 2023-07-10 14:41:00 +02:00
Mike Fährmann
f6553ffd2f [twitter] simplify '_pagination_users'
- remove 'stop' variable
- call 'cursor.startswith()' only once
2023-07-10 14:39:09 +02:00
Mike Fährmann
a27dbe8c82 [twitter] use 'TweetResultByRestId' endpoint (#4250)
allows accessing single Tweets without login
2023-07-08 23:17:10 +02:00
Mike Fährmann
d3d639a159 [twitter] don't treat missing 'TimelineAddEntries' as fatal (#4278) 2023-07-08 22:49:34 +02:00
ActuallyKit
c321c773f2 make the code less ugly 2023-07-09 02:52:04 +07:00
ActuallyKit
a437a34bcf fix lint i guess? 2023-07-09 02:41:46 +07:00
ActuallyKit
6cbc434b54 Fix users pagination 2023-07-09 02:28:35 +07:00
Mike Fährmann
1bf9f52c99 [twitter] add 'ratelimit' option (#4251) 2023-07-04 18:17:32 +02:00
Mike Fährmann
f86fdf64a6 [twitter] use GraphQL search by default (#4264) 2023-07-04 17:55:22 +02:00
Mike Fährmann
c1cce4a80b [twitter] extend 'conversations' option (#4211) 2023-06-24 21:34:34 +02:00
Mike Fährmann
54cf1fa3e7 [twitter] use GraphQL search endpoint (#3942)
for guest users; selectable with 'search-endpoint' option.

adapted from 9c7b888ffa
2023-06-01 21:37:31 +02:00
Mike Fährmann
864a654b25 [twitter] update query hashes 2023-06-01 21:37:31 +02:00
Mike Fährmann
45cc7cee1a [twitter] better error message for guest searches (#3942) 2023-06-01 21:37:11 +02:00
Mike Fährmann
271f23d971 [twitter] extract 'conversation_id' metadata (#3839) 2023-06-01 15:31:52 +02:00
Mike Fährmann
d0184fddcf [twitter] optimize '_extract_twitpic()'
- use findall instead of finditer
- store URLs in a dict to discard duplicates
2023-05-25 15:18:49 +02:00
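
The two points in d0184fddcf can be sketched as follows. The pattern and the function name are assumptions for illustration, and the project's actual expression may differ; the sketch only shows 'findall' returning all matches at once and a dict discarding duplicates while keeping first-seen order.

```python
import re

# Hypothetical pattern, not taken from the project.
TWITPIC_PATTERN = re.compile(r"https?://twitpic\.com/\w+")


def unique_twitpic_urls(text):
    """Return TwitPic URLs found in 'text', without duplicates."""
    # findall() builds the full list of matches in a single call;
    # dict.fromkeys() keeps one key per URL in insertion order.
    return list(dict.fromkeys(TWITPIC_PATTERN.findall(text)))
```

A dict is used instead of a set here because it preserves the order in which the URLs first appear.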
Mike Fährmann
3dc862c7fc merge #3796: [twitter] extract TwitPic URLs in text (#3792) 2023-05-25 14:59:07 +02:00
Mike Fährmann
1d505b39f8 [twitter] support 'profile-conversation' entries (#3938) 2023-04-21 15:08:50 +02:00
Mike Fährmann
f500b45b5e [twitter] improve 480bc34e
only check for double user assignment where necessary
2023-04-18 20:50:23 +02:00
Mike Fährmann
480bc34e54 [twitter] do not overwrite previously assigned users (#3922) 2023-04-16 17:30:43 +02:00
Mike Fährmann
f5a59c4170 [twitter] add 'date_bookmarked' metadata (#3816) 2023-04-06 20:16:25 +02:00
Mike Fährmann
1c1f6fdc80 [twitter] fix regression from 160335ad
Tweets from 'homeConversation' or 'conversationthread' entries do not
contain a 'sortIndex' field. Accessing it raises a KeyError and would
erroneously get them labeled as 'deleted'.
2023-04-06 19:22:48 +02:00
Mike Fährmann
160335ad44 [twitter] add 'date_liked' metadata for liked Tweets (#3816) 2023-04-06 18:33:45 +02:00
Mike Fährmann
6d850ce629 [twitter] calculate 'date' from Tweet IDs
20 times faster than parsing 'created_at'
2023-04-05 22:29:14 +02:00
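
The idea behind 6d850ce629 can be sketched as below, assuming the Tweet ID is a snowflake ID whose bits above the lowest 22 hold milliseconds since Twitter's custom epoch (1288834974657 ms after the Unix epoch). The function name is hypothetical, and IDs from before the snowflake scheme would not give meaningful results.

```python
from datetime import datetime, timezone

# Twitter's snowflake epoch in milliseconds since the Unix epoch
# (2010-11-04 01:42:54.657 UTC).
TWITTER_EPOCH_MS = 1288834974657


def date_from_tweet_id(tweet_id):
    """Return the UTC creation time encoded in a snowflake Tweet ID."""
    # Dropping the lowest 22 bits leaves the millisecond timestamp
    # relative to the snowflake epoch.
    timestamp_ms = (int(tweet_id) >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
```

This needs only a shift and an addition, with no string parsing of 'created_at'.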
Mike Fährmann
dbe06cdba1 [twitter] warn about 'withheld' Tweets and users (#3864) 2023-04-04 16:15:08 +02:00
Mike Fährmann
3cc1dd1572 [twitter] update query hashes 2023-04-03 23:20:20 +02:00
Mike Fährmann
3846ce0de5 [twitter] update to bookmark timeline v2 (#3859) 2023-04-03 22:46:12 +02:00
Mike Fährmann
e6cb92864a [twitter] allow setting custom features per API endpoint 2023-04-03 16:18:31 +02:00
Amer Jazaerli
bebbff6578 fix: graphql_timeline_v2_bookmark_timeline cannot be null
twitter: 400 Bad Request (The following features cannot be null: graphql_timeline_v2_bookmark_timeline)
2023-03-31 00:06:49 +02:00
Mike Fährmann
197882cf12 [twitter] add 'hashtag' extractor (#3783) 2023-03-22 22:20:40 +01:00
ClosedPort22
d4fb4ff47f [twitter] extract TwitPic URLs in text (#3792)
also ignore previously seen URLs
2023-03-18 21:19:24 +08:00
Mike Fährmann
2bb937014f [twitter] fall back to legacy /media endpoint when not logged in 2023-03-17 20:54:35 +01:00
Mike Fährmann
b68094d326 [twitter] support 'note_tweet's 2023-03-17 19:36:07 +01:00
Mike Fährmann
3dcabc97ed [twitter] update API endpoints and parameters 2023-03-17 19:25:53 +01:00
Mike Fährmann
9037128315 [twitter] fix some 'original' retweets not downloading (#3744) 2023-03-08 18:33:19 +01:00
Mike Fährmann
dd884b02ee replace json.loads with direct calls to JSONDecoder.decode 2023-02-09 15:22:00 +01:00
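
The replacement named in dd884b02ee can be sketched as below. The variable name 'json_loads' is an assumption; the sketch only shows binding the 'decode' method of a 'JSONDecoder' instance once and calling it directly.

```python
import json

# Bind the decoder's method once and reuse it; this skips the argument
# and type handling that json.loads() performs on every call.
json_loads = json.JSONDecoder().decode

data = json_loads('{"id": 1, "tags": ["a", "b"]}')
```

Unlike 'json.loads()', 'JSONDecoder.decode()' expects a 'str', so bytes input has to be decoded first.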