276 Commits

Author SHA1 Message Date
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This allows, for example, adjusting config access inside a Job
before most of the initialization has happened, which previously
took place as soon as 'extractor.find()' was called.
2023-07-25 22:16:16 +02:00
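A minimal Python sketch of the pattern this commit describes: the constructor only records its input, and a separate `initialize()` call does the setup, so a caller can change configuration in between. The class, its attributes and the `_initialized` guard are hypothetical stand-ins, not the project's actual extractor code.

```python
class Extractor:
    """Hypothetical extractor whose expensive setup is deferred to initialize()."""

    def __init__(self, url):
        # Cheap: only remember the input and an empty config.
        self.url = url
        self.config = {}
        self._initialized = False

    def initialize(self):
        # The actual init (session, cookies, config options), run at most once.
        if self._initialized:
            return
        self.session = {"user-agent": self.config.get("user-agent", "default")}
        self.cookies = dict(self.config.get("cookies", {}))
        self._initialized = True


extr = Extractor("https://example.org/gallery/1")
extr.config["user-agent"] = "custom"  # adjusted after construction,
extr.initialize()                     # but before initialization
print(extr.session["user-agent"])     # custom
```

Because `initialize()` is idempotent here, code that is unsure whether setup already ran can simply call it again.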
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
90231f2d5a [twitter] add 'tweet-endpoint' option (#4307)
use the newer TweetResultByRestId only for guests by default
2023-07-18 17:19:32 +02:00
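How such a default could be resolved is sketched below. The option values (`"auto"`, `"detail"`, `"restid"`) and the name of the older endpoint (`TweetDetail`) are assumptions for illustration; the commit message itself only states that `TweetResultByRestId` is used for guests by default.

```python
ENDPOINTS = {
    "detail": "TweetDetail",          # assumed name of the older endpoint
    "restid": "TweetResultByRestId",  # newer endpoint, usable without login
}

def tweet_endpoint(option, logged_in):
    """Return the GraphQL endpoint for single Tweets (hypothetical helper)."""
    if option in (None, "auto"):
        # Default: only guests get the newer endpoint.
        option = "detail" if logged_in else "restid"
    return ENDPOINTS[option]
```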
Mike Fährmann
20ed647f6f [twitter] add 'user' extractor and 'include' option (#4275) 2023-07-18 16:42:55 +02:00
Mike Fährmann
86be197d11 [twitter] remove '/search/adaptive.json' 2023-07-18 15:45:37 +02:00
Mike Fährmann
0b08e2e8a8 merge #4287: [twitter] Fix following extractor not getting all users 2023-07-10 14:41:00 +02:00
Mike Fährmann
f6553ffd2f [twitter] simplify '_pagination_users'
- remove 'stop' variable
- call 'cursor.startswith()' only once
2023-07-10 14:39:09 +02:00
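A cursor-based user pagination loop without a separate `stop` flag and with a single `startswith()` call might look like this. The `fetch_page` callback and the `"0|"` / `"-1|"` end-of-list cursor prefixes are assumptions made for the sketch, not taken from the commit.

```python
def paginate_users(fetch_page):
    """Yield users page by page; fetch_page(cursor) -> (users, next_cursor)."""
    cursor = None
    while True:
        users, cursor = fetch_page(cursor)
        yield from users
        # One startswith() call with a tuple covers every terminal prefix;
        # returning directly makes a separate 'stop' variable unnecessary.
        if not cursor or cursor.startswith(("-1|", "0|")):
            return
```

For example, with `pages = {None: (["alice", "bob"], "1234|5678"), "1234|5678": (["carol"], "0|5679")}`, `list(paginate_users(pages.__getitem__))` walks both pages and stops at the `"0|"` cursor.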
Mike Fährmann
a27dbe8c82 [twitter] use 'TweetResultByRestId' endpoint (#4250)
allows accessing single Tweets without login
2023-07-08 23:17:10 +02:00
Mike Fährmann
d3d639a159 [twitter] don't treat missing 'TimelineAddEntries' as fatal (#4278) 2023-07-08 22:49:34 +02:00
ActuallyKit
c321c773f2 make the code less ugly 2023-07-09 02:52:04 +07:00
ActuallyKit
a437a34bcf fix lint i guess? 2023-07-09 02:41:46 +07:00
ActuallyKit
6cbc434b54 Fix users pagination 2023-07-09 02:28:35 +07:00
Mike Fährmann
1bf9f52c99 [twitter] add 'ratelimit' option (#4251) 2023-07-04 18:17:32 +02:00
Mike Fährmann
f86fdf64a6 [twitter] use GraphQL search by default (#4264) 2023-07-04 17:55:22 +02:00
Mike Fährmann
c1cce4a80b [twitter] extend 'conversations' option (#4211) 2023-06-24 21:34:34 +02:00
Mike Fährmann
54cf1fa3e7 [twitter] use GraphQL search endpoint (#3942)
for guest users; selectable with 'search-endpoint' option.

adapted from 9c7b888ffa
2023-06-01 21:37:31 +02:00
Mike Fährmann
864a654b25 [twitter] update query hashes 2023-06-01 21:37:31 +02:00
Mike Fährmann
45cc7cee1a [twitter] better error message for guest searches (#3942) 2023-06-01 21:37:11 +02:00
Mike Fährmann
271f23d971 [twitter] extract 'conversation_id' metadata (#3839) 2023-06-01 15:31:52 +02:00
Mike Fährmann
d0184fddcf [twitter] optimize '_extract_twitpic()'
- use findall instead of finditer
- store URLs in a dict to discard duplicates
2023-05-25 15:18:49 +02:00
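The two optimizations listed here can be sketched in a few lines of Python: `re.findall()` returns all matches in one call, and `dict.fromkeys()` discards duplicates while keeping first-seen order. The URL pattern is a simplified assumption, not the extractor's actual regular expression.

```python
import re

# Simplified, assumed pattern for TwitPic links embedded in Tweet text.
TWITPIC_RE = re.compile(r"https?://twitpic\.com/\w+")

def extract_twitpic(text):
    # findall() returns whole matches as strings (the pattern has no groups);
    # dict keys are unique and keep insertion order, so duplicates vanish.
    return list(dict.fromkeys(TWITPIC_RE.findall(text)))
```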
Mike Fährmann
3dc862c7fc merge #3796: [twitter] extract TwitPic URLs in text (#3792) 2023-05-25 14:59:07 +02:00
Mike Fährmann
1d505b39f8 [twitter] support 'profile-conversation' entries (#3938) 2023-04-21 15:08:50 +02:00
Mike Fährmann
f500b45b5e [twitter] improve 480bc34e
only check for double user assignment where necessary
2023-04-18 20:50:23 +02:00
Mike Fährmann
480bc34e54 [twitter] do not overwrite previously assigned users (#3922) 2023-04-16 17:30:43 +02:00
Mike Fährmann
f5a59c4170 [twitter] add 'date_bookmarked' metadata (#3816) 2023-04-06 20:16:25 +02:00
Mike Fährmann
1c1f6fdc80 [twitter] fix regression from 160335ad
Tweets from 'homeConversation' or 'conversationthread' entries do not
contain a 'sortIndex' field. Accessing it raises a KeyError, which would
cause them to be erroneously labeled as 'deleted'.
2023-04-06 19:22:48 +02:00
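The failure mode described here, where an unrelated missing key is caught by the same `KeyError` handler that detects deleted Tweets, can be shown with a minimal sketch. The entry layout (`content`, `itemContent`, `tweet_results`, `result`) is an assumption modeled loosely on Twitter's GraphQL responses, and the function is hypothetical.

```python
def process_entry(entry):
    """Hypothetical: extract a Tweet from a timeline entry."""
    try:
        tweet = entry["content"]["itemContent"]["tweet_results"]["result"]
    except KeyError:
        # Only a missing Tweet result means the Tweet is gone.
        return {"deleted": True}
    # 'sortIndex' is optional (absent in e.g. 'homeConversation' entries);
    # reading it with .get() outside the try block cannot trigger the
    # 'deleted' branch above.
    return dict(tweet, sortIndex=entry.get("sortIndex"))
```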
Mike Fährmann
160335ad44 [twitter] add 'date_liked' metadata for liked Tweets (#3816) 2023-04-06 18:33:45 +02:00
Mike Fährmann
6d850ce629 [twitter] calculate 'date' from Tweet IDs
20 times faster than parsing 'created_at'
2023-04-05 22:29:14 +02:00
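Tweet IDs are Snowflake IDs: the bits above the lowest 22 hold a millisecond timestamp relative to Twitter's epoch (1288834974657 ms after the Unix epoch), so the date can be recovered with a shift and an addition instead of string parsing. A minimal sketch; the function name is hypothetical, and it returns a timezone-aware UTC datetime, which may differ from the type the extractor actually stores.

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # 2010-11-04 01:42:54.657 UTC

def date_from_tweet_id(tweet_id):
    # Drop the 22 low bits (worker and sequence numbers) to get the
    # millisecond offset from the Twitter epoch, then shift to Unix time.
    ms = (int(tweet_id) >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```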
Mike Fährmann
dbe06cdba1 [twitter] warn about 'withheld' Tweets and users (#3864) 2023-04-04 16:15:08 +02:00
Mike Fährmann
3cc1dd1572 [twitter] update query hashes 2023-04-03 23:20:20 +02:00
Mike Fährmann
3846ce0de5 [twitter] update to bookmark timeline v2 (#3859) 2023-04-03 22:46:12 +02:00
Mike Fährmann
e6cb92864a [twitter] allow setting custom features per API endpoint 2023-04-03 16:18:31 +02:00
Amer Jazaerli
bebbff6578 fix: graphql_timeline_v2_bookmark_timeline cannot be null
twitter: 400 Bad Request (The following features cannot be null: graphql_timeline_v2_bookmark_timeline)
2023-03-31 00:06:49 +02:00
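A minimal sketch of how per-endpoint feature flags (see commit e6cb92864a) could be merged onto shared defaults and checked for null values before a request is sent, so an error like the one above is caught locally. Apart from `graphql_timeline_v2_bookmark_timeline`, which appears in the error message, the flag name and endpoint keys are hypothetical placeholders.

```python
import json

# Placeholder defaults shared by all endpoints (hypothetical flag name).
DEFAULT_FEATURES = {"example_feature_enabled": True}

def build_features(endpoint, overrides):
    """Merge per-endpoint overrides onto the defaults and serialize them."""
    features = dict(DEFAULT_FEATURES)
    features.update(overrides.get(endpoint, {}))
    # The API rejects null flags with a 400 error, so fail early instead.
    nulls = [name for name, value in features.items() if value is None]
    if nulls:
        raise ValueError("features cannot be null: " + ", ".join(nulls))
    return json.dumps(features, separators=(",", ":"))
```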
Mike Fährmann
197882cf12 [twitter] add 'hashtag' extractor (#3783) 2023-03-22 22:20:40 +01:00
ClosedPort22
d4fb4ff47f [twitter] extract TwitPic URLs in text (#3792)
also ignore previously seen URLs
2023-03-18 21:19:24 +08:00
Mike Fährmann
2bb937014f [twitter] fall back to legacy /media endpoint when not logged in 2023-03-17 20:54:35 +01:00
Mike Fährmann
b68094d326 [twitter] support 'note_tweet's 2023-03-17 19:36:07 +01:00
Mike Fährmann
3dcabc97ed [twitter] update API endpoints and parameters 2023-03-17 19:25:53 +01:00
Mike Fährmann
9037128315 [twitter] fix some 'original' retweets not downloading (#3744) 2023-03-08 18:33:19 +01:00
Mike Fährmann
dd884b02ee replace json.loads with direct calls to JSONDecoder.decode 2023-02-09 15:22:00 +01:00
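With default arguments, `json.loads()` in CPython ends up calling the `decode()` method of a module-level `JSONDecoder`; binding that method once skips the argument handling on every call. A minimal sketch; note that, unlike `json.loads()`, `JSONDecoder.decode()` accepts only `str`, not `bytes`.

```python
import json

# Bind the decode method of one decoder instance and reuse it.
json_loads = json.JSONDecoder().decode

data = json_loads('{"id_str": "20", "favorite_count": 3}')
print(data["favorite_count"])  # 3
```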
Mike Fährmann
1ae48a54f8 [twitter] add 'transform' option 2023-02-02 22:01:36 +01:00
ClosedPort22
ab58c375b4 [twitter] fix search (#3536)
- partially revert 18fe4b334d
- properly search for cursor when processing 'replaceEntry'
2023-01-20 14:12:25 +08:00
Mike Fährmann
9683d79bb7 [twitter] "fix" search pagination (#3536, #3534)
- properly process instructions
- do not expect a predetermined instruction order
2023-01-16 14:58:30 +01:00
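Processing instructions by type rather than by position can be sketched as below; it also picks up the bottom cursor from "replace" instructions, as in commit ab58c375b4. The instruction and entry layout (`type`, `entries`, `entry`, `entryId`, `content.value`, the `cursor-bottom-` prefix) is an assumption modeled on Twitter's timeline responses, and both functions are hypothetical.

```python
def _bottom_cursor(entry):
    # Return the cursor value if this is a bottom-cursor entry, else None.
    if entry["entryId"].startswith("cursor-bottom-"):
        return entry["content"]["value"]
    return None

def process_instructions(instructions):
    """Return (entries, bottom_cursor) without assuming an instruction order."""
    entries, cursor = [], None
    for instr in instructions:
        itype = instr.get("type")
        if itype == "TimelineAddEntries":
            for entry in instr["entries"]:
                value = _bottom_cursor(entry)
                if value is None:
                    entries.append(entry)
                else:
                    cursor = value
        elif itype == "TimelineReplaceEntry":
            # A replaced bottom cursor updates the pagination cursor.
            value = _bottom_cursor(instr["entry"])
            if value is not None:
                cursor = value
        # Other types are skipped; if no 'TimelineAddEntries' instruction
        # is present, the result is simply an empty entry list.
    return entries, cursor
```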
Mike Fährmann
4fec848858 [twitter] use "browser": "firefox" by default (#3522)
and reenable TLS 1.2 ciphers
2023-01-15 22:11:04 +01:00
Mike Fährmann
78937564fd [twitter] fix login after 32b03433 2023-01-15 22:10:21 +01:00
Mike Fährmann
32b0343334 [twitter] refresh guest tokens (#3445, #3458) 2023-01-13 22:19:25 +01:00
Mike Fährmann
26c3292538 [twitter] disable TLS 1.2 ciphers by default (#3522) 2023-01-13 16:05:43 +01:00
Mike Fährmann
18fe4b334d [twitter] remove 'tweet_search_mode' from search parameters (#3522)
and update API root and general query parameters
2023-01-13 15:50:46 +01:00
Mike Fährmann
2f31d21509 merge #3455: [twitter] apply tweet type checks before uniqueness check 2023-01-06 13:32:50 +01:00
Mike Fährmann
30a31836e7 merge #3449: [twitter] force HTTPS for TwitPic URLs 2023-01-05 14:57:03 +01:00