113 Commits

Author SHA1 Message Date
Mike Fährmann
2132e5461a [twitter] restore TwitPic support 2020-06-04 01:22:34 +02:00
Mike Fährmann
bd0f21478a [twitter] login using the mobile nojs login page 2020-06-04 00:07:12 +02:00
Mike Fährmann
a10f31dde5 [twitter] rewrite; use new interface (#740, #806)
Everything except logging in with username & password and TwitPic
embeds should be working again.

Metadata per Tweet is massively different than before (mostly raw API
responses - might need some cleaning up) and the default 'archive_fmt'
changed.
2020-06-03 20:51:29 +02:00
Mike Fährmann
45baa13615 update extractor test results
- don't run Instagram tests on Travis anymore
- replace Twitter test because timeline was made private
- update Hiperdex domain to '.com' (again ...)
2020-05-28 02:18:06 +02:00
Mike Fährmann
9f638c2e01 [twitter] add 'replies' option (closes #705) 2020-04-29 23:20:06 +02:00
Mike Fährmann
d3b3b30107 update test results 2020-04-26 22:14:28 +02:00
Mike Fährmann
3eab07739f [twitter] ensure videos have a 'filename'
This usually gets set when invoking the 'ytdl' downloader, but when
that fails, the error message would use 'None' as filename.
2020-04-24 22:34:19 +02:00
Mike Fährmann
c4371a6970 [twitter] add 'reply' metadata field (#705) 2020-04-24 22:31:24 +02:00
Mike Fährmann
d02f7c1118 improve Extractor.wait()
- allow 'until' to be a datetime object
- do "time calculations" with UTC timestamps
- set a default 'reason'
2020-04-05 21:23:05 +02:00
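The three changes listed for Extractor.wait() can be illustrated with a small standalone sketch. It is not the gallery-dl implementation: the helper name `seconds_until`, its `now` parameter, the default reason string, and the choice to treat naive datetime objects as UTC are all assumptions made for this example.

```python
import time
from datetime import datetime, timezone


def seconds_until(until, now=None, reason="rate limit reset"):
    """Return (seconds_to_wait, reason) for a target point in time.

    Hypothetical helper: 'until' may be a Unix timestamp or a datetime.
    Naive datetimes are assumed to be in UTC (an assumption of this sketch).
    """
    if isinstance(until, datetime):
        if until.tzinfo is None:
            until = until.replace(tzinfo=timezone.utc)
        until = until.timestamp()  # UTC-based Unix timestamp
    if now is None:
        now = time.time()  # also a UTC-based Unix timestamp
    return max(0.0, float(until) - now), reason
```

Doing the subtraction on Unix timestamps keeps the result independent of the local timezone of the machine.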
Mike Fährmann
b607d0ad7f [twitter] fix typo in 'x-twitter-auth-type' header (#625) 2020-03-21 23:11:39 +01:00
Mike Fährmann
2d5703c493 [twitter] use a simpler data structure to store cookies in cache
Use a dict with name-value pairs instead of an entire
RequestsCookieJar object.
2020-03-12 22:02:12 +01:00
Mike Fährmann
32df8d06fe [twitter] add 'bookmark' extractor (closes #625) 2020-03-06 01:20:04 +01:00
Mike Fährmann
19ae6f3fc4 update test results
- twitter:

    Don't test the whole kwdict, only the actual content, since the
    keyword hash changes whenever that user changes his display name.

- khinsider:

    Download host changed
2020-02-22 03:25:32 +01:00
Mike Fährmann
74e684e828 [twitter] change default value for 'videos' to 'true'
Every other 'videos' option defaulted to 'true', except Twitter.
2020-02-14 01:03:42 +01:00
Mike Fährmann
facc5daa6d [twitter] force old login page layout (fixes #584, fixes #598) 2020-02-02 17:24:53 +01:00
Mike Fährmann
e0dd073ce0 [twitter] replace embedded tweet test
the old one was deleted
2020-01-31 12:51:55 +01:00
Mike Fährmann
25d5ec4ff3 [twitter] add option to extract TwitPic embeds (#579) 2020-01-18 21:31:29 +01:00
Alice
f498a9057f [twitter] Fix stop before real end (#573)
* [twitter] Fix stop before real end

Fix for https://github.com/mikf/gallery-dl/issues/544. Makes sure that it really reached the end by checking that both "min_position" is null and "has_more_items" is false before stopping.

* [twitter] Fix stop before real end (update)
2020-01-14 12:24:30 +01:00
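The stop condition described in this commit can be sketched as a single predicate. The field names "min_position" and "has_more_items" come from the commit message; the function name `reached_end` and the shape of the response dict are assumptions for illustration only.

```python
def reached_end(response):
    """Return True only if both end-of-timeline signals agree.

    'response' is assumed to be the decoded JSON dict of one page.
    """
    no_next_position = response.get("min_position") is None
    no_more_items = not response.get("has_more_items")
    return no_next_position and no_more_items
```

Checking both fields means a page that reports no further items but still carries a position (or the reverse) does not end the extraction early.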
Mike Fährmann
43ab9572b4 [twitter] handle API rate limits (#526) 2020-01-04 23:46:29 +01:00
Mike Fährmann
5532e9c158 [twitter] handle quoted tweets (#526)
… and categorize them as retweets
2020-01-04 21:26:55 +01:00
Mike Fährmann
896896a490 [twitter] fix URLs forwarded to youtube-dl (closes #540)
Since commit 3bba763 data["user"] is an entire dict object
and no longer just the user nickname …
2019-12-25 17:28:55 +01:00
Mike Fährmann
07dafad26d [twitter] attempt to fix infinite loops (#499)
(Hopefully this doesn't break anything else)
2019-12-03 22:55:29 +01:00
Mike Fährmann
3bba763ab9 [twitter] improve
- update metadata structure
  - combine all user… entries into their own dict
  - let 'user' always specify the Timeline owner
  - add 'author' entry that specifies the original Tweet author
- create directories per post (closes #491)
- fix username issues with /i/web/ URLs
2019-11-30 22:30:37 +01:00
Mike Fährmann
5513b66eb0 [vsco] fix user profile extraction 2019-11-12 23:36:48 +01:00
Mike Fährmann
c01ff78467 [twitter] extend 'videos' option to force extraction with ytdl
(closes #459)
2019-11-01 22:06:07 +01:00
Mike Fährmann
49a6b1b6c0 [twitter] extract video stream info without youtube-dl (#452)
This should allow video downloads when logged in without
'forward-cookies' disabled and from protected tweets.

youtube-dl still gets used to download HLS playlists, but the data
extraction part, which doesn't work with youtube-dl at the moment,
now gets handled by gallery-dl itself.
2019-10-25 13:41:36 +02:00
Mike Fährmann
9f0dbf2a72 [twitter] raise proper exception for protected Tweets 2019-10-25 13:26:16 +02:00
Mike Fährmann
2eb38810c5 [twitter] fix image extraction when logged in (#452)
... for individual tweets.

To get a Tweet page with the old Twitter layout, an Internet
Explorer User-Agent (e.g. Mozilla/5.0 (Windows NT 6.1; WOW64;
Trident/7.0; rv:11.0) like Gecko) as well as a Referer header
pointing to the page itself is required. The "app_shell_visited"
cookie appears to be optional at the moment, but that is what
a regular web browser would send.
2019-10-23 22:18:29 +02:00
Mike Fährmann
ef17d94469 update test results 2019-10-21 21:53:21 +02:00
Mike Fährmann
1c03a389df [twitter] small improvements to search extractor
- put search results in separate directories
- set 'max_position' to '-1' for first request
  -> prevent duplicate results
- add a test
- flake8
2019-10-17 19:50:59 +02:00
Alice
bcddcca6db Add search downloading to twitter.py (#448)
Adds the functionality to download search results on twitter.com/search. Since Twitter only allows downloading up to 3,200 of a user's most recent tweets, you will be unable to download old images from users with a lot of tweets. To bypass this, you can use the Twitter search to get the tweets from the time ranges where the download stopped. An example search would be "from:user since:2015-01-01 until:2016-01-01 filter:images". The URL you would use will look something like this: https://twitter.com/search?f=tweets&q=from%3Asupernaturepics%20since%3A2015-01-01%20until%3A2016-01-01%20filter%3Aimages&src=typd&lang=en

The _tweets_from_api function had to be changed because it would not get the next page of results using the last "data-tweet-id". It would return the same JSON but with a "min_position" string added. Using this string for the "max_position" param from the second page onwards correctly returned the next pages. This change does not interfere with how the other extractors work as far as I know. The 2 regex patterns in the extractors had to be changed to not match the search URL.
2019-10-16 18:23:10 +02:00
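A search URL of the form shown in this commit message can be assembled by percent-encoding the query string. The sketch below uses only the standard library; the function name `build_search_url` is hypothetical, and the fixed parameters are copied from the example URL above rather than from any documented interface.

```python
from urllib.parse import quote


def build_search_url(query):
    """Percent-encode a search query and embed it in a search URL."""
    # safe="" makes quote() encode ':' as %3A; spaces become %20
    encoded = quote(query, safe="")
    return ("https://twitter.com/search?f=tweets&q=" + encoded
            + "&src=typd&lang=en")


url = build_search_url(
    "from:supernaturepics since:2015-01-01 until:2016-01-01 filter:images")
```

Splitting a long timeline into several date ranges and building one URL per range is the workaround the commit message describes for the 3,200-tweet limit.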
Mike Fährmann
66cac207ac [twitter] match and use 'i/web' status URLs 2019-09-24 21:18:05 +02:00
Mike Fährmann
e7690ac694 [vsco] update URL pattern (closes #410) 2019-09-08 11:37:27 +02:00
Mike Fährmann
bc0ca66c99 [twitter] small improvements
- handle reply tweets (#403)
- unset cookies in Tweet extractor to "force" the legacy interface
2019-09-01 17:37:48 +02:00
Mike Fährmann
23251356cb require 'extension' data for each URL (#382) 2019-08-14 20:03:03 +02:00
Mike Fährmann
feb98cf196 [twitter] improve 'content' formatting; add option (#338)
- include emoticons
- leave newlines intact
- remove pic.twitter.com/ links at the end
2019-07-17 16:02:51 +02:00
Mike Fährmann
0151e250f5 [twitter] extract 'content' metadata (closes #333) 2019-07-15 16:25:22 +02:00
Mike Fährmann
8de5866fd2 [twitter] replace unit test URLs
https://twitter.com/PicturesEarth was deleted
2019-05-09 10:17:55 +02:00
Mike Fährmann
049e9fd6ce [twitter] fix pagination end condition
Some timelines would cause an endless loop because 'has_more_items' is
always True, even if it would return the same list of tweets over and
over again.
2019-05-08 15:43:59 +02:00
Mike Fährmann
dcc1592dbf [twitter] add fallback URLs (#237) 2019-04-30 15:57:21 +02:00
Mike Fährmann
6264a46212 use 'utcfromtimestamp()'
'fromtimestamp()' converts its results to the local timezone and causes
problems when running tests on a different machine.
2019-04-21 16:22:53 +02:00
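The timezone issue behind this commit can be shown with a short sketch. It uses the timezone-aware form `datetime.fromtimestamp(ts, timezone.utc)` and then drops the tzinfo, which gives the same naive UTC value that `utcfromtimestamp()` returns (that method is deprecated as of Python 3.12). The helper name `utc_datetime` is an assumption for this example.

```python
from datetime import datetime, timezone


def utc_datetime(timestamp):
    """Convert a Unix timestamp to a naive datetime expressed in UTC."""
    return datetime.fromtimestamp(timestamp, timezone.utc).replace(tzinfo=None)


# The result is the same on every machine, regardless of local timezone.
epoch = utc_datetime(0)
```

Calling `datetime.fromtimestamp(0)` without a timezone argument would instead return the local wall-clock time, so expected test values would differ between machines.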
Mike Fährmann
d84e7c6861 [twitter] extract 'date' metadata (#224) 2019-04-21 15:41:22 +02:00
Mike Fährmann
f2cf1c1d73 use 'text.extract_from()' in a few places 2019-04-21 15:19:20 +02:00
Mike Fährmann
e730fc9045 [twitter] add login support (#214) 2019-04-09 09:27:49 +02:00
Mike Fährmann
5530871b5a change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)

Example: "https://example.org/path/filename.ext"

before:
- filename : filename.ext
- name     : filename
- extension: ext

now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
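The "now" behavior described in this commit can be approximated with the standard library. This is an illustrative sketch, not the actual text.nameext_from_url() code: the function name `split_name_ext` and its handling of names without a dot are assumptions.

```python
from urllib.parse import unquote, urlsplit


def split_name_ext(url):
    """Return 'filename' (without extension) and 'extension' for a URL."""
    # Take the last path segment and decode percent-escapes
    name = unquote(urlsplit(url).path.rpartition("/")[2])
    filename, _, extension = name.rpartition(".")
    if not filename:
        # No dot (or only a leading dot): treat the whole name as filename
        return {"filename": name, "extension": ""}
    return {"filename": filename, "extension": extension}


result = split_name_ext("https://example.org/path/filename.ext")
```

For the example URL from the commit message, `result` holds 'filename' as "filename" and 'extension' as "ext", with no separate entry for the complete "filename.ext".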
Mike Fährmann
4b1880fa5e propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
baad7b0fa5 [twitter] unpack API responses when logged in (closes #123) 2018-11-14 11:49:35 +01:00
Mike Fährmann
1532d1b690 fix 'range' tests and update a few test results 2018-10-08 23:53:58 +02:00
Mike Fährmann
188876d814 implement youtube-dl downloader module
URLs starting with 'ytdl:' will now be handled by youtube-dl.
There is probably a lot to fix and improve, but the basic use case
works.

TODO:
- format selection and ytdl options in general
- better filename/path handling
- ytdl support for "unsupported URLs"
- ...
2018-10-05 18:05:11 +02:00