52 Commits

Author SHA1 Message Date
Mike Fährmann
a74591b84b [tumblr] remove "original image" functionality
Accessing higher/original quality images on
https://s3.amazonaws.com/data.tumblr.com and http://data.tumblr.com
is no longer possible and any HTTP request results in 403 Forbidden.

A few images can still be accessed through https://a.tumblr.com [1][2],
but not as "_raw", just "_1280", and that might also be "fixed" in
the near future.

[1] https://a.tumblr.com/tumblr_kzjlfiTnfe1qz4rgho1_1280.jpg
[2] https://a.tumblr.com/ee589c6345f29d2d5935cecb49b0a705/tumblr_oztu02dIHp1wgha4yo1_1280.png
2018-08-14 11:51:17 +02:00
Mike Fährmann
1c1e086d01 use common base class for OAuth1.0 based API interfaces 2018-05-10 21:57:45 +02:00
Mike Fährmann
6a31ada9e3 re-implement OAuth1.0 code
OAuth support for SmugMug needs some additional features
(auth-rebuild on redirect, query parameters in URL, ...)
and fixing this in the old code wouldn't work all that well.
2018-05-10 18:47:05 +02:00
Mike Fährmann
69a5e6ddb3 Merge branch 'master' into 1.4-dev 2018-05-04 10:19:02 +02:00
Mike Fährmann
8b79eaafea [tumblr] log actual time of rate limit resets
... instead of the amount of seconds until a reset
2018-04-25 16:13:03 +02:00
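
As an illustration of the change described above, converting "seconds until reset" into a wall-clock time could look like the following. This is a minimal sketch, not the project's actual code; the function name `format_reset_time` and its `now` parameter are hypothetical.

```python
from datetime import datetime, timedelta


def format_reset_time(seconds_until_reset, now=None):
    """Return the wall-clock time (HH:MM:SS) at which a rate limit resets.

    `seconds_until_reset` is the remaining time reported by the API;
    `now` is only a parameter so the result is reproducible in tests.
    """
    if now is None:
        now = datetime.now()
    reset = now + timedelta(seconds=int(seconds_until_reset))
    return reset.strftime("%H:%M:%S")


# Example with a fixed reference time instead of the current time
print(format_reset_time(90, now=datetime(2018, 4, 25, 16, 0, 0)))
```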
Mike Fährmann
f471161920 Merge branch 'master' into 1.4-dev 2018-04-21 12:15:40 +02:00
Mike Fährmann
b1325d4d2c fix extractor docstrings 2018-04-18 18:03:43 +02:00
Mike Fährmann
728c64a3fb [tumblr] rename 'offset' to 'num' and adjust formats
Trying to somehow emulate Tumblr filenames is a bad idea ...
2018-04-15 18:58:32 +02:00
Mike Fährmann
6bd857a319 [tumblr] handle rate limits / 429 errors
- wait for the hourly limit to reset
- abort upon exceeding the daily limit (it doesn't seem useful to
  potentially wait for several hours)
2018-04-12 16:25:20 +02:00
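
The two rules listed in this commit can be sketched as a small decision function. The header names used here are an assumption about what the Tumblr API sends with a 429 response, and `handle_rate_limit` is a hypothetical helper, not the extractor's real implementation.

```python
def handle_rate_limit(headers):
    """Decide how to react to a 429 response.

    Returns ("abort", None) when the daily limit is exhausted and
    ("wait", seconds) when only the hourly limit was hit.
    Header names are assumptions made for this sketch.
    """
    # Daily limit exceeded: waiting could take several hours, so give up
    if headers.get("x-ratelimit-perday-remaining") == "0":
        return ("abort", None)

    # Hourly limit exceeded: wait until it resets
    reset = headers.get("x-ratelimit-perhour-reset")
    if reset is not None:
        return ("wait", int(reset))

    # No usable information: treat as a hard error
    return ("abort", None)


print(handle_rate_limit({"x-ratelimit-perday-remaining": "120",
                         "x-ratelimit-perhour-reset": "1800"}))
```

A caller would sleep for the returned number of seconds on "wait" and stop the extraction on "abort".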
Mike Fährmann
a1fa4b43b0 Revert "[tumblr] add option to sort photosets by upload order"
This reverts commit 4a26ae32df.
2018-04-09 16:08:08 +02:00
Mike Fährmann
4a26ae32df [tumblr] add option to sort photosets by upload order 2018-04-07 15:57:55 +02:00
Mike Fährmann
6b72be8ee6 [tumblr] add 'hash' keyword
'hash' is the middle part of the filename in a tumblr image URL.
For example an image with '.../tumblr_p6tgemp1NZ1wgha4yo1_250.png' as
its URL would have 'p6tgemp1NZ1wgha4yo1' as hash.
2018-04-07 15:54:30 +02:00
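
The 'hash' described above can be pulled out of a URL with a regular expression. The pattern and the function name `tumblr_hash` are illustrative assumptions and may differ from what the extractor actually uses; the host in the example is a placeholder.

```python
import re

# Matches '/tumblr_<hash>_<size>.<ext>' at the end of a URL
HASH_PATTERN = re.compile(r"/tumblr_([^/_]+)_[^/]*$")


def tumblr_hash(url):
    """Return the middle part of a Tumblr image filename, or None."""
    match = HASH_PATTERN.search(url)
    return match.group(1) if match else None


print(tumblr_hash("https://example.org/tumblr_p6tgemp1NZ1wgha4yo1_250.png"))
```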
Mike Fährmann
68e9fbee16 [tumblr] check all 4 keys/secrets before using OAuth
It was possible to cause a crash by setting api-key or api-secret to null.
(this commit also slightly improves the blog-cache implementation)
2018-04-05 15:42:23 +02:00
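
A check like the one this commit describes might be written as follows. The parameter names are hypothetical stand-ins for the four configured values; the point is only that a single missing (null) value disables OAuth instead of causing a crash later on.

```python
def can_use_oauth(api_key, api_secret, access_token, access_token_secret):
    """Return True only if all four OAuth values are set.

    None (JSON null) and empty strings both count as "not set".
    """
    return all((api_key, api_secret, access_token, access_token_secret))


print(can_use_oauth("key", "secret", "token", "token-secret"))
print(can_use_oauth("key", None, "token", "token-secret"))
```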
Mike Fährmann
f8168c693e [tumblr] avoid calls to '/blog/.../info'
The same information returned by the 'blog/.../info' API endpoint
is also included in the result of every 'blog/.../posts' call.
2018-04-04 14:15:24 +02:00
Mike Fährmann
858fdbdb22 [tumblr] improve 'inline' extraction
'quote' posts store their HTML content in the 'source' field
2018-03-02 06:59:44 +01:00
Mike Fährmann
5008e105ee update archive IDs
... to behave in a more straightforward way when dealing with
bookmarks/favourites/etc.

specific IDs are now grouped by their owner, album-id, ... to
allow for duplicates when it would be expected.
2018-03-01 18:20:50 +01:00
Mike Fährmann
3cec533c28 Merge branch 'archive' 2018-02-12 18:07:58 +01:00
Mike Fährmann
d38bf2f54c [tumblr] recognize /image/... URLs
xyz.tumblr.com/image/123 refers to the same images
as xyz.tumblr.com/post/123.
2018-02-08 23:08:14 +01:00
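
One way to treat both URL forms alike is a single pattern that accepts either path segment. The regular expression below is an assumption made for illustration and is not the extractor's actual pattern.

```python
import re

# Accepts '<blog>.tumblr.com/post/<id>' as well as '<blog>.tumblr.com/image/<id>'
POST_PATTERN = re.compile(
    r"(?:https?://)?([^/]+\.tumblr\.com)/(?:post|image)/(\d+)")


def parse_post_url(url):
    """Return (blog, post_id) for a post or image URL, or None."""
    match = POST_PATTERN.match(url)
    return (match.group(1), match.group(2)) if match else None


# Both forms resolve to the same blog and ID
print(parse_post_url("xyz.tumblr.com/image/123"))
print(parse_post_url("xyz.tumblr.com/post/123"))
```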
Mike Fährmann
34873dbd90 set 'archive_fmt' values
These are going to be used to create a unique ID for each image.
2018-02-01 15:30:49 +01:00
Mike Fährmann
9fccd7b783 [tumblr] provide fallback URLs (#64)
Each image now produces 3 URLs:
- amazonaws.com _raw (or _1280 for older images)
- amazonaws.com _500
- media.tumblr.com (URL returned by API)
2018-01-19 23:12:15 +01:00
Mike Fährmann
421a9740a3 [tumblr] add 'tumblr:' to force Tumblr extractor (#71) 2018-01-15 18:27:58 +01:00
Mike Fährmann
9a049bdf51 [tumblr] add 'likes' extractor (#65) 2018-01-12 14:56:01 +01:00
Mike Fährmann
29d75fc3fa [tumblr] add support for OAuth authentication (#65) 2018-01-11 14:11:37 +01:00
Mike Fährmann
75b2e84b6d [tumblr] use s3.amazonaws.com for image URLs (#64) 2018-01-09 15:13:00 +01:00
Mike Fährmann
03b8a548cb [tumblr] change reblogs default value to true (#61) 2018-01-06 15:52:08 +01:00
Mike Fährmann
d235f68f59 [tumblr] add option to filter reblogged posts (#61)
Reblogs are ignored by default, but can be included by setting
'extractor.tumblr.reblogs' to 'true'.
2018-01-05 13:05:57 +01:00
Mike Fährmann
b14de6ffc2 [tumblr] small improvements
- don't transform inline GIF URLs
- set 'type' parameter for API calls if there is only
  one post type selected
2017-11-24 16:51:07 +01:00
Mike Fährmann
9296a26eae [tumblr] add warning messages 2017-11-23 16:12:07 +01:00
Mike Fährmann
12de658937 [tumblr] add options to control extraction behavior (#48)
- posts   : list of post-types to inspect
- inline  : scan post bodies for inline images
- external: follow external links
2017-11-23 15:32:54 +01:00
Mike Fährmann
077f8c12be [tumblr] original video URLs + continuous offset 2017-11-20 20:51:02 +01:00
Mike Fährmann
8eb12ebeae [tumblr] support more post/media types (#48)
This adds support for audio and video posts (most videos are shared
from youtube/instagram which isn't supported -> youtube-dl),
as well as link posts and image-search inside of text posts.

Most of this is just WIP and will need some sort of improvement
and options to enable/disable different media types etc.
2017-11-18 23:11:32 +01:00
Mike Fährmann
980fd3616d [tumblr] use API v2 (#48) 2017-11-03 22:16:57 +01:00
Mike Fährmann
d6bed9f36f [tumblr] prevent premature exit to get all images (fixes #48) 2017-11-03 14:59:31 +01:00
Mike Fährmann
81a7788b40 replace space characters in unit test URLs 2017-10-23 17:00:53 +02:00
Mike Fährmann
393755ee94 [tumblr] update tests 2017-10-09 00:10:37 +02:00
Mike Fährmann
6f30cf4c64 change keyword names to valid Python identifiers
This commit mostly replaces all minus-signs ('-') in keyword names with
underscores ('_') to allow them to be used in filter-expressions. For
example 'gallery-id' got renamed to 'gallery_id'.

(It is theoretically possible to access any variable, regardless of its
name, with 'locals()["NAME"]', but that seems a bit too convoluted if
just 'NAME' could be enough)
2017-09-10 22:20:47 +02:00
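
The motivation for this rename can be shown with plain Python: a name containing '-' is not a valid identifier, so an expression such as `gallery-id > 100` would be read as a subtraction. The helpers below are hypothetical and only sketch the idea; they are not the project's filter implementation.

```python
def rename_keywords(metadata):
    """Replace '-' with '_' in all keyword names."""
    return {key.replace("-", "_"): value for key, value in metadata.items()}


def matches(expression, metadata):
    """Evaluate a filter expression with the keywords as local variables.

    eval() must never be used on untrusted input; it is used here
    only to demonstrate name lookup.
    """
    return bool(eval(expression, {}, metadata))


old = {"gallery-id": 123, "title": "example"}
new = rename_keywords(old)

print("gallery-id".isidentifier(), "gallery_id".isidentifier())
print(matches("gallery_id > 100", new))
```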
Mike Fährmann
80c2e03aaa [reddit] allow 'date-min/max' to be human readable dates
If the date-min/max config value is a string, try parsing it using
datetime.strptime [1] with 'date-format' as format string [2]
(default: "%Y-%m-%dT%H:%M:%S")

Example: get all submissions posted in 2016

$ gallery-dl reddit.com/r/... \
    -o date-format=%Y \
    -o date-min=\"2016\" \
    -o date-max=\"2017\"

[1] https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime
[2] https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
2017-07-01 18:46:38 +02:00
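
The parsing rule from this commit can be sketched with `datetime.strptime` directly. The function name `parse_date` is hypothetical, and returning non-string values unchanged is an assumption about how already-numeric values are handled.

```python
from datetime import datetime

DEFAULT_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_date(value, date_format=DEFAULT_FORMAT):
    """Parse a date-min/max value if it is a string.

    Non-string values (for example numeric timestamps) are returned
    unchanged; that part is an assumption of this sketch.
    """
    if isinstance(value, str):
        return datetime.strptime(value, date_format)
    return value


# Equivalent of '-o date-format=%Y -o date-min="2016" -o date-max="2017"'
print(parse_date("2016", "%Y"), parse_date("2017", "%Y"))
```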
Mike Fährmann
71e08dc9c4 [tumblr] keyword consistency 2017-04-13 20:47:22 +02:00
Mike Fährmann
bd95fea82c update unit test results 2017-04-11 21:03:09 +02:00
Mike Fährmann
94e10f249a code adjustments according to pep8 nr2 2017-02-01 00:53:19 +01:00
Mike Fährmann
0211ec4114 update some tests 2016-12-08 00:24:23 +01:00
Mike Fährmann
8d106a447c [tumblr] delete more useless keywords 2016-09-27 21:49:38 +02:00
Mike Fährmann
56d810c896 update keyword hashes for tests 2016-09-25 17:28:46 +02:00
Mike Fährmann
19c2d4ff6f remove explicit (sub)category keywords 2016-09-25 14:22:07 +02:00
Mike Fährmann
85ff3d160e [tumblr] fix json parsing + metadata consistency 2016-09-16 09:38:14 +02:00
Mike Fährmann
d7e168799d consistent extractor naming scheme + docstrings 2016-09-12 10:34:31 +02:00
Mike Fährmann
808cf69556 update a few tests 2016-09-01 18:28:16 +02:00
Mike Fährmann
6f7d42b974 update tests 2016-07-12 12:08:36 +02:00
Mike Fährmann
81096f7790 [tumblr] fix json parsing 2016-03-06 15:30:55 +01:00
Mike Fährmann
f974ea73db [tumblr] add tag-extractor 2016-02-20 15:24:55 +01:00