113 Commits

Author SHA1 Message Date
Mike Fährmann
8fb043e8ff [tumblr] raise more detailed errors for dashboard-only blogs (#3628)
2023-02-12 19:38:14 +01:00
Mike Fährmann
b0cb4a1b9c replace 'text.extract()' with 'text.extr()' where possible 2022-11-05 01:14:09 +01:00
ClosedPort22
4e80d3210e [tumblr] Fallback to gifv when possible (#3095) (#3159) 2022-11-04 19:42:36 +01:00
Mike Fährmann
7c6af27eb8 [tumblr] add 'fallback-*' options (#2957)
specifically 'fallback-delay' and 'fallback-retries'
and change default number of retries to 2 (down from 3)
2022-10-26 13:59:09 +02:00
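Note on 7c6af27eb8: a rough sketch of how a retry count and a delay between attempts could interact before falling back to the smaller image. The function and parameter names below are hypothetical and chosen only for illustration. The sketch does not reproduce gallery-dl's actual code, and treating OSError as the failure signal is an assumption.

```python
import time


def fetch_with_fallback(fetch_original, fetch_smaller,
                        retries=2, delay=0.0, sleep=time.sleep):
    """Try the higher-resolution fetch up to 'retries' times, then fall back."""
    for attempt in range(retries):
        try:
            return fetch_original()
        except OSError:
            # Pause only between attempts, not after the last one.
            if attempt + 1 < retries:
                sleep(delay)
    return fetch_smaller()


# Usage with stand-in callables; 'sleep' is replaced so nothing actually waits.
attempts = []
pauses = []


def always_fails():
    attempts.append(1)
    raise OSError("404")


result = fetch_with_fallback(always_fails, lambda: "smaller.jpg",
                             retries=2, delay=1.5, sleep=pauses.append)
```

With two retries, the stand-in is called twice, one pause is recorded between the attempts, and the smaller version is returned.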
Mike Fährmann
68466a7d61 [tumblr] support 'https://www.tumblr.com/BLOGNAME' URLs (#3034) 2022-10-11 21:09:24 +02:00
Mike Fährmann
f1f89b2436 [tumblr] add 'offset' option 2022-10-11 10:54:23 +02:00
Mike Fährmann
e5d229c524 [tumblr] sleep between fallback retries (#2957) 2022-10-11 10:48:28 +02:00
Mike Fährmann
e1d714943b [tumblr] catch exception when updating image token (#2957) 2022-09-30 15:08:21 +02:00
Mike Fährmann
f728b5ca06 [tumblr] add fallback for failed higher-resolution images (#2957) 2022-09-28 21:36:09 +02:00
Mike Fährmann
32c30754d1 [tumblr] warn when unable to fetch higher-resolution images (#2957)
and download the smaller version
instead of failing with a 404 error
2022-09-26 12:05:34 +02:00
Mike Fährmann
46fe469c53 [tumblr] implement 'ratelimit' option (#2919) 2022-09-17 14:10:33 +02:00
Mike Fährmann
7a799df17f [tumblr] pre-compile regular expressions 2022-09-13 17:50:48 +02:00
blankie
9745b48830 [tumblr] attempt to fetch high-quality inline images (#2877)
* [tumblr] attempt to fetch high-quality images (again)

Fixes #1846, and fixes #1344

* slight refactor

* update configuration.rst entry
2022-08-31 10:53:50 +02:00
blankie
e4cff67aaa [tumblr] add count metadata field (#2804)
Fixes #2778
2022-08-18 18:24:37 +02:00
Mike Fährmann
a27b17481f [tumblr] restrict condition for calling _original_image 2022-08-11 12:20:39 +02:00
Mike Fährmann
df1c643dda [tumblr] attempt to extract full-resolution photos
- for photos with apparent width == 2048 or height == 3072
- can be disabled with 'original' option
2022-08-10 20:01:46 +02:00
blankie
5b63df46c0 [tumblr] attempt to get higher-quality images (#2761) 2022-07-27 10:47:43 +02:00
Mike Fährmann
a566e63cdf [tumblr] support '/blog/view' URLs (#2760) 2022-07-15 15:22:54 +02:00
Mike Fährmann
6ea3ff5173 [tumblr] notify users about registering an oauth application
if they hit the daily rate limit and are using default API credentials
2022-03-06 16:28:53 +01:00
Vrihub
96fcff182c generic extractor (#735)
* Generic extractor, see issue #683

* Fix failed test_names test, no subcategory needed

* Prefix directory_fmt with "generic"

* Relax regex (would break some urls)

* Flake8 compliance

* pattern: don't require a scheme

This fixes a bug when we force the generic extractor on urls without a
scheme (that are allowed by all other extractors).

* Fix using g: and r: on urls without http(s) scheme

Almost all extractors accept urls without an initial http(s) scheme.

Many extractors also allow for generic subdomains in their "pattern"
variable; some of them implement this with the regex character class
"[^.]+" (everything but a dot).

This leads to a problem when the extractor is given a url starting
with g: or r: (to force using the generic or recursive extractor)
and without the http(s) scheme: e.g. with "r:foobar.tumblr.com"
the "r:" is wrongly considered part of the subdomain.

This commit fixes the bug, replacing the too generic "[^.]+" with the
more specific "[\w-]+" (letters, digits and "-", the only characters
allowed in domain names), which is already used by some extractors.

* Relax imageurl_pattern_ext: allow relative urls

* First round of small suggested changes

* Support image urls starting with "//"

* self.baseurl: remove trailing slash

* Relax regexp (didn't catch some image urls)

* Some fixes and cleanup

* Fix domain pattern; option to enable extractor

Fixed the domain section for "pattern", to pass "test_add" and
"test_add_module" tests.
Added the "enabled" configuration option (default False) to enable the
generic extractor. Using "g(eneric):URL" forces using the extractor.
2021-12-29 22:39:29 +01:00
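Note on 96fcff182c: the subdomain problem described in that commit message can be reproduced with two simplified patterns. Both patterns are hypothetical stand-ins written for this example, not the extractor's real 'pattern' values.

```python
import re

# Simplified, hypothetical patterns; only the subdomain class differs.
LOOSE = re.compile(r"(?:https?://)?([^.]+)\.tumblr\.com")
STRICT = re.compile(r"(?:https?://)?([\w-]+)\.tumblr\.com")

forced = "r:foobar.tumblr.com"   # URL with a forcing prefix and no scheme
plain = "foobar.tumblr.com"

# "[^.]+" accepts ':' and therefore swallows the "r:" prefix.
loose_forced = LOOSE.match(forced)
# "[\w-]+" stops at ':', so the pattern does not match the prefixed URL.
strict_forced = STRICT.match(forced)
# Both patterns still accept an ordinary URL without a scheme.
strict_plain = STRICT.match(plain)
```

Here 'loose_forced.group(1)' is "r:foobar", 'strict_forced' is None, and 'strict_plain.group(1)' is "foobar".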
Mike Fährmann
ddd48ceee5 update extractor test results 2021-03-28 23:06:44 +02:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
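Note on 968d3e8465: a small illustration of what dropping '&' from a negated character class changes. The '/tagged/' patterns and the example URL are hypothetical. The point is only that '&' is not one of the component delimiters quoted above, so it can legitimately appear inside a path segment.

```python
import re

# Hypothetical patterns capturing one path segment after "/tagged/".
OLD = re.compile(r"/tagged/([^/?&#]+)")   # also stops at '&'
NEW = re.compile(r"/tagged/([^/?#]+)")    # stops only at '/', '?' and '#'

url = "https://example.org/tagged/a&b?page=2#top"

old_tag = OLD.search(url).group(1)
new_tag = NEW.search(url).group(1)
```

The old class cuts the segment at the ampersand ("a"), while the new one keeps it whole ("a&b").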
Mike Fährmann
3918b69677 remove 'extractor.blacklist' context manager 2020-09-11 13:17:35 +02:00
Mike Fährmann
7876a03ece [tumblr] create directories for each post (fixes #965)
This changes the identifiers for directory format string fields.
Everything blog related is now inside a 'blog' object
and not at the "base level" anymore.

E.g. '{name}' for directories is now '{blog[name]}'
(or '{blog_name}', since that is also available)
2020-08-31 21:58:20 +02:00
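Note on 7876a03ece: the effect of nesting blog fields can be sketched with Python's built-in str.format_map(), which understands the same '{blog[name]}' syntax. gallery-dl uses its own formatter, so this is only a stand-in, and the metadata values are made up.

```python
# Made-up metadata in the shape described by the commit message:
# blog fields live inside a 'blog' object, with 'blog_name' also available.
metadata = {
    "blog": {"name": "example-blog"},
    "blog_name": "example-blog",
    "id": 12345,
}

nested = "{blog[name]}/{id}".format_map(metadata)
flat = "{blog_name}/{id}".format_map(metadata)

# The old top-level field no longer exists in this layout.
try:
    "{name}".format_map(metadata)
    old_field_available = True
except KeyError:
    old_field_available = False
```

Both 'nested' and 'flat' produce "example-blog/12345", and the old '{name}' field raises a KeyError with this stand-in formatter.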
Mike Fährmann
2ecf1efb16 update extractor test results
- tumblr: remove deleted post
- jaiminisbox: replace removed manga/chapters
- smugmug: one inconsequential field got removed
2020-07-18 15:12:28 +02:00
Mike Fährmann
5e5be67c26 [tumblr] prevent KeyErrors when using reblogs=same-blog (fixes #851)
2020-06-25 19:00:12 +02:00
Mike Fährmann
09cc9dbec0 prevent flake8 errors from comments looking like type annotations 2020-05-12 20:08:05 +02:00
Mike Fährmann
d02f7c1118 improve Extractor.wait()
- allow 'until' to be a datetime object
- do "time calculations" with UTC timestamps
- set a default 'reason'
2020-04-05 21:23:05 +02:00
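Note on d02f7c1118: one possible way to accept either a UNIX timestamp or a datetime object for 'until' and do the arithmetic in UTC. The helper name 'seconds_until' and its 'now' parameter are hypothetical, and treating naive datetime objects as UTC is an assumption of this sketch, not something the commit message states.

```python
import datetime
import time


def seconds_until(until, now=None):
    """Return the non-negative number of seconds from 'now' to 'until'."""
    if isinstance(until, datetime.datetime):
        if until.tzinfo is None:
            # Assumption: naive datetime objects are interpreted as UTC.
            until = until.replace(tzinfo=datetime.timezone.utc)
        until = until.timestamp()
    if now is None:
        now = time.time()
    return max(0.0, until - now)


# One minute past the epoch, measured from the epoch itself.
from_datetime = seconds_until(datetime.datetime(1970, 1, 1, 0, 1, 0), now=0)
from_timestamp = seconds_until(100, now=40)
already_past = seconds_until(10, now=40)
```

Passing 'now' explicitly keeps the example deterministic; leaving it out uses the current time.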
Mike Fährmann
d94215d119 [tumblr] replace '-' with ' ' in tag searches (fixes #611)
To search for tags with actual minus signs in them
(there shouldn't be too many), manually replace those
with URL-encoded minus characters ('-' -> '%2d')
before passing them to gallery-dl:

https://s679874.tumblr.com/tagged/tag-with-minus
 ->
https://s679874.tumblr.com/tagged/tag%2dwith%2dminus
2020-02-17 23:29:13 +01:00
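Note on d94215d119: a minimal sketch of why the '%2d' workaround works, assuming the literal '-' characters are replaced before the tag is URL-decoded. The helper name is hypothetical and the order of operations is an assumption consistent with the commit message.

```python
from urllib.parse import unquote


def tag_from_url_segment(segment):
    """Turn the '/tagged/...' URL segment into a search tag."""
    # Replace literal minus signs first, then decode percent-escapes,
    # so an encoded '%2d' survives as a real '-'.
    return unquote(segment.replace("-", " "))


spaced = tag_from_url_segment("tag-with-minus")
literal = tag_from_url_segment("tag%2dwith%2dminus")
```

The first call yields "tag with minus" and the second yields "tag-with-minus".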
Mike Fährmann
3811fd8a25 fix time formatting for Python 3.4 and 3.5
'datetime.time.isoformat()' only has an optional 'timespec' argument
since Python 3.6.
2020-01-05 00:47:10 +01:00
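Note on 3811fd8a25: two ways to format a time without fractional seconds that do not rely on the 'timespec' argument. This is a sketch of the general technique, not necessarily the change made in the commit.

```python
import datetime

t = datetime.time(23, 42, 7, 123456)

# Drop the microseconds before calling isoformat() without arguments.
via_replace = t.replace(microsecond=0).isoformat()

# Or spell out the format explicitly.
via_strftime = t.strftime("%H:%M:%S")
```

Both expressions give "23:42:07".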
Mike Fährmann
569747a78d implement extractor.wait() 2020-01-04 23:42:07 +01:00
Mike Fährmann
ce54b8c04c let extractors opt-out of cookie option usage
useful to avoid sending unnecessary cookies when all authentication
is done through OAuth tokens
2020-01-01 21:12:37 +01:00
Mike Fährmann
c4702ec9b6 simplify some logging calls 2019-12-10 21:30:08 +01:00
Mike Fährmann
4409d00141 embed error messages in StopExtraction exceptions 2019-10-28 16:39:49 +01:00
Mike Fährmann
d5fbb2d9de [tumblr] ignore audio links from Spotify etc. 2019-09-07 18:18:12 +02:00
Mike Fährmann
1133b7fcbd [smugmug] update unit tests
The account used for tests before has been deleted.
2019-07-19 17:16:24 +02:00
Mike Fährmann
8d1ae9b715 [tumblr] enable date-min/-max/-format options (#337) 2019-07-17 14:36:41 +02:00
Mike Fährmann
208202b962 [tumblr] improve error handling (#297)
In some cases Tumblr's API responds with an HTML document.
Trying to decode it as JSON would raise an uncaught exception.
2019-06-04 14:02:17 +02:00
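Note on 208202b962: the failure mode described there, an HTML document arriving where JSON is expected, can be guarded against as sketched below. The helper name and the choice to return None are hypothetical; how the extractor actually reports the error is not shown in this log.

```python
import json


def parse_api_response(body):
    """Decode a response body as JSON, returning None if it is not JSON."""
    try:
        return json.loads(body)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return None


ok = parse_api_response('{"meta": {"status": 200}}')
html = parse_api_response("<!DOCTYPE html><html><body>Error</body></html>")
```

The JSON body decodes to a dict, while the HTML body yields None instead of raising an uncaught exception.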
Mike Fährmann
add7e693d0 [tumblr] provide parsed 'date' metadata (#232) 2019-04-29 17:30:42 +02:00
Mike Fährmann
fb14f80d62 [tumblr] fix avatar URLs for non-OAuth1.0 calls (closes #193) 2019-03-17 11:07:22 +01:00
Mike Fährmann
d0059cab79 [tumblr] check for null URLs (closes #165) 2019-02-19 13:49:55 +01:00
Mike Fährmann
5530871b5a change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach drops the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway.)

Example: "https://example.org/path/filename.ext"

before:
- filename : filename.ext
- name     : filename
- extension: ext

now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
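Note on 5530871b5a: the "now" result shape from the example above can be sketched with the standard library. The function below is a hypothetical re-implementation for illustration only and is not the code of text.nameext_from_url().

```python
from urllib.parse import unquote, urlsplit


def split_name_ext(url):
    """Return 'filename' (without extension) and 'extension' for a URL."""
    # Take the last path segment, ignoring any query string or fragment.
    last_segment = unquote(urlsplit(url).path.rpartition("/")[2])
    name, _, ext = last_segment.rpartition(".")
    if name:
        return {"filename": name, "extension": ext}
    # No dot in the segment: everything is the filename.
    return {"filename": last_segment, "extension": ""}


with_ext = split_name_ext("https://example.org/path/filename.ext")
without_ext = split_name_ext("https://example.org/path/filename")
```

For the URL from the commit message this returns 'filename' as "filename" and 'extension' as "ext".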
Mike Fährmann
4b1880fa5e propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
0afa913de4 [tumblr] add tests for hidden and private blogs (#145)
Hidden / dashboard-only blogs are pretty straightforward and "only"
require a valid 'access-token' and 'access-token-secret' for the given
'api-key' and 'api-secret', so that signed OAuth1.0 requests are possible.

Private / password protected blogs on the other hand are a bit
cumbersome. In addition to a valid 'access-token' and
'access-token-secret', they also require the account belonging to those
tokens to be a member of the blog itself. Knowing the password and
entering it in the website isn't enough to access a blog through the
API. Following a private blog is also impossible, so that option can't
work either.
2019-01-03 16:12:24 +01:00
Mike Fährmann
2f4f60de33 [tumblr] add tests for each post type 2018-12-27 22:41:42 +01:00
Mike Fährmann
28f9539551 [tumblr] change default values for post types and inline media 2018-12-26 18:55:59 +01:00
Mike Fährmann
5be95034ba [tumblr] add option to download avatars (#137) 2018-12-26 14:29:30 +01:00
Mike Fährmann
2e5f82e59e [tumblr] don't follow 'external' Tumblr URLs (#139) 2018-12-22 14:05:43 +01:00
Mike Fährmann
049a9575c4 [tumblr] fix inline extraction #2
Using only the "comment" field isn't enough ...

[ci skip]
2018-12-11 21:57:20 +01:00