1328 Commits

Author SHA1 Message Date
Mike Fährmann
c9b8e6aefc [reddit] fix submission-ID parsing (#104)
Uppercase characters caused a ValueError exception
2018-09-07 18:27:54 +02:00
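The commit message does not show how the submission ID was parsed. As one illustration of how uppercase input can end in a ValueError, here is a minimal sketch assuming a hand-rolled base-36 decoder over a lowercase-only alphabet. `ALPHABET` and `decode_base36` are hypothetical names, not taken from the project.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"  # lowercase only (assumption)


def decode_base36(value, normalize=True):
    """Decode a base-36 string by looking up each character in ALPHABET."""
    if normalize:
        # Lowercasing first makes "9A" and "9a" decode to the same number.
        value = value.lower()
    number = 0
    for char in value:
        # str.index() raises ValueError for characters missing from
        # ALPHABET, which is what uppercase letters hit when not normalized.
        number = number * 36 + ALPHABET.index(char)
    return number
```

Calling `decode_base36("9A", normalize=False)` raises ValueError, while the normalizing call returns the same value as for `"9a"`.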
Mike Fährmann
488abeca0b [hentaicafe] adjust default directory format
A separate folder for each chapter is rather pointless if almost all
manga have only one chapter each.
2018-09-07 18:25:58 +02:00
Mike Fährmann
b4eca2633e [tumblr] support /archive URLs 2018-09-06 11:09:13 +02:00
Mike Fährmann
aa1de70da0 [tumblr] recognize inline videos (#102) 2018-09-06 10:37:40 +02:00
Mike Fährmann
3ecea4cf36 [hentaicafe] add chapter and manga extractors (#101) 2018-09-05 21:08:40 +02:00
Mike Fährmann
41249f3ead improve extractor.get_downloader() 2018-09-05 18:17:16 +02:00
Mike Fährmann
eb3185d6a3 update exception hierarchy 2018-09-05 18:15:33 +02:00
Mike Fährmann
e9ae6fd080 improve downloader/postprocessor module loading
- handle arguments of any type without propagating an exception
- prevent potential security risk through relative imports
2018-09-05 16:39:40 +02:00
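The two points above could be approached roughly as follows. This is a sketch under assumptions, not the project's loader: `find_module` is a hypothetical helper, and rejecting everything that is not a plain identifier is just one possible way to keep names such as "..other" from being resolved as relative imports.

```python
import importlib


def find_module(name, package):
    """Return submodule `name` of `package`, or None if it is unavailable."""
    # Non-string arguments (None, lists, numbers, ...) are rejected here
    # instead of raising a TypeError further down.
    # Requiring a plain identifier also rules out leading dots, so a name
    # like "..other" can not climb out of `package`.
    if not isinstance(name, str) or not name.isidentifier():
        return None
    try:
        return importlib.import_module("." + name, package)
    except ImportError:
        # Unknown module names end up here and are reported as None.
        return None
```

With the standard library as a stand-in package, `find_module("decoder", "json")` returns the `json.decoder` module, while `find_module(None, "json")` and `find_module("..os", "json")` both return None.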
Mike Fährmann
712b58a93b [postprocessor] add black-/whitelist options
Each post-processor config dict now supports a list of extractor
categories for which it should/shouldn't be active.

For example:
"postprocessors": [
    {"name": "classify",
     "whitelist": ["tumblr", "deviantart"],
     ...
    }
]
2018-09-03 14:53:43 +02:00
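A minimal sketch of how such a check might look. The "blacklist" key is inferred from the commit title (only "whitelist" appears in the example), `is_active` is a hypothetical helper, and letting a whitelist take precedence over a blacklist is an assumption.

```python
def is_active(pp_config, category):
    """Decide whether a post-processor applies to an extractor category."""
    whitelist = pp_config.get("whitelist")
    blacklist = pp_config.get("blacklist")  # key name is an assumption
    if whitelist is not None:
        # Only listed categories are allowed.
        return category in whitelist
    if blacklist is not None:
        # Everything except listed categories is allowed.
        return category not in blacklist
    # Neither list given: active for every category.
    return True
```

For the config shown above, `is_active(config, "tumblr")` is true and `is_active(config, "pixiv")` is false.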
Mike Fährmann
8a23b21d0e [tests] let 'pattern' require at least 1 URL 2018-09-02 21:19:44 +02:00
Mike Fährmann
0bc8ef51c8 [smugmug] Handle albums with no explicit owner (#100) 2018-09-01 12:55:02 +02:00
Mike Fährmann
ff83ee22b0 release version 1.5.2 2018-08-31 20:27:09 +02:00
Mike Fährmann
b47af4637a [mangadex] update URL pattern
Manga URLs now begin with /title/ instead of /manga/
2018-08-31 20:16:50 +02:00
Mike Fährmann
75862715ac [behance] add user extractor 2018-08-31 17:42:09 +02:00
Mike Fährmann
a493fed376 [deviantart] fix journal creation if no 'username' is set 2018-08-31 17:38:12 +02:00
Mike Fährmann
6ecb36d88c [postprocessor:ugoira] add 'ffmpeg-output' option 2018-08-31 17:37:35 +02:00
Mike Fährmann
02a4a67f6d [postprocessor:ugoira] support danbooru sources 2018-08-27 20:58:45 +02:00
Mike Fährmann
5b8a314de7 [tumblr] replace inline URLs with higher quality ones (#98) 2018-08-25 18:43:51 +02:00
Mike Fährmann
2af2bb7911 [mangadex] fix relative page URLs 2018-08-25 11:07:26 +02:00
Mike Fährmann
590c0b3ad5 re-implement and improve filename formatter
A format string now gets parsed only once instead of re-parsing it each
time it is applied to a set of data.

The initial parsing makes directory path creation about 2x slower
than before, since each format string there is used only once,
but building a filename, the more common operation, is at least 2x
faster. The "directory slowness" cancels out at about 5 filenames, and
everything above that is significantly faster.
2018-08-25 10:45:14 +02:00
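The parse-once idea can be sketched with the standard library's `string.Formatter.parse()`. This is an illustration only, not the formatter from this commit: `CompiledFormat` is a hypothetical class, and it supports plain key names only (no attribute or index access, no nested format specs).

```python
import string

CONVERSIONS = {"r": repr, "s": str, "a": ascii}


class CompiledFormat:
    """Parse a format string once, then apply it to many data dicts."""

    def __init__(self, format_string):
        # Formatter.parse() yields one
        # (literal_text, field_name, format_spec, conversion) tuple per field.
        self.parts = list(string.Formatter().parse(format_string))

    def apply(self, data):
        out = []
        for literal, field, spec, conversion in self.parts:
            out.append(literal)
            if field is None:
                # Trailing literal text without a replacement field.
                continue
            value = data[field]
            if conversion:
                value = CONVERSIONS[conversion](value)
            out.append(format(value, spec or ""))
        return "".join(out)
```

The parsing cost is paid once in `__init__`, and each later `apply()` call only walks the stored parts:

```python
fmt = CompiledFormat("{category}_{id:05d}.{ext}")
fmt.apply({"category": "tumblr", "id": 42, "ext": "jpg"})  # 'tumblr_00042.jpg'
```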
Mike Fährmann
34b556922d update/restore tests 2018-08-23 15:47:40 +02:00
Mike Fährmann
ab2bfaeb46 [ngomik] add replacement for 'subapics'
http://subapics.com/ got discontinued and replaced by http://ngomik.in/.

ngomik.in is still displaying a link to the "old site" showing a big
"Account Suspended" sign.
2018-08-23 15:29:53 +02:00
Mike Fährmann
a2eeef1f5e [behance] replace test
The "UVMW Studio" account and their galleries are gone.
2018-08-19 21:17:21 +02:00
Mike Fährmann
e9dd2eff1d [twitter] add extractor for media-tweet timelines (#96)
For example "https://twitter.com/PicturesEarth/media".
They are different from normal timelines in that they do not contain
any (re)tweets from other users and feature all media the user ever
posted, including responses to other tweets.
2018-08-19 20:46:12 +02:00
Mike Fährmann
f45c9f2141 [gfycat] test-updates and code-adjustments 2018-08-18 23:04:45 +02:00
Mike Fährmann
9b1c39032c [twitter] changes and improvements
- rename User- to TimelineExtractor
- rename 'userid' to 'user_id' to conform to the other ..._id values
- adjust archive_fmt to deal with retweets
- emulate browser behavior for API calls
2018-08-18 23:04:45 +02:00
Mike Fährmann
10365394d7 [twitter] add support for user-timelines (closes #96)
also adds a 'retweets' option to filter retweeted content
2018-08-17 20:04:11 +02:00
Mike Fährmann
e3055d356c release version 1.5.1 2018-08-17 13:21:36 +02:00
Mike Fährmann
d3f1eed2a6 [pinterest] improvements
- add stop condition for pin-related pins
- improve URL patterns
- make Pylint happy
2018-08-16 18:11:39 +02:00
Mike Fährmann
2801a0d997 [exhentai] skip "Content Warning" page when not logged in
(closes #97)
2018-08-16 09:17:22 +02:00
Mike Fährmann
63fa0b2006 [pinterest] add extractors for related pins
Related pins can now be accessed by adding a "#related" fragment
to the end of a Pinterest URL, for example:
- https://www.pinterest.com/pin/858146903966145189/#related
- https://www.pinterest.com/g1952849/test-/#related

There are no explicit real URLs for related pins;
using an option to enable them results in "clunky" code,
and a custom "related:<URL>" scheme doesn't feel right either.
2018-08-15 21:49:45 +02:00
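Detecting the fragment could look roughly like this, using `urllib.parse.urldefrag` from the standard library. `split_related` is a hypothetical helper. The extractors in this commit may well match the fragment inside their URL patterns instead.

```python
from urllib.parse import urldefrag


def split_related(url):
    """Return (url_without_fragment, wants_related) for a URL."""
    # urldefrag() separates "…/#related" into the base URL and "related".
    base, fragment = urldefrag(url)
    return base, fragment == "related"
```

For the first URL above this yields the pin URL without the fragment and `True`. A URL without "#related" comes back unchanged with `False`.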
Mike Fährmann
1694039de0 [komikcast] update ad-filter 2018-08-15 21:49:44 +02:00
Mike Fährmann
f9ded38d89 [test:results] add support for "range" options in tests 2018-08-15 21:49:44 +02:00
Mike Fährmann
c9e6ccbd7c [test:extractor] small fixes and improvements 2018-08-15 21:49:33 +02:00
Mike Fährmann
792135a339 enable Python 3.7 for Travis CI tests 2018-08-14 11:54:01 +02:00
Mike Fährmann
a74591b84b [tumblr] remove "original image" functionality
Accessing higher/original quality images on
https://s3.amazonaws.com/data.tumblr.com and http://data.tumblr.com
is no longer possible and any HTTP request results in 403 Forbidden.

A few images can still be accessed through https://a.tumblr.com [1][2],
but not as "_raw", just "_1280", and that might also be "fixed" in
the near future.

[1] https://a.tumblr.com/tumblr_kzjlfiTnfe1qz4rgho1_1280.jpg
[2] https://a.tumblr.com/ee589c6345f29d2d5935cecb49b0a705/tumblr_oztu02dIHp1wgha4yo1_1280.png
2018-08-14 11:51:17 +02:00
Mike Fährmann
38d4f43cc0 [komikcast] skip ads 2018-08-14 11:17:59 +02:00
Mike Fährmann
4313c95bc9 improve error message for OAuth2 authentication 2018-08-11 23:54:25 +02:00
Mike Fährmann
7f4e41c989 increase timeout during extractor tests
Cloudflare's 522 response takes longer than 30 seconds
2018-08-10 16:51:05 +02:00
Mike Fährmann
b55e39d1ee [mangadex] improve extraction
- cache manga API results
- add artist, author and date fields to chapter metadata
- remove Manga-/ChapterExtractor inheritance
- minor code simplifications and improvements
2018-08-10 16:50:07 +02:00
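Caching manga-level API results so that every chapter of the same manga reuses them might be sketched as below. `manga_info`, `chapter_metadata` and `request_log` are hypothetical stand-ins (no real API call is made), and `functools.lru_cache` is only one possible caching mechanism, not necessarily the one used here.

```python
import functools

request_log = []  # records every simulated API request


@functools.lru_cache(maxsize=None)
def manga_info(manga_id):
    """Stand-in for a manga API request; a real version would use HTTP."""
    request_log.append(manga_id)
    return {"id": manga_id, "title": "manga-%d" % manga_id}


def chapter_metadata(manga_id, chapter):
    """Combine per-chapter data with the cached manga-level data."""
    manga = manga_info(manga_id)
    return {"manga": manga["title"], "chapter": chapter}
```

Requesting metadata for several chapters of the same manga appends to `request_log` only once, since later calls are served from the cache.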
Mike Fährmann
b1c4c1e13c [mangadex] fix extraction 2018-08-08 18:08:26 +02:00
Mike Fährmann
3c90df6635 [piczel] add user, folder and image extractors 2018-08-08 10:53:01 +02:00
Mike Fährmann
2a9f3341a2 [behance] fix title extraction 2018-08-08 10:48:58 +02:00
Mike Fährmann
3fc2f269fa [behance] filter 'fields' list 2018-08-07 12:14:41 +02:00
Mike Fährmann
b67339155f [rule34] update test results
'metadata' tag type has been removed
2018-08-07 12:13:34 +02:00
Mike Fährmann
a86f2bfc80 [pinterest] update not-found redirects 2018-08-07 12:13:19 +02:00
Mike Fährmann
7442d2940c release version 1.5.0 2018-08-03 17:50:27 +02:00
Mike Fährmann
b040ca0718 [rule34] small unit test fixes 2018-08-03 17:28:47 +02:00
Mike Fährmann
b164231bca [sankaku] increase default values for 'wait-min/-max' 2018-08-03 17:06:51 +02:00
Mike Fährmann
68d6033a5d use 'retries' and 'timeout' options for regular HTTP requests 2018-08-02 16:11:54 +02:00
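How 'retries' and 'timeout' might interact in a request loop, as a sketch under assumptions: `request_with_retries` and its `send` callback are hypothetical, the default values are placeholders, and treating 'retries' as the number of additional attempts after the first failure is an assumption rather than something this commit states.

```python
import time


def request_with_retries(send, retries=5, timeout=30, delay=0.0):
    """Call send(timeout) until it succeeds or the retries are used up."""
    failures = 0
    while True:
        try:
            # `send` performs one request and honors the given timeout.
            return send(timeout)
        except OSError:
            failures += 1
            if failures > retries:
                # Out of retries: let the last error propagate.
                raise
            time.sleep(delay)
```

A `send` callback that fails twice with OSError and then succeeds is called three times in total, each time with the same timeout value.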