1979 Commits

Author SHA1 Message Date
Mike Fährmann
15e4ddf46d implement custom logging formatter
Supports custom log message formats for each log level and, by
extension, custom ANSI codes and colors for errors and warnings.

(#304)
2019-06-21 20:17:58 +02:00
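A minimal sketch of a per-level formatter, assuming the formats are supplied as a dict keyed by logging level. The class name `LevelFormatter` and the example format strings are hypothetical and are not the code from this commit.

```python
import logging


class LevelFormatter(logging.Formatter):
    """Hypothetical formatter that selects a format string per log level."""

    def __init__(self, formats, default="%(levelname)s: %(message)s"):
        super().__init__(default)
        # One ordinary Formatter per configured level
        self.formatters = {
            level: logging.Formatter(fmt) for level, fmt in formats.items()
        }

    def format(self, record):
        formatter = self.formatters.get(record.levelno)
        if formatter is None:
            # Levels without their own entry use the default format
            return super().format(record)
        return formatter.format(record)


# Example: bold red errors and bold yellow warnings via ANSI escape codes
formatter = LevelFormatter({
    logging.ERROR: "\033[1;31m[%(name)s][error] %(message)s\033[0m",
    logging.WARNING: "\033[1;33m[%(name)s][warning] %(message)s\033[0m",
})
```

It is attached like any other formatter, e.g. `handler.setFormatter(formatter)`.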
Mike Fährmann
61e413d85d [hentaifoundry] stop disabling IPv6 addresses
The rogue address mentioned in a138d58 is no longer included in the DNS
results for www.hentai-foundry.com.
2019-06-21 20:03:14 +02:00
Mike Fährmann
76ae9957c2 [deviantart] force legacy version for single deviations
Let's see how long this works ...

DeviantArt is rolling out a new version of their website, including a
new internal and potentially usable API (rewrite incoming, yay).

The issue with the new layout is that it doesn't include the "old"
UUIDs for single deviations, i.e. mapping a numeric deviation ID to its
UUID counterpart is impossible with the new layout.
2019-06-20 19:26:15 +02:00
Mike Fährmann
70713f0f28 fix extractor result tests 2019-06-20 18:12:36 +02:00
Mike Fährmann
db3f52881a add 'mtime' option 2019-06-20 17:19:44 +02:00
Mike Fährmann
ee4d7c3d89 update downloader.find() and related code
Instead of replacing 'https' with 'http' for every URL in
'get_downloader()', this now only happens once during downloader
initialization. Also adds unit tests.
2019-06-20 16:59:44 +02:00
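A sketch of the idea, assuming downloaders are looked up by URL scheme in a registry: the 'https' key is aliased to the 'http' downloader once, when the registry is built, so lookups never rewrite schemes. `find()`, the two downloader classes, and the registry layout are illustrative assumptions.

```python
class HttpDownloader:
    scheme = "http"


class TextDownloader:
    scheme = "text"


def _build_registry():
    registry = {cls.scheme: cls for cls in (HttpDownloader, TextDownloader)}
    # Alias once at initialization instead of rewriting every URL's scheme
    registry["https"] = registry["http"]
    return registry


_REGISTRY = _build_registry()


def find(scheme):
    """Return the downloader class for 'scheme', or None if unsupported."""
    return _REGISTRY.get(scheme)
```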
Mike Fährmann
f4ba98771d use Last-Modified header to set file modification time
(#236, #277)
2019-06-19 23:16:32 +02:00
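A minimal sketch of applying a Last-Modified header to a downloaded file using only the standard library. The function name is hypothetical; it assumes the header value has already been read from the response.

```python
import os
from email.utils import parsedate_to_datetime


def set_mtime_from_header(path, last_modified):
    """Set 'path's access and modification time from a Last-Modified value.

    Returns the applied timestamp, or None if the header is missing or
    cannot be parsed (the file is then left untouched).
    """
    if not last_modified:
        return None
    try:
        timestamp = parsedate_to_datetime(last_modified).timestamp()
    except (TypeError, ValueError):  # older Pythons raise TypeError here
        return None
    os.utime(path, (timestamp, timestamp))
    return timestamp
```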
Mike Fährmann
179d112083 [downloader] overhaul http and text modules
Get rid of the modular structure and simplify/specialize those modules.
2019-06-19 22:56:11 +02:00
Mike Fährmann
a01f99728c [postprocessor:zip] delete empty archives when done (#316) 2019-06-19 18:14:33 +02:00
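A sketch of the cleanup step, assuming the archive has already been written and closed. `finalize_archive()` is a hypothetical name.

```python
import os
import zipfile


def finalize_archive(path):
    """Delete the ZIP archive at 'path' if it contains no entries.

    Returns True if the archive was removed.
    """
    with zipfile.ZipFile(path) as archive:
        empty = not archive.namelist()
    if empty:
        # Removed only after the ZipFile is closed, so no handle is still open
        os.remove(path)
    return empty
```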
Mike Fährmann
520c8ba106 [hentaicafe] extract 'tags' and 'artist' metadata (closes #238)
These metadata fields will only be filled in when using a top-level
URL, because that's the only place this information is available. Using
a Foolslide URL (1) will leave these fields empty.

(1) https://hentai.cafe/manga/read/.../en/0/1/
2019-06-18 14:30:26 +02:00
Mike Fährmann
b51baa9a4b [hitomi] fix empty language detection; parse datetime 2019-06-17 20:02:58 +02:00
Mike Fährmann
258e8b2060 [deviantart] small code improvements 2019-06-17 19:49:50 +02:00
Mike Fährmann
a77340c647 [keenspot] fix extraction for "TwoKinds" 2019-06-17 19:49:39 +02:00
Mike Fährmann
03e6876fbe [instagram] provide 'description' metadata (#310) 2019-06-16 21:54:01 +02:00
Mike Fährmann
b171befa87 implement 'parse_unicode_escapes()' 2019-06-16 21:47:24 +02:00
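One way such a helper could work, as a sketch: replace literal `\uXXXX` sequences with the characters they encode. This is an assumed implementation, not the committed one. Characters outside the BMP, which JSON writes as surrogate pairs, are not recombined here.

```python
import re

# A literal backslash, 'u', and exactly four hex digits, e.g. "\u00e9"
_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def parse_unicode_escapes(text):
    """Replace literal \\uXXXX sequences in 'text' with their characters."""
    if "\\u" not in text:
        return text  # fast path: nothing to replace
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
```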
Mike Fährmann
3a36a0fa1e release version 1.8.6 2019-06-14 21:11:58 +02:00
Mike Fährmann
ec3e8601f1 [slickpic] add user extractor (#249) 2019-06-14 18:55:56 +02:00
Mike Fährmann
97ef416218 [8muses] support multi-page listings (#305) 2019-06-14 18:48:22 +02:00
Mike Fährmann
f5961ac968 [deviantart] download deviations with no 'content' field
Some deviations (possibly only from sta.sh sources) are downloadable
(i.e. 'is_downloadable' is true and /deviation/download/ works), but
have no 'content' or similar field in their JSON representation.

(fixes #307)
2019-06-13 21:14:12 +02:00
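A sketch of the fallback, assuming each deviation is a dict from the JSON API with optional 'content' and 'is_downloadable' fields. `fetch_download_url` is a hypothetical callable standing in for the /deviation/download/ request.

```python
def resolve_file_url(deviation, fetch_download_url):
    """Pick a file URL for a deviation dict, or None if nothing is available."""
    content = deviation.get("content")
    if content and "src" in content:
        return content["src"]
    if deviation.get("is_downloadable"):
        # No 'content' in the JSON, but the download endpoint still works
        return fetch_download_url(deviation["deviationid"])
    return None
```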
Mike Fährmann
4e07f99e3e [mangoxo] change token message level to debug
The login page currently doesn't provide or require a login token
(logging in works without a token), so printing a warning during
each login is unnecessary.
2019-06-13 21:09:11 +02:00
Mike Fährmann
d997c10320 [8muses] add album extractor (#305) 2019-06-10 22:17:46 +02:00
Mike Fährmann
e05a96db5e [deviantart] rename 'stash' to 'extra' (#302)
'stash' is already used as a name for the StashExtractor and therefore
expected to be a dictionary.
2019-06-10 21:05:25 +02:00
Mike Fährmann
2184e3a86b [slickpic] add album extractor (#249) 2019-06-09 21:59:22 +02:00
Mike Fährmann
c23bf263fe [deviantart] rename 'external' to 'stash' (#302)
Restrict extracted URLs to those from https://sta.sh/...
2019-06-09 11:16:02 +02:00
Mike Fährmann
c73c2cda50 [pornhub] add gallery & user extractor (#282) 2019-06-07 16:31:20 +02:00
Mike Fährmann
7c6cb908f9 [xhamster] update test results 2019-06-07 16:28:49 +02:00
Mike Fährmann
035b850e82 update postprocessor entries in example config
- use whitelists
- add ugoira example (#299)
2019-06-07 13:47:02 +02:00
Mike Fährmann
2fb85178da [deviantart] add 'external' option (#302)
If a description is available, this will extract URLs from the
description text and try to find Extractors for them.
2019-06-06 18:53:50 +02:00
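A sketch of only the first step, pulling URLs out of description text, assuming a simple regex is good enough. Matching the results against extractors is not shown; `extract_urls()` and the pattern are hypothetical.

```python
import re

# http(s) URL up to whitespace, a quote, or an angle bracket, so that
# HTML attribute values and tags end a match
_URL = re.compile(r"""https?://[^\s"'<>]+""")


def extract_urls(description):
    """Return unique URLs from 'description' in order of first appearance."""
    seen = set()
    urls = []
    for url in _URL.findall(description or ""):
        url = url.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```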
Mike Fährmann
f85e42cffc [deviantart] fix --range for deviation & stash extractor 2019-06-06 18:45:10 +02:00
Mike Fährmann
40c7eb3424 [livedoor] improve extraction (fixes #301) 2019-06-06 15:22:27 +02:00
Mike Fährmann
62335b9015 [paheal] adjust test results 2019-06-05 11:42:01 +02:00
Mike Fährmann
aa1ca4ed35 [shopify] skip deleted products (#175)
Product pages which return a 4xx status code will now be skipped instead
of raising an exception.
2019-06-05 11:40:54 +02:00
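A sketch of skipping 4xx product pages, assuming a hypothetical `fetch(url)` callable that returns `(status_code, body)`. Treating 5xx as still fatal is an assumption of this sketch.

```python
import logging

log = logging.getLogger("shopify")


def product_pages(urls, fetch):
    """Yield (url, body) for each product page, skipping 4xx responses."""
    for url in urls:
        status, body = fetch(url)
        if 400 <= status < 500:
            # e.g. a product that was deleted after being listed
            log.debug("Skipping %s (HTTP %d)", url, status)
            continue
        if status >= 500:
            raise RuntimeError("HTTP {} for {}".format(status, url))
        yield url, body
```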
Mike Fährmann
096009367b [xhamster] add gallery & user extractor (#281) 2019-06-05 11:11:51 +02:00
Mike Fährmann
208202b962 [tumblr] improve error handling (#297)
In some cases Tumblr's API responds with an HTML document.
Trying to decode it as JSON would raise an uncaught exception.
2019-06-04 14:02:17 +02:00
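A sketch of turning a non-JSON reply into a clear error instead of an uncaught decode exception. `TumblrAPIError` and `parse_api_response()` are hypothetical names.

```python
import json


class TumblrAPIError(Exception):
    """Hypothetical exception for unusable API responses."""


def parse_api_response(text):
    """Decode an API response body, rejecting non-JSON (e.g. HTML) replies."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        snippet = text[:40].strip()
        raise TumblrAPIError("Expected JSON, got: {!r}".format(snippet)) from None
```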
Mike Fährmann
c08c340178 [directlink] make pattern case insensitive (fixes #296) 2019-06-03 10:56:14 +02:00
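A sketch of a case-insensitive direct-link pattern. The extension list and URL shape are assumptions for illustration, not the extractor's actual pattern; the point is the `re.IGNORECASE` flag, which lets `.JPG` or `.Png` match.

```python
import re

# Hypothetical pattern: an http(s) URL whose path ends in a known file
# extension, optionally followed by a query string or fragment
DIRECTLINK = re.compile(
    r"https?://[^?&#]+\.(jpe?g|png|gif|webp|mp4|webm)(?:[?&#].*)?$",
    re.IGNORECASE,
)
```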
Mike Fährmann
95b4a53b9c [keenspot] improve pagination (#223)
The old code would skip the last comic page for some series.
2019-06-02 22:12:21 +02:00
Mike Fährmann
12c965d547 release version 1.8.5 2019-06-01 20:57:55 +02:00
Mike Fährmann
731c7cbd5b [keenspot] support all comics and "random" access (#223) 2019-06-01 20:48:13 +02:00
Mike Fährmann
6a34f4b0c1 skip tests on read timeouts; print list of skipped tests 2019-06-01 20:47:31 +02:00
Mike Fährmann
1c36e65e9b [exhentai] choose site version depending on input URL (#278)
Use e-hentai.org as root and cookiedomain if the input URL is from
e-hentai (or g.e-hentai); use exhentai.org otherwise.
2019-05-31 15:34:39 +02:00
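A sketch of choosing the site from the input URL's host. The function name and the leading-dot cookie domain values are assumptions.

```python
from urllib.parse import urlsplit


def choose_site(url):
    """Return (root, cookiedomain) for an E-Hentai/ExHentai gallery URL."""
    host = (urlsplit(url).hostname or "").lower()
    if host in ("e-hentai.org", "g.e-hentai.org"):
        return "https://e-hentai.org", ".e-hentai.org"
    # Everything else falls back to exhentai.org
    return "https://exhentai.org", ".exhentai.org"
```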
Mike Fährmann
6da3e21237 [downloader:ytdl] provide 'filename' metadata (closes #291) 2019-05-31 14:56:45 +02:00
Mike Fährmann
d33f5a7423 [wallhaven] rewrite
- use API
- remove login support, add 'api-key' option
- remove support for "alpha" subdomain - alpha.wallhaven.cc used numeric
  IDs that can't be translated to the new ID system
- support direct links to wallpapers
2019-05-31 14:53:02 +02:00
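A sketch of the URL handling involved, assuming wallpaper pages look like `wallhaven.cc/w/<id>`, direct links look like `w.wallhaven.cc/full/<xx>/wallhaven-<id>.<ext>`, and the key is sent as an `apikey` query parameter to the v1 API. Function names are hypothetical.

```python
import re
from urllib.parse import urlencode

API_ROOT = "https://wallhaven.cc/api/v1"

# Assumed URL shapes for wallpaper pages and direct image links
_PAGE = re.compile(r"https?://(?:www\.)?wallhaven\.cc/w/(\w+)")
_DIRECT = re.compile(r"https?://w\.wallhaven\.cc/full/\w+/wallhaven-(\w+)\.\w+")


def wallpaper_id(url):
    """Return the wallpaper ID from a page URL or direct link, else None."""
    for pattern in (_PAGE, _DIRECT):
        match = pattern.match(url)
        if match:
            return match.group(1)
    return None  # e.g. old numeric alpha.wallhaven.cc URLs


def api_url(wid, api_key=None):
    """Build the API URL for one wallpaper, adding the key if configured."""
    url = "{}/w/{}".format(API_ROOT, wid)
    if api_key:
        url += "?" + urlencode({"apikey": api_key})
    return url
```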
Mike Fährmann
5499934ae2 [ngomik] fix extraction 2019-05-30 20:18:36 +02:00
Mike Fährmann
f1893b2b5b [deviantart] add 'folders' option (#276) 2019-05-30 17:28:12 +02:00
Mike Fährmann
c849574def [keenspot] add comic extractor (#223)
Doesn't work yet for the following comics, because of their custom
layouts:
- http://brawlinthefamily.keenspot.com/
- http://flipside.keenspot.com/
- http://lastblood.keenspot.com/
- http://mysticrevolution.keenspot.com/
- http://porcelain.keenspot.com/
- http://twokinds.keenspot.com/
2019-05-28 21:34:38 +02:00
Mike Fährmann
2b1999476e implement 'text.rextract()' 2019-05-28 21:03:41 +02:00
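A sketch of what a right-to-left extract could look like, assuming it mirrors a forward `extract(txt, begin, end)` but searches for `begin` from the right. The signature and return convention here are assumptions, not the committed function.

```python
def rextract(txt, begin, end, pos=None):
    """Extract the text between the last 'begin' before 'pos' and the next 'end'.

    Returns (value, index_of_begin), or (None, pos) if nothing matches.
    Passing the returned index back in as 'pos' walks further to the left.
    """
    if pos is None:
        pos = len(txt)
    first = txt.rfind(begin, 0, pos)
    if first < 0:
        return None, pos
    start = first + len(begin)
    last = txt.find(end, start)
    if last < 0:
        return None, pos
    return txt[start:last], first
```

For `txt = "<a>1</a><a>2</a>"`, a first call returns the last value, and feeding its index back as `pos` returns the one before it.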
Mike Fährmann
8bd5a19515 [hentainexus] add '_extractor' data 2019-05-28 00:20:01 +02:00
Mike Fährmann
2a085a5e96 [sankakucomplex] fix 'date' values (#258) 2019-05-28 00:18:58 +02:00
Mike Fährmann
bcd1801aa8 [sankakucomplex] add 'tag' extractor (#258) 2019-05-27 23:57:44 +02:00
Mike Fährmann
74c2415138 [sankakucomplex] move article extractor to its own module (#258) 2019-05-27 23:49:23 +02:00