1300 Commits

Author SHA1 Message Date
Mike Fährmann
0151e250f5 [twitter] extract 'content' metadata (closes #333) 2019-07-15 16:25:22 +02:00
Mike Fährmann
56c7a66a4a detect Cloudflare CAPTCHAs and update cipher list 2019-07-10 15:18:20 +02:00
Mike Fährmann
a7b42b37a2 [35photo] fix extraction 2019-07-09 20:33:57 +02:00
Mike Fährmann
04b8d0894a [newgrounds] improve metadata extraction 2019-07-08 17:53:55 +02:00
Mike Fährmann
12da6bd0c9 [simplyhentai] fix/improve extraction 2019-07-06 20:25:53 +02:00
Mike Fährmann
fdec59f8e2 replace extractor.request() 'expect' argument
with
- 'fatal': allow 4xx status codes
- 'notfound': raise NotFoundError on 404
2019-07-05 00:42:16 +02:00
Mike Fährmann
2ff73873f0 [erolord] add gallery extractor (closes #326) 2019-07-04 20:28:04 +02:00
Mike Fährmann
b4da8c5a97 [sexcom] add extractor for related pins (#325) 2019-07-03 21:04:23 +02:00
Mike Fährmann
69997e92db [sexcom] skip unavailable pins (#325) 2019-07-02 22:05:54 +02:00
Mike Fährmann
bc6b0cfddc [shopify] skip consecutive duplicate products
Not filtering duplicate URLs anymore caused the archive ID uniqueness
test to fail.
2019-07-01 20:04:57 +02:00
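A minimal sketch of what "skip consecutive duplicate products" could look like, assuming products are identified by their URL. The function name is hypothetical and this is not the extractor's actual code; it only illustrates dropping repeats that directly follow each other while keeping later reappearances.

```python
from itertools import groupby


def skip_consecutive_duplicates(urls):
    """Yield each URL once per run of identical, adjacent URLs.

    Hypothetical helper: groupby() groups adjacent equal items, so only
    back-to-back repeats are collapsed; a URL that shows up again later
    is yielded again.
    """
    for url, _group in groupby(urls):
        yield url


if __name__ == "__main__":
    products = ["/products/a", "/products/a", "/products/b", "/products/a"]
    print(list(skip_consecutive_duplicates(products)))
```

Because only adjacent repeats are removed, a second filter would still be needed if archive IDs must be unique across the whole listing.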
Mike Fährmann
b89f0d8d3c update extractor result tests 2019-07-01 20:02:47 +02:00
Mike Fährmann
69205df68d allow '-1' for infinite retries (#300) 2019-06-30 23:10:47 +02:00
Mike Fährmann
f7b5c4c3e7 use values of 'retries' options correctly
The RE-tries option now specifies exactly that: the maximum number of
times a failed HTTP request is re-tried. For example, a value of 2 will
now correctly stop after 3 attempts: the initial one + 2 re-tries.

The maximum wait-time now also caps at 30min and increases exponentially
for both extractor.request() and downloader.http.download().
2019-06-30 23:10:18 +02:00
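The retry semantics described in this commit (and the '-1' value from the commit above it) can be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: the function names, the 1-second base delay, the doubling factor, and the use of OSError as the failure signal are all hypothetical. Only "N re-tries means N + 1 attempts", "exponential increase", "cap at 30 minutes", and "-1 means infinite" come from the commit messages.

```python
import itertools

MAX_WAIT = 30 * 60  # cap from the commit message: 30 minutes, in seconds


def wait_times(retries, base=1.0, max_wait=MAX_WAIT):
    """Yield the wait time before each re-try.

    'retries' counts re-tries only (not the initial attempt);
    a negative value such as -1 means "retry forever".
    The base delay and doubling are assumptions for illustration.
    """
    counter = itertools.count() if retries < 0 else range(retries)
    for n in counter:
        yield min(base * 2 ** n, max_wait)


def request_with_retries(send, retries=2, sleep=lambda seconds: None):
    """Call send() until it succeeds or all re-tries are used up.

    Returns (result, number_of_attempts). 'sleep' defaults to a no-op
    so the sketch runs instantly; pass time.sleep for real waiting.
    """
    waits = wait_times(retries)
    attempts = 0
    while True:
        attempts += 1
        try:
            return send(), attempts
        except OSError:
            wait = next(waits, None)
            if wait is None:  # no re-tries left: give up
                raise
            sleep(wait)
```

With retries=2, a request that always fails is attempted three times before the last error propagates.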
Mike Fährmann
40da44b17f Merge branch 'v1.9.0' 2019-06-29 15:39:52 +02:00
Mike Fährmann
7a99e85943 [kissmanga] fix download URLs and file extensions
The current Blogspot image URLs hosted on Kissmanga end with an
"invalid" query parameter (/000.png&upx=...), which isn't recognized
as such by 'spliturl()' and 'parseurl()' and is therefore included in
the 'extension' field from 'text.nameext_from_url()'.
2019-06-28 20:34:43 +02:00
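The problem described here can be reproduced with the standard library alone. The sketch below uses a hypothetical host and a made-up 'upx' value, and its helper functions are illustrative stand-ins rather than the project's 'text' module; the workaround shown (cutting at the first '&' when the URL has no '?') is one possible approach, not necessarily the fix that was committed.

```python
import posixpath
from urllib.parse import urlsplit


def name_and_extension(url):
    """Split the last path segment of a URL into (name, extension)."""
    filename = posixpath.basename(urlsplit(url).path)
    if "." not in filename:
        return filename, ""
    name, _, ext = filename.rpartition(".")
    return name, ext


def strip_pseudo_query(url):
    """Drop a trailing '&key=value' part that is not a real query string.

    Without a '?', URL parsers treat '&upx=...' as part of the path.
    """
    if "?" in url:
        return url
    return url.partition("&")[0]


if __name__ == "__main__":
    # Hypothetical URL modeled on the '/000.png&upx=...' pattern
    url = "https://example.org/images/000.png&upx=abc"
    print(name_and_extension(url))
    print(name_and_extension(strip_pseudo_query(url)))
```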
Mike Fährmann
055102431f [hitomi] handle Game CG galleries with scenes (fixes #321) 2019-06-27 20:25:40 +02:00
Mike Fährmann
a9c89085fb [instagram] implement login support (#195) 2019-06-26 23:58:47 +02:00
Mike Fährmann
7856e5e7dc [deviantart] "fix" scraps extraction 2019-06-25 18:18:12 +02:00
Mike Fährmann
082cb24acd [pururin] fix extraction
Missing metadata information would lead to unnecessary exceptions.
2019-06-24 22:27:50 +02:00
Mike Fährmann
98554cbab8 [mangoxo] fix login 2019-06-24 21:57:17 +02:00
Mike Fährmann
108963d138 [imagefap] include Referer headers 2019-06-24 21:31:29 +02:00
Mike Fährmann
e314621366 [nsfwalbum] fix default directory_fmt (#287) 2019-06-24 18:29:54 +02:00
Mike Fährmann
18a1f8c6cd [vanillarock] add post and tag extractors (closes #254) 2019-06-23 22:45:36 +02:00
Mike Fährmann
f0c5093812 [nsfwalbum] add album extractor (closes #287) 2019-06-23 22:45:07 +02:00
Mike Fährmann
61e413d85d [hentaifoundry] stop disabling IPv6 addresses
The rogue address mentioned in a138d58 is no longer included in the DNS
results for www.hentai-foundry.com.
2019-06-21 20:03:14 +02:00
Mike Fährmann
76ae9957c2 [deviantart] force legacy version for single deviations
Let's see how long this works ...

DeviantArt is rolling out a new version of their website, including a
new internal and potentially usable API (rewrite incoming, yay).

The issue with the new layout is that it doesn't include the "old"
UUIDs for single deviations, i.e. mapping a numeric deviation ID to its
UUID counterpart is impossible with the new layout.
2019-06-20 19:26:15 +02:00
Mike Fährmann
520c8ba106 [hentaicafe] extract 'tags' and 'artist' metadata (closes #238)
These metadata fields will only be filled in when using a top-level
URL, because that's the only place this information is available. Using
a Foolslide URL (1) will leave these fields empty.

(1) https://hentai.cafe/manga/read/.../en/0/1/
2019-06-18 14:30:26 +02:00
Mike Fährmann
b51baa9a4b [hitomi] fix empty language detection; parse datetime 2019-06-17 20:02:58 +02:00
Mike Fährmann
258e8b2060 [deviantart] small code improvements 2019-06-17 19:49:50 +02:00
Mike Fährmann
a77340c647 [keenspot] fix extraction for "TwoKinds" 2019-06-17 19:49:39 +02:00
Mike Fährmann
03e6876fbe [instagram] provide 'description' metadata (#310) 2019-06-16 21:54:01 +02:00
Mike Fährmann
ec3e8601f1 [slickpic] add user extractor (#249) 2019-06-14 18:55:56 +02:00
Mike Fährmann
97ef416218 [8muses] support multi-page listings (#305) 2019-06-14 18:48:22 +02:00
Mike Fährmann
f5961ac968 [deviantart] download deviations with no 'content' field
Some deviations (possibly only from sta.sh sources) are downloadable
(i.e. 'is_downloadable' is true and /deviation/download/ works), but
have no 'content' or similar field in their JSON representation.

(fixes #307)
2019-06-13 21:14:12 +02:00
Mike Fährmann
4e07f99e3e [mangoxo] change token message level to debug
The login page currently neither provides nor requires a login token
(logging in works without a token), so printing a warning during
each login is unnecessary.
2019-06-13 21:09:11 +02:00
Mike Fährmann
d997c10320 [8muses] add album extractor (#305) 2019-06-10 22:17:46 +02:00
Mike Fährmann
e05a96db5e [deviantart] rename 'stash' to 'extra' (#302)
'stash' is already used as a name for the StashExtractor and therefore
expected to be a dictionary.
2019-06-10 21:05:25 +02:00
Mike Fährmann
2184e3a86b [slickpic] add album extractor (#249) 2019-06-09 21:59:22 +02:00
Mike Fährmann
c23bf263fe [deviantart] rename 'external' to 'stash' (#302)
restrict extracted URLs to ones from https://sta.sh/...
2019-06-09 11:16:02 +02:00
Mike Fährmann
c73c2cda50 [pornhub] add gallery & user extractor (#282) 2019-06-07 16:31:20 +02:00
Mike Fährmann
7c6cb908f9 [xhamster] update test results 2019-06-07 16:28:49 +02:00
Mike Fährmann
2fb85178da [deviantart] add 'external' option (#302)
If a description is available, this will extract URLs from the
description text and try to find Extractors for them.
2019-06-06 18:53:50 +02:00
Mike Fährmann
f85e42cffc [deviantart] fix --range for deviation & stash extractor 2019-06-06 18:45:10 +02:00
Mike Fährmann
40c7eb3424 [livedoor] improve extraction (fixes #301) 2019-06-06 15:22:27 +02:00
Mike Fährmann
62335b9015 [paheal] adjust test results 2019-06-05 11:42:01 +02:00
Mike Fährmann
aa1ca4ed35 [shopify] skip deleted products (#175)
Product pages which return a 4xx status code will now be skipped instead
of raising an exception.
2019-06-05 11:40:54 +02:00
Mike Fährmann
096009367b [xhamster] add gallery & user extractor (#281) 2019-06-05 11:11:51 +02:00
Mike Fährmann
208202b962 [tumblr] improve error handling (#297)
In some cases Tumblr's API responds with an HTML document.
Trying to decode it as JSON would raise an uncaught exception.
2019-06-04 14:02:17 +02:00
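The failure mode described in this commit (an HTML document where JSON was expected) can be guarded against as sketched below. The 'ApiError' class and 'parse_api_response' function are hypothetical names for illustration; the extractor's actual error handling may differ.

```python
import json


class ApiError(Exception):
    """Hypothetical error type for unusable API responses."""


def parse_api_response(text):
    """Decode an API response body, turning decode errors into ApiError.

    json.loads() raises json.JSONDecodeError (a ValueError subclass)
    when the body is not JSON, e.g. when it is an HTML error page.
    """
    try:
        return json.loads(text)
    except ValueError as exc:
        raise ApiError("API response is not valid JSON") from exc


if __name__ == "__main__":
    print(parse_api_response('{"meta": {"status": 200}}'))
    try:
        parse_api_response("<!DOCTYPE html><html></html>")
    except ApiError as exc:
        print("handled:", exc)
```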
Mike Fährmann
c08c340178 [directlink] make pattern case insensitive (fixes #296) 2019-06-03 10:56:14 +02:00
Mike Fährmann
95b4a53b9c [keenspot] improve pagination (#223)
The old code would skip the last comic page for some series.
2019-06-02 22:12:21 +02:00