Commit Graph

1350 Commits

Author SHA1 Message Date
Mike Fährmann
5cc7be2536 [piczel] update and improve
- use proper pagination (fixes #396)
- update API host and endpoints
- "fix" double slash // in image URLs
2019-08-24 20:37:33 +02:00
Mike Fährmann
49f6d7176d [deviantart] restore filenames (#392)
<title>_by_<user>_<id> --> <title>_by_<user>-<id>
2019-08-23 22:02:03 +02:00
Mike Fährmann
63daa68d67 [deviantart] improvements (#392)
- consistent 'filename' entries, at least as far as possible
  - GIFs and SWFs don't have a <title>_by_<artist>_<id> anywhere in
    their metadata
  - Generating <id> (from 'deviationid'?) might be something that needs
    to be figured out, so we can build those filenames ourselves
- better code structure etc.
- tests for videos, archives, and flash animations
2019-08-23 12:27:19 +02:00
Mike Fährmann
d1db5180ab [simplyhentai] fix extraction; remove image+video extractors 2019-08-22 23:56:41 +02:00
Mike Fährmann
30d6e284b0 [deviantart] use NAPI for artworks and scraps (#392)
TODO:
- journal downloads
- test for all media types
2019-08-21 23:56:06 +02:00
Mike Fährmann
7d6af936c5 [imgur] simplify gallery extraction 2019-08-20 20:00:43 +02:00
Mike Fährmann
51d10783fc [patreon] include image info in API results (#383) 2019-08-18 23:28:47 +02:00
Mike Fährmann
7a5e78741c [booru] build directory path for each file (#385) 2019-08-18 23:28:33 +02:00
Mike Fährmann
b1728f512d [patreon] support multi image posts and post URLs (#383) 2019-08-17 23:24:46 +02:00
Mike Fährmann
c50d60a53d [reactor] fix image URLs 2019-08-16 14:07:22 +02:00
Mike Fährmann
32447d0d24 [pixiv] simplify default filename format (#366)
2019-08-15 13:32:47 +02:00
Mike Fährmann
829b1ccf04 [imgur] distinguish album and gallery URLs (#380)
A gallery can be either an album or a single image.
2019-08-14 21:40:14 +02:00
Mike Fährmann
23251356cb require 'extension' data for each URL (#382) 2019-08-14 20:03:03 +02:00
Mike Fährmann
a67413d64f [xhamster] use input URL domain
Don't rewrite all URLs as 'https://xhamster.com/...'
2019-08-14 00:21:30 +02:00
Mike Fährmann
423f68f585 [deviantart] fix scraps extraction (closes #376) 2019-08-11 16:06:15 +02:00
Mike Fährmann
3bf20ffb70 [instagram] add support for story highlights 2019-08-10 14:34:22 +02:00
Mike Fährmann
a732e9c430 [instagram] update query hashes and headers 2019-08-10 14:13:08 +02:00
Mike Fährmann
2ccf6a9e35 [instagram] make extractor tests happy (#373) 2019-08-08 18:50:26 +02:00
Leonardo Taccari
bc5eaf7746 [instagram] Add support for IGTV (#373)
Add support for IGTV profiles (instagram.com/<username>/channel/)
and IGTV media (instagram.com/tv/<short_id>).
2019-08-08 18:33:13 +02:00
Mike Fährmann
eb7da159e2 [imagebam] update URL test results
Image URLs are now using https://, but the website itself is still
served as http://.
2019-08-07 21:47:44 +02:00
Mike Fährmann
189acbeac9 [imgbb] add extractor for individual images (closes #363) 2019-08-05 22:52:08 +02:00
Mike Fährmann
ad3ac02fbc [pixiv] update metadata entries (#366)
- change 'num' to a simple enumerating integer
- change default filename format
- provide content of the old 'num' field as 'suffix'
- add 'filename' for ugoira
2019-08-05 22:41:56 +02:00
Mike Fährmann
1ff4c4ec03 [adultempire] consistent artist order 2019-08-05 22:06:11 +02:00
Leonardo Taccari
2df050e627 [instagram] Add support for stories (#371)
* [instagram] Add support for stories

Add support for Instagram user's stories
(https://www.instagram.com/stories/<username>/).

First the shared_data in instagram.com/stories/<username> is fetched in
order to retrieve the user_id that is then passed to fetch the stories
via the corresponding graphql query.

Please note that fetching stories is supported only when authentication
is enabled and the corresponding <username> is followed.

* [instagram] Add an only-matching test for stories

* [instagram] Simplify InstagramExtractor.items() and _extract_stories()

Simplify handling of typename in InstagramExtractor.items() and multi-line
string in _extract_stories().  NFCI.
2019-08-05 22:04:34 +02:00
Mike Fährmann
f4bc75e854 fix rate limit handling for OAuth APIs (#368) 2019-08-03 13:43:00 +02:00
Mike Fährmann
3957d27d79 [deviantart] add 'quality' option (#369) 2019-08-03 11:40:35 +02:00
Mike Fährmann
64b2935d8e [pixiv] provide 'filename' and change default filename format
to '{filename}.{extension}' (closes #366)
2019-08-02 22:35:10 +02:00
Mike Fährmann
fa60109e97 [exhentai] don't use e-hentai.org for exhentai URLs 2019-08-02 21:10:09 +02:00
Mike Fährmann
4a0c98bfc9 miscellaneous fixes and adjustments 2019-08-01 22:09:43 +02:00
Mike Fährmann
2c839f3760 [imgbb] add user extractor + login support (#361) 2019-08-01 21:39:20 +02:00
Mike Fährmann
2153206093 [imgbb] add album extractor (#361) 2019-07-30 23:11:19 +02:00
Mike Fährmann
beb4fab2e6 [exhentai] improve limit and error handling (#360)
- check image limit before opening the first gallery or image page
- prevent any further exhentai extractors from running after the image
  limit has been reached
2019-07-30 22:58:35 +02:00
Mike Fährmann
81b35ed3cb [exhentai] catch more error states (#356, #360)
- warn on MPV-enabled galleries
- catch parsing errors for gallery pages and image info
- write page content to debug output
2019-07-29 16:54:31 +02:00
Mike Fährmann
6ce22f606b [exhentai] update login procedure and tests
Logging in now more closely follows the natural login flow of a
browser and collects more cookies than just ipb_member_id and
ipb_pass_hash.

Test URLs have been updated and now point to the e-hentai.org domain.
2019-07-28 16:51:05 +02:00
Mike Fährmann
dc73d02d87 [exhentai] always use e-hentai.org as domain + set nw cookie 2019-07-28 10:54:17 +02:00
Mike Fährmann
40637556fa [ngomik] fix extraction 2019-07-28 10:53:46 +02:00
Mike Fährmann
3969f9cbbd [behance] fix collection extraction 2019-07-27 14:26:40 +02:00
Mike Fährmann
17a3426845 [gelbooru] enable all content when not using API 2019-07-27 11:13:38 +02:00
Mike Fährmann
279db2c5b2 [vsco] add collection & image extractor + video support (#331) 2019-07-26 19:06:15 +02:00
Mike Fährmann
d9d44ad953 [tsumino] update test results 2019-07-24 21:17:23 +02:00
Mike Fährmann
60cf40380a [vsco] add user extractor (#331) 2019-07-23 16:23:11 +02:00
Mike Fährmann
3fe5ccdfa6 [adultempire] add gallery extractor (closes #340) 2019-07-21 22:29:57 +02:00
Mike Fährmann
5d968412ca [deviantart] case-insensitive folder name matching (fixes #343) 2019-07-19 18:05:31 +02:00
Mike Fährmann
a3c736fedc [500px] fix extraction
Maximum available image dimensions have been reduced from 5000px
to 4096px on the longest edge.
A few (unimportant) metadata fields are no longer available or have
been changed to 'null'.
2019-07-19 17:23:03 +02:00
Mike Fährmann
1133b7fcbd [smugmug] update unit tests
The account previously used for tests has been deleted.
2019-07-19 17:16:24 +02:00
Mike Fährmann
21991acc49 add 'ciphers' option; update default User-Agent 2019-07-19 17:14:40 +02:00
Mike Fährmann
84f4d3bc0b replace urllib3's default cipher list with Firefox's (#342)
Avoids Cloudflare CAPTCHAs on both Linux and Windows when
pyOpenSSL is not installed.
2019-07-18 19:42:13 +02:00
Mike Fährmann
feb98cf196 [twitter] improve 'content' formatting; add option (#338)
- include emoticons
- leave newlines intact
- remove pic.twitter.com/ links at the end
2019-07-17 16:02:51 +02:00
Mike Fährmann
8d1ae9b715 [tumblr] enable date-min/-max/-format options (#337) 2019-07-17 14:36:41 +02:00
Mike Fährmann
09f37fde39 [reddit] move date-min/-max handling into Extractor class 2019-07-16 22:54:39 +02:00