Commit Graph

1715 Commits

Author SHA1 Message Date
Leonardo Taccari
39cd389679 [webtoons] Add a new extractor for webtoons.com (#761)
The webtoons extractor can extract a single episode or an entire comic (all
episodes) from webtoons.com.

The logic of both extractors should be straightforward, except for a couple
of necessary kludges:

 - The `ageGatePass' cookie is always set to avoid a possible redirect that
   would stop the extraction, especially in the comic extractor
 - The image URLs returned by the episode extractor cannot be fetched
   directly; the `Referer:' HTTP header needs to be passed to fetch them

Close #593.
2020-05-18 19:04:20 +02:00
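The two kludges described above amount to attaching extra headers to outgoing requests. The following is a minimal sketch using only the standard library, not the extractor's actual code. The cookie value "true" and the use of the episode page URL as the Referer are assumptions; the commit only names the `ageGatePass` cookie and the `Referer` header.

```python
import urllib.request

# Assumed cookie value: the commit only names the 'ageGatePass' cookie.
AGE_GATE_COOKIE = "ageGatePass=true"


def page_request(url):
    """Build (but do not send) a request for a webtoons.com page with the
    age-gate cookie already set, so the site does not redirect to its
    age check and stop the extraction."""
    return urllib.request.Request(url, headers={"Cookie": AGE_GATE_COOKIE})


def image_request(image_url, episode_url):
    """Build (but do not send) a request for an episode image. Images
    cannot be fetched directly, so a Referer header is added; using the
    episode page URL as its value is an assumption."""
    return urllib.request.Request(image_url, headers={"Referer": episode_url})
```

Building the `Request` objects separately keeps the cookie on page requests and the Referer on image requests, matching the two cases the commit message lists.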
Bepis
7b5711ee04 [imagechest] Add new extractor for ImageChest (#750)
* [imagechest] Add new extractor for ImageChest

* [imagechest] Fix flake8 compliance issues
2020-05-18 19:02:56 +02:00
Mike Fährmann
a1e739b96c reuse connection adapters from parent extractors 2020-05-12 23:52:01 +02:00
Mike Fährmann
f8f95e68a7 improve '--write-pages' (#737)
- move code into its own function
- add enumeration index to filenames
- dump responses regardless of status code
2020-05-12 20:40:25 +02:00
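The '--write-pages' changes listed above (a dedicated function, an enumeration index in filenames, binary writes, and a Windows-safe filename pattern from #736) can be sketched as follows. This is an illustration only. The function name `dump_response`, the `NN-<url>.dump` naming scheme, and the replaced character set are all hypothetical and are not taken from gallery-dl's source.

```python
import os
import re

# Characters that are not allowed in Windows filenames, plus '/'
# (hypothetical choice of characters to replace).
INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def dump_response(directory, index, url, content):
    """Write a raw response body to '<index>-<sanitized url>.dump'.

    The caller is expected to pass every response here regardless of
    its status code, so error pages are dumped as well.
    """
    name = INVALID_CHARS.sub("_", url)
    path = os.path.join(directory, "{:02d}-{}.dump".format(index, name))
    with open(path, "wb") as fp:  # binary mode: bodies are raw bytes
        fp.write(content)
    return path
```

The enumeration index keeps dumps of repeated requests to the same URL from overwriting each other and preserves their order.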
Mike Fährmann
09cc9dbec0 prevent flake8 errors from comments looking like type annotations 2020-05-12 20:08:05 +02:00
Mike Fährmann
2d6724180b [hiperdex] update domain to hiperdex.info 2020-05-12 17:00:51 +02:00
Vrihub
4cc761c730 Implement --write-pages option (#736)
* Implement --write-pages option

* Fix long lines

* Fix file mode to binary

* Fix pattern for Windows compatibility
2020-05-12 14:25:21 +02:00
Mike Fährmann
f557cac074 [redgifs] add image extractor (#724) 2020-05-10 00:31:42 +02:00
Mike Fährmann
65b1cb7acd [deviantart] use private access tokens for Journals (fixes #738) 2020-05-08 21:45:01 +02:00
Mike Fährmann
0bf0146bfe [reddit] don't send OAuth headers for file downloads (fixes #729) 2020-05-08 21:42:52 +02:00
Mike Fährmann
d6a480682f update test results 2020-05-02 21:13:00 +02:00
Leonardo Taccari
b47cfc5ac9 [speakerdeck] Add a new extractor for speakerdeck.com (#726) 2020-05-01 22:32:22 +02:00
Mike Fährmann
90491ab606 [artstation] improve embed extraction (#720) 2020-04-30 21:25:03 +02:00
Mike Fährmann
999efec5cc [deviantart] limit API wait times to 2**9=512 seconds (#721) 2020-04-30 21:16:09 +02:00
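A capped exponential backoff matching the limit in this commit message can be sketched as follows. The function name and the assumption that the delay starts at 1 second and doubles on every retry are illustrative. Only the 2**9 = 512 second ceiling comes from the commit.

```python
def api_wait_time(retries, limit=9):
    """Return the delay in seconds before the next API retry.

    The delay doubles with every retry (assumed to start at 1 second),
    but the exponent is capped at `limit`, so with the default the wait
    never exceeds 2**9 = 512 seconds.
    """
    return 2 ** min(retries, limit)
```

Capping the exponent rather than the result keeps the arithmetic on small integers no matter how many retries have accumulated.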
Mike Fährmann
504de79d8b [vsco] fix extraction 2020-04-30 21:12:06 +02:00
Mike Fährmann
5e2974d699 [weibo] add 'videos' option 2020-04-30 00:00:30 +02:00
Mike Fährmann
9f638c2e01 [twitter] add 'replies' option (closes #705) 2020-04-29 23:20:06 +02:00
Mike Fährmann
fc3e54275b [patreon] respect filters and sort order in query params (#711) 2020-04-28 23:58:03 +02:00
Mike Fährmann
46b9a4d8ff [patreon] improve hash extraction (#693, #713)
Instead of accessing a specific part of a download URL, potentially
causing an exception if it doesn't exist, we're now searching through
all parts for a potential MD5 hash without ever raising an exception.
2020-04-28 21:47:18 +02:00
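The approach described above, scanning every part of the URL for an MD5 hash instead of indexing one fixed segment, can be sketched as follows. It assumes the hash appears as a whole path segment of exactly 32 hexadecimal digits. The function name and that assumption are hypothetical, not taken from the extractor.

```python
import re
from urllib.parse import urlsplit

# 32 hexadecimal digits: the textual form of an MD5 digest
MD5_RE = re.compile(r"[0-9a-fA-F]{32}")


def find_hash(url):
    """Return the first path segment that looks like an MD5 hash, or ''.

    Unlike indexing a fixed position such as path.split('/')[-2], this
    never raises when the URL has fewer segments than expected.
    """
    for segment in urlsplit(url).path.split("/"):
        if MD5_RE.fullmatch(segment):
            return segment
    return ""
```

Returning an empty string instead of raising lets the caller treat "no hash found" as ordinary metadata.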
Mike Fährmann
c56a751dae [newgrounds] fix URLs produced by 'following' extractors (#684) 2020-04-28 21:33:37 +02:00
Mike Fährmann
a4fd620a25 [hiperdex] revert domain back to hiperdex.com 2020-04-27 20:42:31 +02:00
Mike Fährmann
233b6f93a2 [patreon] recognize URLs with creator IDs (#711)
e.g. https://www.patreon.com/user/posts?u=…
2020-04-26 22:19:10 +02:00
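Recognizing the URL form shown above comes down to reading the `u` query parameter from a `/user/posts` URL. The following is a minimal sketch using `urllib.parse`. The function name is hypothetical, and the ID `12345` in the test is a made-up example rather than a real creator ID.

```python
from urllib.parse import parse_qs, urlsplit


def creator_id(url):
    """Return the 'u' query parameter of a patreon.com/user/posts URL,
    or None if the URL has a different form."""
    parts = urlsplit(url)
    if parts.netloc not in ("patreon.com", "www.patreon.com"):
        return None
    if parts.path.rstrip("/") != "/user/posts":
        return None
    values = parse_qs(parts.query).get("u")
    return values[0] if values else None
```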
Mike Fährmann
38b6bd66b0 [500px] match 'web.500px.com' subdomains 2020-04-26 22:17:20 +02:00
Mike Fährmann
d3b3b30107 update test results 2020-04-26 22:14:28 +02:00
Mike Fährmann
5d7ca76885 retry Cloudflare challenges 2020-04-24 22:47:27 +02:00
Mike Fährmann
3eab07739f [twitter] ensure videos have a 'filename'
This usually gets set when the 'ytdl' downloader is invoked, but if
that fails, the error message would show 'None' as the filename.
2020-04-24 22:34:19 +02:00
Mike Fährmann
c4371a6970 [twitter] add 'reply' metadata field (#705) 2020-04-24 22:31:24 +02:00
Mike Fährmann
12ff23b6cc [mastodon] improve account searches (fixes #704)
Searching for just the username ("@NAME") can produce multiple
unrelated results, so we now search for username + Mastodon instance
("@NAME@INSTANCE").
2020-04-23 20:23:10 +02:00
Mike Fährmann
400a0df661 [jaiminisbox] update decoding procedure (fixes #702) 2020-04-23 20:21:48 +02:00
Mike Fährmann
8fe858eb0e improve parameter extraction when solving Cloudflare challenge 2020-04-22 22:08:17 +02:00
Mike Fährmann
fb98b567fa [gelbooru] improve post ID extraction for pools 2020-04-22 21:28:18 +02:00
Mike Fährmann
d6facdee7b [mastodon] add tests (#701) 2020-04-22 21:10:34 +02:00
Mike Fährmann
12eebb6f16 [xhamster] support xhamster.porncache.net domains (closes #700) 2020-04-22 18:31:05 +02:00
Mike Fährmann
e749402191 [mastodon] fix pagination (#701) 2020-04-22 17:58:55 +02:00
Mike Fährmann
921914141e [imgbb] improve redirect handling 2020-04-20 23:36:57 +02:00
Mike Fährmann
6cc800aad4 [instagram] add 'post_id' and 'num' metadata fields (closes #698) 2020-04-20 22:22:29 +02:00
Mike Fährmann
a3de234e70 [hitomi] add extractor for tag searches (closes #697) 2020-04-20 21:55:19 +02:00
Mike Fährmann
456f6e8d05 [nozomi] move '_unpack()' method to global scope 2020-04-20 21:44:16 +02:00
Mike Fährmann
55ac408bdf [hitomi] fix extraction of galleries without tags 2020-04-20 21:42:14 +02:00
Mike Fährmann
db6685eeae [aryion] support downloading from folders (fixes #694) 2020-04-18 01:25:54 +02:00
Mike Fährmann
fa2952ac55 [furaffinity] add 'following' extractor (#515) 2020-04-17 22:18:39 +02:00
Mike Fährmann
9b194520db [newgrounds] add 'following' extractor (closes #684) 2020-04-17 22:17:43 +02:00
Mike Fährmann
6386ee54e1 [deviantart] add extractor info to 'following' results 2020-04-16 23:20:07 +02:00
Mike Fährmann
d5273f9b0c [hiperdex] update domain to hiperdex.net 2020-04-16 20:39:56 +02:00
Mike Fährmann
08674a91f3 [patreon] fix hash extraction from download URLs (closes #693)
The old method assumed that every URL path ends with '/1'. For URLs
where this was not the case, the segment containing the post ID was
used as the file hash.
2020-04-15 23:28:57 +02:00
Mike Fährmann
a6286bb551 [hiperdex] add 'artist' extractor (#606) 2020-04-12 02:32:37 +02:00
Mike Fährmann
291033720a [hiperdex] fix manga extraction 2020-04-12 02:27:13 +02:00
Mike Fährmann
dfc0557807 [vsco] fix collection extraction 2020-04-11 23:06:29 +02:00
Mike Fährmann
fd438f0d78 update extractor test results 2020-04-11 23:00:42 +02:00
Mike Fährmann
bae1e8ed12 [deviantart] fix JPEG quality replacement pattern
'q_\d+' would sometimes also match part of the 'token' query
parameter, invalidating the URL.
2020-04-11 02:37:06 +02:00
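One way to avoid the problem described above is to apply the substitution only to the URL path, and only to a `q_<number>` setting delimited by `/` or `,`, so the query string carrying the `token` is never touched. The following is a sketch of that idea, not necessarily the pattern the commit adopted. The function name, the delimiter assumption, and the default quality of 100 are all hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# 'q_<digits>' only when it is a complete, ','- or '/'-delimited setting
QUALITY_RE = re.compile(r"(?<=[/,])q_\d+(?=[,/])")


def set_jpeg_quality(url, quality=100):
    """Rewrite the JPEG quality setting in the URL path only, leaving the
    query string (which holds the 'token' parameter) untouched."""
    parts = urlsplit(url)
    path = QUALITY_RE.sub("q_{}".format(quality), parts.path)
    return urlunsplit(parts._replace(path=path))
```

Splitting the URL first means that even a token containing a `q_` sequence cannot be corrupted, no matter how the regex evolves.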