Commit Graph

2007 Commits

Author SHA1 Message Date
Mike Fährmann
6cdbab07b5 [pinterest] add support for getting all boards of a user (#1205)
2020-12-29 16:57:03 +01:00
Mike Fährmann
25074aec47 [twitter] fetch media from pinned tweets (#1203) 2020-12-29 16:27:43 +01:00
Mike Fährmann
2475176d99 [twitter] fetch tweets from 'homeConversation' entries
When logged in, some entries returned by Twitter's API are so-called
'homeConversation' entries (they would otherwise be regular tweet entries).

These weren't picked up before, which resulted in missing files compared
to accessing a timeline as a guest.

('/media' timelines and search results were not affected)
2020-12-29 00:42:46 +01:00
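A minimal sketch of collecting tweets from both entry types. The entry layout used here (an 'entryId' prefix plus 'tweet' or 'items' keys) is a simplified, hypothetical shape, not Twitter's actual response format.

```python
def collect_tweets(entries):
    """Collect tweet objects from timeline entries.

    Assumed (hypothetical, simplified) entry shapes:
      {"entryId": "tweet-<id>", "tweet": {...}}
      {"entryId": "homeConversation-<id>", "items": [{"tweet": {...}}, ...]}
    """
    tweets = []
    for entry in entries:
        entry_id = entry.get("entryId", "")
        if entry_id.startswith("tweet-"):
            tweets.append(entry["tweet"])
        elif entry_id.startswith("homeConversation-"):
            # a conversation entry bundles one or more tweets
            for item in entry.get("items", ()):
                tweets.append(item["tweet"])
        # any other entry type (cursors, prompts, ...) is skipped
    return tweets
```

Without the second branch, every tweet wrapped in a conversation entry would be silently dropped, which matches the missing files described above.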
Mike Fährmann
3af9350648 [twitter] update API calls
- use 'https://twitter.com/i/api' for all requests
  except '/guest/activate.json'
- update (default) URL parameters
- update GraphQL endpoints
2020-12-28 22:05:48 +01:00
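A sketch of the request routing described above, assuming endpoints are passed as paths relative to the API root. The full guest activation URL on 'api.twitter.com' is an assumption; the commit only states that this endpoint is excluded.

```python
API_ROOT = "https://twitter.com/i/api"
# assumption: guest activation keeps using the separate api.twitter.com host
GUEST_ACTIVATE_URL = "https://api.twitter.com/1.1/guest/activate.json"


def api_url(endpoint):
    """Map an endpoint path to its full URL."""
    if endpoint.endswith("/guest/activate.json"):
        return GUEST_ACTIVATE_URL
    return API_ROOT + endpoint
```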
Mike Fährmann
b656b829db [twitter] fix login with username & password
It is no longer possible to get an 'authenticity_token' from Twitter's
JavaScript-free login form, which was disabled a few days ago.

Generating a random 16-byte hex string client-side and sending it as
a cookie alongside the regular login form works just as well.
2020-12-28 16:10:19 +01:00
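A sketch of generating such a token with Python's 'secrets' module. The cookie name '_mb_tk', the form field names, and the choice to echo the same token in the form are all hypothetical; the commit names none of them.

```python
import secrets

# hypothetical names: the commit does not say which cookie or fields are used
TOKEN_COOKIE = "_mb_tk"
TOKEN_FIELD = "authenticity_token"


def make_login_token(nbytes=16):
    """Return `nbytes` random bytes encoded as a lowercase hex string."""
    return secrets.token_hex(nbytes)


def build_login_request(username, password):
    """Return (cookies, form_data) for a login POST using a client-side token."""
    token = make_login_token()
    cookies = {TOKEN_COOKIE: token}
    # assumption: the same token is also submitted in the form (double-submit style)
    data = {
        TOKEN_FIELD: token,
        "username": username,  # hypothetical field name
        "password": password,  # hypothetical field name
    }
    return cookies, data
```

Sixteen random bytes encode to 32 hex characters, so the resulting string is 32 characters long.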
Mike Fährmann
912eea29bc update extractor test results 2020-12-27 17:41:08 +01:00
Mike Fährmann
47a7a51944 [sankaku] fix 'invalid_token' detection 2020-12-27 02:31:01 +01:00
Mike Fährmann
ba5df84f7e [keenspot] improve redirect handling
Previously, it used http:// for all requests and got redirected
to the https:// version if the site supported it.

Now the redirect only happens once, during the first request.
2020-12-26 21:38:40 +01:00
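A sketch of "redirect only once": after the first request, the scheme and host of the final URL become the new root, so later requests go straight to https://. The 'fetch' callable (URL in, final URL after redirects out) is a hypothetical stand-in for a real HTTP client.

```python
from urllib.parse import urlsplit


class RootTracker:
    """Remember the site root after the first (possibly redirected) request."""

    def __init__(self, root, fetch):
        self.root = root      # e.g. "http://comic.example.com"
        self.fetch = fetch    # hypothetical: url -> final URL after redirects
        self._resolved = False

    def get(self, path):
        final = self.fetch(self.root + path)
        if not self._resolved:
            # adopt the scheme and host we were redirected to
            parts = urlsplit(final)
            self.root = "{}://{}".format(parts.scheme, parts.netloc)
            self._resolved = True
        return final
```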
Mike Fährmann
d781e6ac44 [e621] return pool posts in order (closes #1195)
… and add a 'num' enumeration index.

A bit more code than the PR version, but it prints some helpful messages
and doesn't call 'metadata()' twice.
2020-12-26 19:00:29 +01:00
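A sketch of restoring pool order: posts returned by a search come back in arbitrary order, so they are re-sorted by the pool's list of post IDs. Using the pool position as 'num' (leaving gaps for missing posts) is a design choice assumed here, not something the commit specifies.

```python
def pool_posts_in_order(post_ids, posts):
    """Return posts in pool order, each with a 1-based 'num' index."""
    by_id = {post["id"]: post for post in posts}
    ordered = []
    for num, post_id in enumerate(post_ids, 1):
        post = by_id.get(post_id)
        if post is None:
            continue  # deleted or otherwise unavailable post
        post["num"] = num
        ordered.append(post)
    return ordered
```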
Mike Fährmann
e7d446a8f7 [danbooru] slight code refactoring 2020-12-25 22:06:25 +01:00
Mike Fährmann
e41e2be2f9 [booru] split '_prepare_post()' 2020-12-24 01:13:54 +01:00
Mike Fährmann
53222445d5 [hentaicafe] simplify default filenames 2020-12-23 01:03:08 +01:00
Mike Fährmann
712c792fbe [hentaicafe] prefer title of /hc.fyi/ pages (closes #1106) 2020-12-23 01:01:15 +01:00
Mike Fährmann
2c4d4a75db [mangadex] respect 'chapter-reverse' settings (closes #1194)
The extractor in question doesn't inherit from MangaExtractor
and therefore didn't do this automatically.
2020-12-22 15:08:10 +01:00
Mike Fährmann
3bd08acc8f [pixiv] output debug message on failed login attempt (#1192)
2020-12-22 14:59:31 +01:00
Mike Fährmann
b58e605dc7 raise error when a required username or password is missing
do not try to log in as 'None' (#1192)
2020-12-22 14:40:18 +01:00
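A sketch of failing early instead of sending 'None' as a username or password. 'AuthenticationError' here is a hypothetical stand-in for whatever error type the project actually raises.

```python
class AuthenticationError(Exception):
    """Hypothetical stand-in for the project's own login error type."""


def require_credentials(username, password):
    """Raise instead of attempting a login with a missing value such as None."""
    if not username or not password:
        raise AuthenticationError("Username and password required")
    return username, password
```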
Mike Fährmann
b233531aaa [sankaku] use '/posts' endpoint for single posts 2020-12-22 02:44:40 +01:00
Mike Fährmann
459a0af4f8 [sankaku] add support for sankaku.app URLs (closes #1193) 2020-12-22 01:57:53 +01:00
Mike Fährmann
371e9ca6df [pinterest] implement video support (closes #1189) 2020-12-21 16:09:06 +01:00
Mike Fährmann
537742c0ee [sankaku] normalize 'created_at' metadata (closes #1190) 2020-12-21 02:06:29 +01:00
Mike Fährmann
ae6748996a [pornhub] update tests 2020-12-21 02:06:28 +01:00
Mike Fährmann
bf629a2818 [instagram] add 'include' option (closes #1180)
Split the functionality of the old 'user' extractor into separate
'posts' and 'highlights' extractors, which respond to virtual URLs
('/<user>/posts' and '/<user>/highlights').
2020-12-21 02:06:28 +01:00
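A sketch of how an 'include' value might be expanded into the virtual URLs mentioned above. The accepted forms ("all", a comma-separated string, or a list) and the helper name are assumptions for illustration.

```python
def include_urls(user, include="posts"):
    """Map an 'include' value to virtual per-user URLs (hypothetical helper)."""
    available = ("posts", "highlights")
    if include == "all":
        selected = list(available)
    elif isinstance(include, str):
        selected = [part.strip() for part in include.split(",") if part.strip()]
    else:
        selected = list(include)
    urls = []
    for name in selected:
        if name not in available:
            raise ValueError("unsupported 'include' value: " + name)
        urls.append("https://www.instagram.com/{}/{}".format(user, name))
    return urls
```

Each resulting URL would then be handled by the matching 'posts' or 'highlights' extractor.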
Mike Fährmann
78061658ea [booru] reduce exceptions caught during _prepare_post()
don't catch HttpErrors etc.
2020-12-21 02:05:59 +01:00
Mike Fährmann
212ae0c399 [mangapanda] remove module
site now redirects to mangareader.net
2020-12-20 17:42:15 +01:00
Mike Fährmann
337b118e25 [instagram] warn about private profiles (#1187) 2020-12-19 22:32:28 +01:00
Mike Fährmann
465015f75a [sankaku] reimplement login support (#1176, #1182) 2020-12-17 16:12:59 +01:00
Mike Fährmann
8d2e4e5f13 [booru] improve error handling
e.g. for posts without a valid 'file_url' (#1176)
2020-12-17 01:16:45 +01:00
Mike Fährmann
1d753542c2 [hentainexus] fix extraction (fixes #1166) 2020-12-12 20:30:51 +01:00
Mike Fährmann
a00b60fbe7 [twitter] update 'x-csrf-token' header (fixes #1170)
Twitter started using a bigger CSRF token (80 instead of 16 bytes) for
logged-in users, and expects the value sent via the 'ct0' cookie to also
be sent as the 'x-csrf-token' header.

Generating an 80-byte token ourselves doesn't work; Twitter still
insists on using its own.
2020-12-11 13:46:58 +01:00
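Since a self-generated token is rejected, a client has to mirror whatever 'ct0' value Twitter issued. A minimal sketch, assuming cookies are available as a plain mapping:

```python
def csrf_headers(cookies):
    """Copy the server-issued 'ct0' cookie value into the 'x-csrf-token' header."""
    token = cookies.get("ct0")
    if not token:
        return {}
    return {"x-csrf-token": token}
```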
Mike Fährmann
b88c97b873 [instagram] add 'cursor' option (#1149)
This provides at least some way to resume downloading from the middle
of a user profile listing.
2020-12-11 13:46:58 +01:00
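A sketch of cursor-based resumption: each item is yielded together with the cursor of the page it came from, so a saved cursor can be passed back in later. 'fetch_page' is a hypothetical callable returning '(items, next_cursor)'.

```python
def paginate(fetch_page, cursor=None):
    """Yield (cursor, item) pairs, starting from an optional saved cursor.

    fetch_page(cursor) -> (items, next_cursor); next_cursor is None at the end.
    """
    while True:
        items, next_cursor = fetch_page(cursor)
        for item in items:
            yield cursor, item
        if not next_cursor:
            return
        cursor = next_cursor
```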
Mike Fährmann
0d406c8daf [common] restrict values used in 'generate_extractors()' 2020-12-11 13:46:47 +01:00
Mike Fährmann
b2c55f0a72 [sankaku] remove login support
The old login method for 'https://chan.sankakucomplex.com/user/login'
and the cookies it produces have no effect on the results from
'beta.sankakucomplex.com'.
2020-12-08 21:05:47 +01:00
Mike Fährmann
7f3d811d7b [moebooru] inherit from BooruExtractor 2020-12-08 18:34:56 +01:00
Mike Fährmann
a3a863fc13 [booru] add generalized extractors for *booru sites
similar to cc15fbe7
2020-12-08 18:34:30 +01:00
Mike Fährmann
5f23441e12 [piczel] update API URLs 2020-12-07 15:56:32 +01:00
Mike Fährmann
47114339a2 [webtoons] update 'ageGate' cookie 2020-12-07 14:56:32 +01:00
Mike Fährmann
4225f12783 [nozomi] handle empty 'date' fields (fixes #1163) 2020-12-07 00:08:53 +01:00
Mike Fährmann
2b93515ee0 [instagram] reimplement support for stories (#1149) 2020-12-06 21:32:10 +01:00
Mike Fährmann
ecdea799dd [sankaku] use 'beta.sankakucomplex.com' API endpoints 2020-12-05 22:08:58 +01:00
Mike Fährmann
b3ecc89a9a [instagram] use double quotes for strings when possible 2020-12-05 19:33:42 +01:00
Mike Fährmann
76285eb60d [instagram] reimplement support for story highlights (#1149) 2020-12-05 19:13:00 +01:00
Mike Fährmann
8ca7f54750 rename '_request_…' variables
- remove '_' at the beginning
- _request_last -> request_timestamp
2020-12-05 00:09:15 +01:00
Mike Fährmann
15a122aff3 [instagram] update 'X-IG-WWW-Claim' headers 2020-12-04 20:58:34 +01:00
Mike Fährmann
e5d81bdc7b [mangadex] handle 'external' chapters (closes #1154) 2020-12-04 20:56:30 +01:00
Mike Fährmann
447488fb18 [instagram] rewrite
(#1113, #1122, #1128, #1130, #1149)

Rely on the results of GraphQL queries instead of requesting data
for each post separately via '/p/<shortcode>/?__a=1'.

This might result in some missing metadata, and there might be some
issues for '/channel/' and '/saved/' URLs, but at least downloading
from the regular post listings should work without issues and without
getting users blocked/banned.

TODO: reimplement support for stories
2020-12-03 14:30:59 +01:00
Mike Fährmann
cc15fbe71a [moebooru] add generalized extractors for moebooru sites
- add support for sakugabooru.com (closes #1136)
- add support for lolibooru.moe   (closes #1050)

This allows users to dynamically add support for moebooru/myimouto-based
sites by adding an entry to their config file
(like for foolslide, foolfuuka, etc.)

For example:
{
    "extractor": {
        "moebooru": {
            "new-site-1": {"root": "https://site1.net"},
            "new-site-2": {"root": "https://www.site2.moe"}
        }
    }
}
2020-12-01 22:27:18 +01:00
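A sketch of turning such config entries into per-site roots and a URL-matching pattern. The helper names are hypothetical; the filter on dict entries with a 'root' key is an assumption so that ordinary options in the same section are ignored.

```python
import re
from urllib.parse import urlsplit


def configured_instances(config, base="moebooru"):
    """Return {category: root} for user-defined instances of a base extractor."""
    section = config.get("extractor", {}).get(base, {})
    instances = {}
    for category, options in section.items():
        # only dict entries with a 'root' define a site; other keys may be options
        if isinstance(options, dict) and "root" in options:
            instances[category] = options["root"].rstrip("/")
    return instances


def root_pattern(instances):
    """Build a regex prefix matching the host of every configured instance."""
    hosts = sorted(re.escape(urlsplit(root).netloc) for root in instances.values())
    return r"(?:https?://)?(?:" + "|".join(hosts) + r")"
```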
Mike Fährmann
43120407cc [paheal] create directory for each post (closes #1147) 2020-12-01 12:14:55 +01:00
Mike Fährmann
63e61a0932 [twitter] update image URL format (#1145)
use
'/<name>?format=<fmt>&name=<size>'
instead of the potentially deprecated
'/<name>.<fmt>:<size>'

but keep all of them as fallback URLs
2020-12-01 11:53:51 +01:00
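A sketch of building the new-style URL first and keeping older variants as fallbacks. The 'base' argument (media URL without extension) and the default size names are assumptions for illustration.

```python
def image_urls(base, fmt, sizes=("orig", "large")):
    """Return new-style URLs first, then the older '.<fmt>:<size>' fallbacks."""
    urls = ["{}?format={}&name={}".format(base, fmt, size) for size in sizes]
    # older URL form kept only as fallbacks
    urls += ["{}.{}:{}".format(base, fmt, size) for size in sizes]
    return urls
```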
Mike Fährmann
ae6a1d5fbc [mangoxo] fix extraction 2 2020-11-27 13:55:30 +01:00
Mike Fährmann
f6a684bc37 [hentainexus] update data decoding procedure (#1125) 2020-11-25 11:26:26 +01:00