2168 Commits

Author SHA1 Message Date
Mike Fährmann
b3ecc89a9a [instagram] use double quotes for strings when possible 2020-12-05 19:33:42 +01:00
Mike Fährmann
76285eb60d [instagram] reimplement support for story highlights (#1149) 2020-12-05 19:13:00 +01:00
Mike Fährmann
8ca7f54750 rename '_request_…' variables
- remove '_' at the beginning
- _request_last -> request_timestamp
2020-12-05 00:09:15 +01:00
Mike Fährmann
15a122aff3 [instagram] update 'X-IG-WWW-Claim' headers 2020-12-04 20:58:34 +01:00
Mike Fährmann
e5d81bdc7b [mangadex] handle 'external' chapters (closes #1154) 2020-12-04 20:56:30 +01:00
Mike Fährmann
447488fb18 [instagram] rewrite
(#1113, #1122, #1128, #1130, #1149)

Rely on the results of GraphQL queries instead of requesting data
for each post separately via '/p/<shortcode>/?__a=1'.

This might result in some missing metadata, and there might be some
issues for '/channel/' and '/saved/' URLs, but at least downloading
from the regular post listings should work without issues and without
getting users blocked/banned.

TODO: reimplement support for stories
2020-12-03 14:30:59 +01:00
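The rewrite's core idea is that each page of a GraphQL listing already contains the post data, so the listing can be walked page by page without a separate request per post. The sketch below illustrates that pattern under stated assumptions: the `fetch` callable is hypothetical, and the key names (`edge_owner_to_timeline_media`, `edges`, `node`, `page_info`, `has_next_page`, `end_cursor`) are assumed response fields, not taken from this log.

```python
from typing import Callable, Iterator, Optional


def iter_posts(fetch: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every post node from a cursor-paginated GraphQL listing.

    'fetch' is a hypothetical callable that takes a cursor (None for the
    first page) and returns the decoded JSON for one page.
    """
    cursor = None
    while True:
        # Assumed response layout; adjust to the actual query result.
        media = fetch(cursor)["edge_owner_to_timeline_media"]
        for edge in media["edges"]:
            # Each node already carries the post's metadata, so no extra
            # per-post request (e.g. '/p/<shortcode>/?__a=1') is made.
            yield edge["node"]
        info = media["page_info"]
        if not info["has_next_page"]:
            return
        cursor = info["end_cursor"]
```

Any metadata that only the per-post endpoint provides would be missing from these nodes, which matches the caveat in the commit message.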
Mike Fährmann
cc15fbe71a [moebooru] add generalized extractors for moebooru sites
- add support for sakugabooru.com (closes #1136)
- add support for lolibooru.moe   (closes #1050)

This allows users to dynamically add support for moebooru/myimouto-based
sites by adding an entry to their config file
(as for foolslide, foolfuuka, etc.).

For example:
{
    "extractor": {
        "moebooru": {
            "new-site-1": {"root": "https://site1.net"},
            "new-site-2": {"root": "https://www.site2.moe"}
        }
    }
}
2020-12-01 22:27:18 +01:00
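A rough sketch of how a config section like the example above could be turned into per-site instances and used to pick the right one for a URL. The function names `build_instances` and `match_category` are hypothetical and not gallery-dl's actual code; only the config layout comes from the example.

```python
import re
from typing import Dict, Optional


def build_instances(config: dict) -> Dict[str, str]:
    """Map each configured category name to its root URL."""
    sites = config.get("extractor", {}).get("moebooru", {})
    return {
        name: opts["root"].rstrip("/")
        for name, opts in sites.items()
        if isinstance(opts, dict) and "root" in opts
    }


def match_category(url: str, instances: Dict[str, str]) -> Optional[str]:
    """Return the category whose root the URL belongs to, if any."""
    for category, root in instances.items():
        # Accept http or https and an optional 'www.' prefix.
        host = re.sub(r"^https?://(?:www\.)?", "", root)
        pattern = r"https?://(?:www\.)?" + re.escape(host) + r"(?:/|$)"
        if re.match(pattern, url):
            return category
    return None
```

Anchoring the host with `(?:/|$)` keeps a root like `site1.net` from also matching `site1.network`.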
Mike Fährmann
43120407cc [paheal] create directory for each post (closes #1147) 2020-12-01 12:14:55 +01:00
Mike Fährmann
63e61a0932 [twitter] update image URL format (#1145)
use
'/<name>?format=<fmt>&name=<size>'
instead of the potentially deprecated
'/<name>.<fmt>:<size>'

but keep all of them as fallback URLs
2020-12-01 11:53:51 +01:00
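The two URL formats from this commit can be generated side by side, with the new form as the primary URL and everything else kept as fallbacks. This is a minimal sketch: the helper name `image_urls`, the size names, and their order are assumptions for illustration, not Twitter's documented list.

```python
from typing import List, Tuple


def image_urls(base: str, fmt: str,
               sizes: Tuple[str, ...] = ("orig", "large", "medium"),
               ) -> Tuple[str, List[str]]:
    """Return (primary URL, fallback URLs) for one image.

    'base' is the media URL without extension ('.../<name>').
    The size names are assumed values.
    """
    # New format: '/<name>?format=<fmt>&name=<size>'
    new_style = [f"{base}?format={fmt}&name={size}" for size in sizes]
    # Potentially deprecated format: '/<name>.<fmt>:<size>'
    old_style = [f"{base}.{fmt}:{size}" for size in sizes]
    urls = new_style + old_style
    return urls[0], urls[1:]
```

For example, `image_urls("https://pbs.twimg.com/media/EXAMPLE", "jpg")` would yield `...EXAMPLE?format=jpg&name=orig` as the primary URL (the media ID here is a placeholder).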
Mike Fährmann
ae6a1d5fbc [mangoxo] fix extraction 2 2020-11-27 13:55:30 +01:00
Mike Fährmann
f6a684bc37 [hentainexus] update data decoding procedure (#1125) 2020-11-25 11:26:26 +01:00
Mike Fährmann
c57a918f4a [e621] implement delay via '_request_interval_min' 2020-11-25 00:19:32 +01:00
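A minimum delay between requests can be enforced by remembering when the previous request happened. In the sketch below, the attribute names `request_interval_min` and `request_timestamp` echo the names in these commit messages, but the `Throttle` class, its constructor, and the injectable clock are hypothetical.

```python
import time
from typing import Callable


class Throttle:
    """Enforce a minimum delay between consecutive requests (hypothetical)."""

    def __init__(self, interval_min: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.request_interval_min = interval_min
        self.request_timestamp = float("-inf")  # no request made yet
        self._clock = clock
        self._sleep = sleep

    def wait(self) -> None:
        """Call before each request; sleeps only as long as necessary."""
        elapsed = self._clock() - self.request_timestamp
        if elapsed < self.request_interval_min:
            self._sleep(self.request_interval_min - elapsed)
        self.request_timestamp = self._clock()
```

Using a monotonic clock avoids spurious long or negative waits if the system time changes.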
Mike Fährmann
93ce7466e2 [2chan] skip external links 2020-11-24 16:41:47 +01:00
Mike Fährmann
b214e89b5c [mangoxo] fix extraction 2020-11-24 12:50:46 +01:00
Mike Fährmann
578dcf805c [mangapanda] don't force https:// 2020-11-21 20:24:37 +01:00
Mike Fährmann
102c482f5e [reddit] skip invalid/failed gallery items (fixes #1127) 2020-11-21 17:34:38 +01:00
Mike Fährmann
174945d2b2 [hentainexus] fix extraction (fixes #1125) 2020-11-20 22:31:35 +01:00
Mike Fährmann
1e3dd7330e merge SharedConfigMixin functionality into Extractor 2020-11-17 00:34:07 +01:00
Mike Fährmann
ddfb4fd07a [twitter] use 'https://twitter.com/i/api/' for logged in users
This doesn't seem to make a difference as far as I can tell
(the downloaded files are the same), but it is what the website itself does.
2020-11-16 11:26:37 +01:00
Mike Fährmann
42ccae53c4 [mangadex] switch to API v2
https://mangadex.org/api/v2/
https://mangadex.org/thread/351011
2020-11-16 11:05:17 +01:00
Mike Fährmann
ca44111726 [flickr] update
- ensure every photo has an 'owner' (#828)
- change default directories to a more consistent schema
- create directory for each photo
2020-11-15 10:44:29 +01:00
Mike Fährmann
de0c57886d [twitter] add 'list-members' extractor (closes #1096) 2020-11-13 06:47:45 +01:00
Mike Fährmann
904ba08568 [gfycat] fix default filename format 2020-11-13 06:37:21 +01:00
Mike Fährmann
a46561bc16 [500px] update query hashes 2020-11-13 06:36:11 +01:00
Mike Fährmann
2e3a0dff21 [8kun] fix file URLs of older posts (fixes #1101) 2020-11-07 23:10:37 +01:00
Mike Fährmann
00825cddf5 [hentaifoundry] use scheme from input URL (fixes #1095)
Let the user choose between http and https,
instead of always forcing https.
2020-11-07 22:40:02 +01:00
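Respecting the user's choice of scheme amounts to capturing it from the input URL and reusing it for the site root. The pattern and the helper `root_from_url` are hypothetical simplifications, not the extractor's real URL pattern.

```python
import re

# Hypothetical, simplified pattern; group 1 captures the scheme.
PATTERN = re.compile(r"(https?)://(?:www\.)?hentai-foundry\.com")


def root_from_url(url: str) -> str:
    """Build the site root using the scheme from the input URL."""
    match = PATTERN.match(url)
    if not match:
        raise ValueError("unsupported URL: " + url)
    return match.group(1) + "://www.hentai-foundry.com"
```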
Mike Fährmann
8a98d3549a [weasyl] create directory for each favorite submission
(#1032)
2020-11-07 18:47:55 +01:00
Mike Fährmann
91db8df1c7 [deviantart] add 'index_base36' metadata field (closes #1099)
This is the same ID as found in 'filename' without the 'd' in front,
which is just 'index' encoded in base36.
2020-11-07 18:39:50 +01:00
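Since `index_base36` is `index` written in base36, the two can be converted into each other with a few lines of code. This is a sketch: the helper names are hypothetical, and the `-d<base36>` suffix assumed by `index_from_filename` is an illustrative guess at where the ID sits in `filename`.

```python
import re

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(number: int) -> str:
    """Encode a non-negative integer in lowercase base36."""
    if number == 0:
        return "0"
    out = []
    while number:
        number, rem = divmod(number, 36)
        out.append(DIGITS[rem])
    return "".join(reversed(out))


def index_from_filename(filename: str) -> int:
    """Recover 'index' from a filename ending in '-d<base36>' (assumed layout)."""
    match = re.search(r"-d([0-9a-z]+)$", filename)
    if not match:
        raise ValueError("no base36 ID in " + filename)
    # int() accepts an explicit base, which handles decoding.
    return int(match.group(1), 36)
```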
Mike Fährmann
b9bfa4c675 update extractor test results 2020-11-07 02:03:22 +01:00
Mike Fährmann
1b5b789401 [mangoxo] fix metadata extraction 2020-11-07 01:35:29 +01:00
Mike Fährmann
41d4968866 [twitter] add 'list' extractor (#1096) 2020-11-05 22:55:38 +01:00
Mike Fährmann
5d10520f4c [twitter] update GraphQL endpoint & fix width/height entries 2020-11-05 22:53:29 +01:00
Mike Fährmann
9b2e5f72d6 [exhentai] update image URL parsing (#1094) 2020-11-02 15:28:54 +01:00
Mike Fährmann
98a4d86a01 [sankakucomplex] extract videos and embeds (closes #308) 2020-10-30 01:21:11 +01:00
Mike Fährmann
558cde139c [paheal] fix extraction (fixes #1088) 2020-10-28 21:51:31 +01:00
Mike Fährmann
0211af7ca8 [hentaifoundry] update 'YII_CSRF_TOKEN' cookie handling
(fixes #1083)
2020-10-28 21:49:03 +01:00
Mike Fährmann
198c33ec36 also collect post processors from 'basecategory' entries
(fixes #1084)
2020-10-27 19:56:48 +01:00
Mike Fährmann
350b1afe1c speed up _list_classes() after iterating over all modules once 2020-10-26 22:18:15 +01:00
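One way to get this speed-up is to import modules lazily on the first pass and cache the resulting classes once that pass has run to completion. The `ClassRegistry` class and its `load_modules` callable are hypothetical; they only illustrate the caching idea. The cache is committed only after a full pass, so an early `break` by the caller cannot leave it half-filled.

```python
from typing import Callable, Iterable, Iterator, List


class ClassRegistry:
    """Collect extractor classes lazily, caching them after one full pass.

    'load_modules' is a hypothetical callable returning modules that each
    expose a 'classes' attribute.
    """

    def __init__(self, load_modules: Callable[[], Iterable[object]]):
        self._load_modules = load_modules
        self._cache: List[type] = []
        self._complete = False

    def list_classes(self) -> Iterator[type]:
        if self._complete:
            # Fast path: all modules were already iterated once.
            yield from self._cache
            return
        # Slow path: import modules one by one, yielding as we go.
        collected: List[type] = []
        for module in self._load_modules():
            for cls in module.classes:
                collected.append(cls)
                yield cls
        # Only reached when the caller consumed every class.
        self._cache = collected
        self._complete = True
```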
Mike Fährmann
18213dc5ba release version 1.15.2 2020-10-24 18:57:29 +02:00
Mike Fährmann
b788712844 [fallenangels] fix extraction of '.5' chapters 2020-10-23 16:56:08 +02:00
Mike Fährmann
28d8541cb3 [mangafox] ensure download URLs have a scheme 2020-10-23 02:45:15 +02:00
Mike Fährmann
8e3a324c91 [mangakakalot] ignore "Go Home" buttons in chapter pages 2020-10-23 02:33:35 +02:00
Mike Fährmann
c14c5d82d6 [newgrounds] use generator for fallback URLs 2020-10-23 00:39:19 +02:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
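The practical effect of dropping `&` from a negated character class: `&` only separates parameters inside the query component, and RFC 3986 allows it in path segments, so excluding it cut names short. The patterns and the `example.org` URL below are hypothetical.

```python
import re

# Hypothetical user-page patterns before and after the change.
OLD = re.compile(r"https?://example\.org/user/([^/?&#]+)")
NEW = re.compile(r"https?://example\.org/user/([^/?#]+)")

url = "https://example.org/user/tom&jerry?page=2"

old_name = OLD.match(url).group(1)  # stops at '&'
new_name = NEW.match(url).group(1)  # stops at '?'
```

Here `old_name` is `"tom"`, while `new_name` is the full `"tom&jerry"`.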
Mike Fährmann
1686dc1757 [twitter] support media from Cards (#1005, #937)
Can be enabled with 'extractor.twitter.cards', but it is disabled by
default for now because cards can link to rather large videos from
YouTube or Twitch.
2020-10-22 21:33:53 +02:00
Mike Fährmann
ffd38215a4 [hitomi] fix image URLs and URL pattern
- non-webp files are now hosted on [a-c]b.hitomi.la
- removed ampersand from invalid slug characters
2020-10-22 15:15:34 +02:00
Mike Fährmann
286718950c [mangahere] ensure download URLs have a scheme (fixes #1070) 2020-10-17 22:43:59 +02:00
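Pages sometimes reference images with protocol-relative URLs such as `//host/path`, which cannot be downloaded as-is. A minimal sketch of the fix, assuming https as the default; the helper name `ensure_scheme` is hypothetical.

```python
def ensure_scheme(url: str, default: str = "https") -> str:
    """Prefix protocol-relative URLs ('//host/path') with a scheme."""
    if url.startswith("//"):
        return default + ":" + url
    return url
```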
Mike Fährmann
76dfa11a65 [reddit] add 'date' metadata field (closes #1068) 2020-10-16 15:48:04 +02:00
Mike Fährmann
3f2ba629ea [newgrounds] provide fallback URLs for video downloads (#1042) 2020-10-16 01:16:12 +02:00
Mike Fährmann
a3ca2f6080 update fallback URL handling
remove Message.Urllist and use a '_fallback' field inside a kwdict
2020-10-16 01:09:55 +02:00
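With fallbacks stored under a `_fallback` key in the kwdict, a downloader can try the primary URL first and then walk the alternatives. Storing a generator there (as the newgrounds change does) means alternative URLs are only built if they are actually needed. The function `download_with_fallback` and its `fetch` callable are hypothetical stand-ins for the real download logic.

```python
from typing import Any, Callable, Dict, Optional


def download_with_fallback(url: str, kwdict: Dict[str, Any],
                           fetch: Callable[[str], bool]) -> Optional[str]:
    """Try 'url', then each URL in kwdict['_fallback'].

    'fetch' is a hypothetical callable returning True on success.
    Returns the URL that worked, or None if all of them failed.
    """
    if fetch(url):
        return url
    # '_fallback' may be a list or a lazily evaluated generator.
    for fallback in kwdict.get("_fallback", ()):
        if fetch(fallback):
            return fallback
    return None
```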