2180 Commits

Author SHA1 Message Date
Mike Fährmann
18213dc5ba release version 1.15.2 2020-10-24 18:57:29 +02:00
Mike Fährmann
b788712844 [fallenangels] fix extraction of '.5' chapters 2020-10-23 16:56:08 +02:00
Mike Fährmann
28d8541cb3 [mangafox] ensure download URLs have a scheme 2020-10-23 02:45:15 +02:00
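A minimal sketch of what "ensure download URLs have a scheme" can look like, for this commit and the similar [mangahere] one. The helper name `add_scheme` and the example host are hypothetical and not taken from the project; the sketch assumes that scheme-less URLs are either protocol-relative ('//host/path') or bare ('host/path').

```python
def add_scheme(url, scheme="https"):
    """Return `url` unchanged if it has an HTTP(S) scheme, else prepend one."""
    if url.startswith(("http://", "https://")):
        return url
    # Drop leading '/' or ':' left over from protocol-relative URLs.
    return scheme + "://" + url.lstrip("/:")
```

For example, `add_scheme("//img.example.org/1.jpg")` yields a URL starting with `https://`.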
Mike Fährmann
8e3a324c91 [mangakakalot] ignore "Go Home" buttons in chapter pages 2020-10-23 02:33:35 +02:00
Mike Fährmann
c14c5d82d6 [newgrounds] use generator for fallback URLs 2020-10-23 00:39:19 +02:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
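To illustrate the reasoning in this commit message: a path component ends at '/', '?', or '#', so an ampersand inside it is ordinary data. The sketch below uses a hypothetical pattern and host (not one of the project's extractors) with the character class `[^/?#]`.

```python
import re

# Hypothetical pattern: the user name ends at the first '/', '?' or '#'.
USER_PATTERN = re.compile(r"https?://example\.org/user/([^/?#]+)")


def user_from_url(url):
    """Return the user name component of `url`, or None if it does not match."""
    match = USER_PATTERN.match(url)
    return match.group(1) if match else None
```

With this pattern an '&' stays part of the captured name, while the query string and fragment are cut off.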
Mike Fährmann
1686dc1757 [twitter] support media from Cards (#1005, #937)
Can be enabled with 'extractor.twitter.cards', but for now disabled by
default because cards can redirect to rather large videos from YouTube
or Twitch.
2020-10-22 21:33:53 +02:00
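A sketch of how the dotted option name 'extractor.twitter.cards' might translate into a configuration file. It assumes the configuration is JSON and that each dot-separated part names a nested object; the helper `set_option` is hypothetical and exists only for this illustration.

```python
import json


def set_option(config, dotted_name, value):
    """Store `value` under a dotted option path, creating nested dicts."""
    *parents, key = dotted_name.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[key] = value
    return config


# Enable the option named in the commit message (disabled by default).
config = set_option({}, "extractor.twitter.cards", True)
config_text = json.dumps(config, indent=4)
```

Under these assumptions, `config_text` holds a JSON document with "cards" nested inside "twitter", which is nested inside "extractor".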
Mike Fährmann
ffd38215a4 [hitomi] fix image URLs and URL pattern
- non-webp files are now hosted on [a-c]b.hitomi.la
- removed ampersand from invalid slug characters
2020-10-22 15:15:34 +02:00
Mike Fährmann
286718950c [mangahere] ensure download URLs have a scheme (fixes #1070) 2020-10-17 22:43:59 +02:00
Mike Fährmann
76dfa11a65 [reddit] add 'date' metadata field (closes #1068) 2020-10-16 15:48:04 +02:00
Mike Fährmann
3f2ba629ea [newgrounds] provide fallback URLs for video downloads (#1042) 2020-10-16 01:16:12 +02:00
Mike Fährmann
a3ca2f6080 update fallback URL handling
remove Message.Urllist and use a '_fallback' field inside a kwdict
2020-10-16 01:09:55 +02:00
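The '_fallback' idea from this commit, together with the later [newgrounds] change that supplies the fallbacks from a generator, can be sketched roughly as follows. Only the '_fallback' key and the term kwdict come from the commit message; the function names, the `fetch` callback, and the quality-suffix URL scheme are assumptions made for illustration.

```python
from itertools import chain


def fallback_urls(base):
    """Lazily yield alternative URLs (hypothetical naming scheme)."""
    for quality in ("1080p", "720p", "360p"):
        yield f"{base}.{quality}.mp4"


def first_working_url(url, kwdict, fetch):
    """Try `url`, then each entry of kwdict['_fallback'], until fetch() succeeds."""
    for candidate in chain((url,), kwdict.get("_fallback", ())):
        if fetch(candidate):
            return candidate
    return None
```

Because the fallbacks come from a generator, later candidates are only produced if the earlier ones fail.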
Mike Fährmann
43dab3a228 [mangadex] unescape more metadata fields (fixes #1066)
like 'manga', 'author', 'artist', etc.
2020-10-16 00:41:15 +02:00
Mike Fährmann
5565025221 [xhamster] fix user profile extraction 2020-10-15 18:57:35 +02:00
Mike Fährmann
07432d6262 [seiga] fix flake8 and cookie test (#1063) 2020-10-15 15:37:58 +02:00
Mike Fährmann
b8daabc3ca [pinterest] implement login support (closes #1055)
being logged in allows access to secret/protected boards
2020-10-15 15:14:18 +02:00
Mike Fährmann
1b1cf01d0d add a general 'generate_csrf_token()' function 2020-10-15 15:14:18 +02:00
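The log does not show how 'generate_csrf_token()' is implemented. As a rough illustration only, a general-purpose token generator could look like the sketch below; the name `make_csrf_token` and the 16-byte length are assumptions, not the project's code.

```python
import secrets


def make_csrf_token(nbytes=16):
    """Return a random token as a lowercase hex string of 2 * nbytes characters."""
    return secrets.token_hex(nbytes)
```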
Mike Fährmann
7a0ba370d1 [gelbooru] rewrite mp4 video URLs (fixes #1048) 2020-10-15 15:14:18 +02:00
Mike Fährmann
6491db3eaf [blogger] handle URLs with specified width/height (closes #1061)
get highest quality for images with
/wXXX-hXXX/ instead of the usual /sXXX/
2020-10-15 15:14:18 +02:00
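A sketch of the URL rewrite described in this commit: both the usual /sXXX/ segment and the /wXXX-hXXX/ variant are replaced by a single size segment. That '/s0/' requests the highest-quality image is an assumption here, and the helper name and example URLs are hypothetical.

```python
import re

# Matches a size segment such as '/s1600' or '/w640-h480' followed by '/'.
_SIZE_SEGMENT = re.compile(r"/(?:s\d+|w\d+-h\d+)(?=/)")


def full_size(url):
    """Replace the first size segment in `url` with '/s0' (assumed: original size)."""
    return _SIZE_SEGMENT.sub("/s0", url, count=1)
```

URLs without a recognizable size segment are returned unchanged.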
Mike Fährmann
783e0af26d [hentaifoundry] update and simplify 2020-10-15 15:14:17 +02:00
Mike Fährmann
5b844a72b7 [newgrounds] handle embeds without scheme (#1033) 2020-10-15 15:13:54 +02:00
kurumigi
7e0e872f4f [seiga] Add metadata for single image downloads (#1063)
* [seiga] Support image metadata.

* [seiga] Update test data.

* [seiga] Fix cookie check.

* [test_cookies] [seiga] Fit test_cookies.py to the last commit.
2020-10-15 15:13:27 +02:00
Zanny
3ec60e894a [weasyl] api-key authentication (#1057)
* [weasyl] support api keys

* [weasyl] document api-key authentication

* [weasyl] usernames can contain ~
2020-10-15 15:12:09 +02:00
Mike Fährmann
844793847c update extractor test results 2020-10-11 18:15:41 +02:00
Mike Fährmann
ddd6840509 [behance] fix 'collection' extraction 2020-10-11 18:15:41 +02:00
Mike Fährmann
c5e3971b18 [newgrounds] extract image embeds (closes #1033) 2020-10-11 18:15:40 +02:00
dawidsowa
43b156fb40 [reactor] match URLs without subdomain (#1053) 2020-10-11 18:15:06 +02:00
Mike Fährmann
3ebb174f2c add missing extractor info when spawning new ones (fixes #1051)
Not having this information causes the blacklist/whitelist logic to
trigger and prevents things from functioning as intended when using
default settings.

Fixes issues for 8muses, deviantart, exhentai, and mangoxo.
2020-10-08 14:34:53 +02:00
Mike Fährmann
f9c1684af7 [newgrounds] restore original video URLs (#1042) 2020-10-07 22:53:53 +02:00
Mike Fährmann
73373c06ec [weibo] handle posts with more than 9 images (closes #926)
Responses from '/api/container/getIndex' don't list more than
9 images per 'status' object, but the embedded JSON from a
'/detail/<ID>' page does.
2020-10-06 18:16:08 +02:00
Mike Fährmann
dd1e545597 [hentaifoundry] rename GalleryExtractor to PicturesExtractor 2020-10-04 22:53:23 +02:00
Mike Fährmann
c874071f5a [kissmanga] remove module 2020-10-04 22:46:41 +02:00
Mike Fährmann
93e04bf9a9 [500px] update query hashes 2020-10-03 19:25:28 +02:00
Mike Fährmann
844502cad5 update extractor test results 2020-10-03 19:24:19 +02:00
Mike Fährmann
fad7748b6b [xvideos] fix 'title' extraction 2020-10-01 22:04:14 +02:00
Mike Fährmann
5b927c15df [newgrounds] fix video extraction (closes #1042) 2020-10-01 20:14:16 +02:00
Mike Fährmann
bdc6c8f074 improve message for 'oauth:deviantart' etc (closes #989) 2020-09-29 21:25:24 +02:00
Mike Fährmann
430b6d6e2e [twitter] extend 'retweets' option (closes #1026)
Setting 'retweets' to '"original"' will use metadata from the
original retweeted Tweets, and not from the Retweet entry.
2020-09-28 23:03:35 +02:00
Mike Fährmann
b9bdd2c564 [hentaifoundry] add support for stories (closes #734) 2020-09-27 02:27:40 +02:00
Mike Fährmann
9a9d1924d8 [hentaicafe] add 'manga_id' metadata field (closes #1036)
This field is only available when using a non-foolslide URL
like '/hc.fyi/9874' or '/hazuki-yuuto-summer-blues/'
2020-09-26 14:34:48 +02:00
Mike Fährmann
cc4ac80302 [weasyl] add 'favorite' extractor (#1032) 2020-09-26 13:09:03 +02:00
Mike Fährmann
e9cc719497 [weasyl] update and simplify
- simplify 'pattern' regexps
- parse 'posted_at' as 'date'
- use unaltered 'title' ({title!l:R /_/} to lowercase and replace spaces)
2020-09-26 02:10:45 +02:00
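Going by the parenthetical in this commit message, the format string '{title!l:R /_/}' lowercases the title and replaces spaces with underscores. A plain-Python equivalent of that described effect, with a hypothetical function name, might be:

```python
def legacy_title(title):
    """Lowercase `title` and replace each space with an underscore."""
    return title.lower().replace(" ", "_")
```

This only mirrors the behavior the message describes; it does not parse the format string itself.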
Mike Fährmann
6514312126 [nijie] add 'include' option (closes #1018) 2020-09-25 18:18:35 +02:00
Mike Fährmann
0d43456323 [hentaifoundry] add 'include' option 2020-09-25 18:18:03 +02:00
Zanny
ebb7737b9b Weasyl Extractor (#977)
* weasyl extractor

* @kattjevfel suggested changes

* @mikf changes
2020-09-25 15:18:21 +02:00
Mike Fährmann
aeb0d32333 [twitter] improve twitpic extraction (fixes #1019)
- ignore twitpic.com/photos/… URLs
- ignore empty image URLs
2020-09-22 22:22:35 +02:00
Mike Fährmann
7cd383c0f9 update extractor test results 2020-09-20 21:54:39 +02:00
Mike Fährmann
1e313d5b84 implement 'sleep-request' option 2020-09-20 20:28:17 +02:00
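The log gives only the option name 'sleep-request'. Assuming it means a fixed delay in seconds before each HTTP request, the idea can be sketched as below. The class name, the `fetch` callback, and the injectable `sleep` parameter are hypothetical and chosen so the behavior can be checked without actually waiting.

```python
import time


class ThrottledFetcher:
    """Wrap a fetch callable and pause before every request."""

    def __init__(self, fetch, sleep_request=0.0, sleep=time.sleep):
        self.fetch = fetch
        self.sleep_request = sleep_request
        self.sleep = sleep

    def request(self, url):
        # Wait only when a positive delay is configured.
        if self.sleep_request > 0:
            self.sleep(self.sleep_request)
        return self.fetch(url)
```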
Mike Fährmann
c43b3894be [myhentaigallery] update and fix extraction (#1001)
- extract more metadata
- match "/show/" URLs
- complete test results
- fix missing images for lines starting with " <img"
- fix missing comma in supportedsites.py
2020-09-17 18:14:23 +02:00
choeronline
05b9ac8d37 [myhentaigallery] add extractor (#1001)
* adds support for myhentaigallery

* fixes linting issues in myhentaigallery extractor
2020-09-17 17:32:54 +02:00