Commit Graph

811 Commits

Author SHA1 Message Date
Mike Fährmann
cc36f88586 rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
728c64a3fb [tumblr] rename 'offset' to 'num' and adjust formats
Trying to somehow emulate Tumblr filenames is a bad idea ...
2018-04-15 18:58:32 +02:00
Mike Fährmann
6bd857a319 [tumblr] handle rate limits / 429 errors
- wait for the hourly limit to reset
- abort upon exceeding the daily limit (it doesn't seem useful to
  potentially wait for several hours)
2018-04-12 16:25:20 +02:00
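The wait-or-abort decision described in this commit could look roughly like the sketch below. It is only an illustration: the function name `rate_limit_action`, the header names, and the 60-second fallback are assumptions, not taken from the commit.

```python
def rate_limit_action(headers):
    """Decide how to react to a 429 response.

    Returns ("abort", None) when the daily limit is used up,
    otherwise ("wait", seconds) until the hourly limit resets.
    The header names below are assumptions for illustration.
    """
    if headers.get("x-ratelimit-perday-remaining") == "0":
        # waiting for a daily reset could take several hours: give up
        return ("abort", None)
    reset = headers.get("x-ratelimit-perhour-reset")
    # fall back to an arbitrary 60 seconds if no reset time is given
    return ("wait", int(reset) if reset else 60)
```

A caller would sleep for the returned number of seconds and retry on "wait", and stop the extraction on "abort".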
Mike Fährmann
7073ab7707 [komikcast] update regex to only match manga pages
The 'readerarea' section now includes some (shady) external
Javascript file, which got matched as well.
2018-04-11 15:48:17 +02:00
Mike Fährmann
a1fa4b43b0 Revert "[tumblr] add option to sort photosets by upload order"
This reverts commit 4a26ae32df.
2018-04-09 16:08:08 +02:00
Mike Fährmann
48a83a89e9 [loveisover] remove module
archive.loveisover.me was shut down on 2018-03-29;
https://www.archiveteam.org/index.php?title=4chan#archive.loveisover.me
2018-04-09 16:05:15 +02:00
Mike Fährmann
564e12ca8f replace 'imgyt' with 'imxto'
https://img.yt/ wasn't available for a couple of days, but has now
re-emerged as https://imx.to/ with a new web-interface.
Links to older images still work (see tests).
2018-04-09 15:53:20 +02:00
Mike Fährmann
1b80fa82a9 [imgur] update URL pattern and tests 2018-04-08 21:06:21 +02:00
Mike Fährmann
4a26ae32df [tumblr] add option to sort photosets by upload order 2018-04-07 15:57:55 +02:00
Mike Fährmann
6b72be8ee6 [tumblr] add 'hash' keyword
'hash' is the middle part of the filename in a tumblr image URL.
For example an image with '.../tumblr_p6tgemp1NZ1wgha4yo1_250.png' as
its URL would have 'p6tgemp1NZ1wgha4yo1' as hash.
2018-04-07 15:54:30 +02:00
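A minimal sketch of how such a 'hash' value could be pulled out of an image URL. The helper name `tumblr_hash` and the regular expression are assumptions made for illustration; the actual extractor may do this differently, and real filenames may not always follow this shape.

```python
import re


def tumblr_hash(url):
    """Return the middle part of a 'tumblr_<hash>_<size>.<ext>' filename,
    or None if the URL does not end in such a filename."""
    match = re.search(r"/tumblr_([^/_]+)_\d+\.\w+$", url)
    return match.group(1) if match else None


# the example URL from the commit message
print(tumblr_hash(".../tumblr_p6tgemp1NZ1wgha4yo1_250.png"))
# prints: p6tgemp1NZ1wgha4yo1
```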
Mike Fährmann
d11fcf4804 smaller changes and fixes
- fix the cloudflare challenge result if the last decimal places
  are zero (JS's toFixed() removes trailing zeroes)
- fix downloading of kissmanga chapter-pages hosted on blogspot
  (accessing blogspot with "kissmanga.com" as referrer yields a 401)
- disable certificate validation for 'mangahere' tests
- update flickr test result
2018-04-06 15:30:09 +02:00
Mike Fährmann
759ba26fb0 [luscious] proper image order for picture albums
... and (try) to start with the first image instead of somewhere
in the middle of an album.
2018-04-05 18:12:01 +02:00
Mike Fährmann
68e9fbee16 [tumblr] check all 4 keys/secrets before using OAuth
it was possible to cause a crash by setting api-key or -secret to null.
(this commit also slightly improves the blog-cache implementation)
2018-04-05 15:42:23 +02:00
Mike Fährmann
f8168c693e [tumblr] avoid calls to '/blog/.../info'
The same information returned by the 'blog/.../info' API endpoint
is also included in the result of every 'blog/.../posts' call.
2018-04-04 14:15:24 +02:00
Mike Fährmann
64d7c85b55 [exhentai] improve metadata
- add 'width', 'height' and 'size' (in bytes) for each image
- change the former 'size' and 'size_units' into 'gallery_size'
2018-04-03 18:59:53 +02:00
Mike Fährmann
64b22e0fc1 [pawoo] update URL pattern
adds support for 'https://pawoo.net/@.../media'
2018-04-02 13:00:59 +02:00
Mike Fährmann
7b562907c3 [nijie] add favorites extractor
adds support for 'https://nijie.info/user_like_illust_view.php?id=...'
2018-03-31 18:54:25 +02:00
Mike Fährmann
445db75955 [nijie] improve extraction and metadata
- add 'title' and 'description'
- split 'artist_id' into 'user_id' and 'artist_id'
  - 'user_id' is the ID of the user from whom the image entry
    originates
  - 'artist_id' is the ID of the actual image artist
- improve pagination and URL patterns
2018-03-31 18:48:41 +02:00
Mike Fährmann
a112e3f2a0 [nijie] add doujin extractor
adds support for "https://nijie.info/members_dojin.php?id=<artist_id>"
2018-03-31 18:17:41 +02:00
Mike Fährmann
f39153b6e9 [nhentai] add extractor for search results 2018-03-28 17:21:44 +02:00
Mike Fährmann
52d41c41e7 [exhentai] add extractor for favorited galleries 2018-03-27 18:58:42 +02:00
Mike Fährmann
63cc2599c4 [exhentai] add extractor for search results 2018-03-27 16:50:47 +02:00
Mike Fährmann
d1c91a1f2b [mangadex] fix manga-page extraction 2018-03-25 17:22:12 +02:00
Mike Fährmann
299ae24996 [test] add a few downloader tests 2018-03-25 15:10:25 +02:00
Mike Fährmann
dd314279fb [test] add unit tests for extractor module functions 2018-03-25 11:49:42 +02:00
Mike Fährmann
e7525b1b0e [artstation] add challenge extractor (#80) 2018-03-23 15:06:09 +01:00
Mike Fährmann
f5c6a2d7f5 [nhentai] use API to get gallery info 2018-03-21 12:58:41 +01:00
Mike Fährmann
b2ba2b821d [hitomi] fix image URLs and improve metadata
- use '?a.hitomi.la' as subdomain depending on gallery-id
- add 'characters', 'tags' and 'date' information
- support multiple entries per metadata-value
- rename 'num' to 'page'
2018-03-20 18:09:42 +01:00
Mike Fährmann
3905474805 [booru] call update_page() with correct dict (closes #82) 2018-03-19 11:33:19 +01:00
Mike Fährmann
44c267e362 [artstation] add search extractor (#80) 2018-03-17 19:04:37 +01:00
Mike Fährmann
40ca562d7b [artstation] add album extractor (#80) 2018-03-17 17:36:31 +01:00
Mike Fährmann
f367d5c281 [deviantart] move delay-increase after expect_error check
[ci skip]
2018-03-15 16:44:58 +01:00
Mike Fährmann
557cb94f81 [deviantart] use proper exponential backoff on API errors
... and use separate API credentials for unit tests.
2018-03-15 16:01:42 +01:00
Mike Fährmann
723cc66bb1 [artstation] add user-, image- and likes-extractors 2018-03-14 14:05:14 +01:00
Mike Fährmann
4d74749496 [tests] rework filters for extractor tests
CI incompatible tests will now only be skipped if tests are run in
a CI environment.
2018-03-13 13:11:10 +01:00
Mike Fährmann
d6ef52897c [imgchili] remove module
All previously hosted images yield a 404
and the main page is just a logo.
2018-03-13 11:02:58 +01:00
Mike Fährmann
7847ab1d5a [imagehosts] remove even more dead sites
All removed sites either
- reject all incoming connections or
- display a message from their domain registrar
2018-03-12 21:25:13 +01:00
Mike Fährmann
5f37d40a3e [komikcast] bypass cloudflare challenge 2018-03-10 16:09:40 +01:00
Mike Fährmann
f9884e2338 [pixiv] update URL pattern
add support for 'https://www.pixiv.net/user/<id>'
2018-03-10 16:05:12 +01:00
Mike Fährmann
85ed023c2e [mangadex] remove the trailing ' - MangaDex' in a better way
str.rstrip() works differently than assumed.
2018-03-10 15:54:50 +01:00
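The pitfall mentioned here: str.rstrip() treats its argument as a set of characters to strip, not as a suffix. The sketch below shows the difference with a made-up title; `remove_suffix` is a hypothetical helper and not necessarily what this commit used.

```python
title = "Dragon - MangaDex"   # made-up page title
suffix = " - MangaDex"

# rstrip() removes any trailing characters found in the argument,
# so the final 'n' of "Dragon" is stripped as well
print(title.rstrip(suffix))   # prints: Drago


def remove_suffix(text, suffix):
    """Remove 'suffix' from the end of 'text' only if it is really there."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


print(remove_suffix(title, suffix))   # prints: Dragon
```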
Mike Fährmann
32bbd12f08 update extractor tests 2018-03-08 18:04:34 +01:00
Mike Fährmann
ca326bd275 [deviantart] fix folder and collection archive IDs
{folder[index]} and {collection[index]} are both '0' when being
delegated from Gallery- or FavoriteExtractors, as there is no
way of knowing a folder's index when getting folder-information
from the API.
2018-03-08 14:32:23 +01:00
Mike Fährmann
e32fe1cdf1 [pinterest] cast IDs to int
... and update test results.

Image URLs changed from
https://s-media-cache-ak0.pinimg.com/... to
https://i.pinimg.com/...
2018-03-06 14:28:21 +01:00
Mike Fährmann
179ecee965 [turboimagehost] fix extraction 2018-03-06 14:25:10 +01:00
Mike Fährmann
1400868f53 [mangadex] general improvements
- support >100 chapter entries per manga
- custom archive ID format
- detect non-existing chapters
2018-03-06 14:15:15 +01:00
Mike Fährmann
749fbbfa6c [mangadex] add chapter- and manga-extractor 2018-03-05 18:37:21 +01:00
Mike Fährmann
6e38cf5aab [mangareader] use 'https://'
The site now redirects from http://mangareader.net/
to https://mangareader.net/
2018-03-02 17:19:17 +01:00
Mike Fährmann
1d71123f91 [pixiv] update archive IDs and add metadata-fields
(Pixiv bookmarks actually have their own IDs, comments and tags,
independent of the bookmarked image, which makes creating an
archive ID a lot easier)
2018-03-02 16:11:53 +01:00
Mike Fährmann
858fdbdb22 [tumblr] improve 'inline' extraction
'quote' posts store their HTML content in the 'source' field
2018-03-02 06:59:44 +01:00
Mike Fährmann
5008e105ee update archive IDs
... to behave in a more straightforward way when dealing with
bookmarks/favourites/etc.

specific IDs are now grouped by their owner, album-id, ... to
allow for duplicates when it would be expected.
2018-03-01 18:20:50 +01:00