Mike Fährmann
64d7c85b55
[exhentai] improve metadata
...
- add 'width', 'height' and 'size' (in bytes) for each image
- change the former 'size' and 'size_units' into 'gallery_size'
2018-04-03 18:59:53 +02:00
Mike Fährmann
a112e3f2a0
[nijie] add doujin extractor
...
adds support for "https://nijie.info/members_dojin.php?id=<artist_id>"
2018-03-31 18:17:41 +02:00
Mike Fährmann
299ae24996
[test] add a few downloader tests
2018-03-25 15:10:25 +02:00
Mike Fährmann
dd314279fb
[test] add unit tests for extractor module functions
2018-03-25 11:49:42 +02:00
Mike Fährmann
f5c6a2d7f5
[nhentai] use API to get gallery info
2018-03-21 12:58:41 +01:00
Mike Fährmann
8ef790de12
update .travis.yml
...
- restrict builds to master branch and release tags
- implement 'core' and 'results' test categories
2018-03-19 17:57:32 +01:00
Mike Fährmann
557cb94f81
[deviantart] use proper exponential backoff on API errors
...
... and use separate API credentials for unit tests.
2018-03-15 16:01:42 +01:00
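Exponential backoff as mentioned in the commit above can be sketched roughly as follows. This is a generic illustration, not the extractor's actual code: `ApiError`, `call_with_backoff` and all parameter names are hypothetical.

```python
import time


class ApiError(Exception):
    """Hypothetical exception standing in for a failed API call."""


def call_with_backoff(request, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call request() until it succeeds, doubling the wait after each failure."""
    delay = base_delay
    for attempt in range(1, max_tries + 1):
        try:
            return request()
        except ApiError:
            if attempt == max_tries:
                raise  # give up after the last attempt
            sleep(delay)
            delay *= 2  # 1s, 2s, 4s, ...
```

Passing `sleep` as a parameter is only a convenience for testing; it lets a caller record the delays instead of actually waiting.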
Mike Fährmann
b69cc94f0e
[util] implement bencode()
2018-03-14 13:17:34 +01:00
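For readers unfamiliar with the format, a minimal bencode encoder might look like the sketch below. It follows the general bencoding rules (integers, byte strings, lists, dictionaries with sorted keys); the signature and return type of the project's own `bencode()` are not shown in this log, so treat both as assumptions.

```python
def bencode(obj):
    """Encode ints, strings, lists and dicts as bencoded bytes (sketch)."""
    if isinstance(obj, bool):
        raise TypeError("bool has no bencode representation")
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, str):
        obj = obj.encode("utf-8")
    if isinstance(obj, bytes):
        # length prefix counts bytes, not characters
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, (list, tuple)):
        return b"l" + b"".join(bencode(item) for item in obj) + b"e"
    if isinstance(obj, dict):
        # dictionary keys are byte strings and must be emitted in sorted order
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in obj.items()),
            key=lambda item: item[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("unsupported type: " + type(obj).__name__)
```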
Mike Fährmann
4d74749496
[tests] rework filters for extractor tests
...
CI incompatible tests will now only be skipped if tests are run in
a CI environment.
2018-03-13 13:11:10 +01:00
Mike Fährmann
32bbd12f08
update extractor tests
2018-03-08 18:04:34 +01:00
Mike Fährmann
749fbbfa6c
[mangadex] add chapter- and manga-extractor
2018-03-05 18:37:21 +01:00
Mike Fährmann
5008e105ee
update archive IDs
...
... to behave in a more straightforward way when dealing with
bookmarks/favourites/etc.
Specific IDs are now grouped by their owner, album-id, ... to
allow for duplicates where they would be expected.
2018-03-01 18:20:50 +01:00
Mike Fährmann
2fad0b1f1b
add 'U' conversion for format strings to unquote their content
...
(#74)
2018-02-25 21:57:59 +01:00
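One way a custom 'U' conversion could be wired into Python format strings is by overriding `string.Formatter.convert_field`. The class name `UnquoteFormatter` is made up for this sketch, and the project's real formatter may be structured differently.

```python
import string
import urllib.parse


class UnquoteFormatter(string.Formatter):
    """Formatter that understands '{field!U}' in addition to !s, !r and !a."""

    def convert_field(self, value, conversion):
        if conversion == "U":
            # percent-decode the value, e.g. 'a%20b' becomes 'a b'
            return urllib.parse.unquote(str(value))
        return super().convert_field(value, conversion)
```

Usage: `UnquoteFormatter().format("{title!U}", title="Hello%20World")` yields `Hello World`.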
Mike Fährmann
8f338347b6
[imagehosts] cleanup
...
removed
- chronos.to - unable to resolve hostname
- coreimg.net - same
- imgmaid.net - same
- hosturimage.com - everything returns 404
- imageontime.org - redirects to some shady site
- imgupload.yt - cloudflare error 522, host down
- img4ever.net - read timeout
2018-02-23 01:05:42 +01:00
Mike Fährmann
e1e0668ca8
add option to set default replacement field value
...
Missing or undefined keywords will now be replaced with the value
set for 'keywords-default'. The default is Python's 'None', which
is equivalent to setting this option to JSON's 'null'.
2018-02-23 00:59:20 +01:00
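The effect of a default replacement value can be illustrated with a small `string.Formatter` subclass. This is only a sketch of the idea: `DefaultFormatter` is a hypothetical name, and it handles plain field names only (not nested lookups such as `{user[id]}` on a missing key).

```python
import string


class DefaultFormatter(string.Formatter):
    """Replace missing fields with a configurable default value."""

    def __init__(self, default=None):
        self.default = default

    def get_value(self, key, args, kwargs):
        try:
            return super().get_value(key, args, kwargs)
        except (KeyError, IndexError):
            # missing keyword or positional argument: use the default
            return self.default
```

With the default of `None`, `DefaultFormatter().format("{id}_{title}", id=1)` gives `1_None`; constructing it with `default="unknown"` gives `1_unknown` instead.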
Mike Fährmann
ac3da8115e
[util] don't add text: URLs to list of downloaded URLs
2018-02-20 18:14:27 +01:00
Mike Fährmann
89440382ad
[tumblr] use separate API key for unit tests
2018-02-19 16:54:37 +01:00
Mike Fährmann
b50bdbf3d7
change config specifiers in input file format
...
Instead of a dictionary/object, input file options are now specified
by a 'key=value' pair starting with '-' for options only applying to
the next URL or '-G' for Global options applying to all following URLs.
See the docstring of parse_inputfile() for details.
Example option specifiers:
- filename = "{id}.{extension}"
- extractor.pixiv.user.directory = ["Pixiv Users", "{user[id]}"]
-spaces="are_optional"
-G keywords = {"global": "option"}
2018-02-16 03:10:41 +01:00
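The option syntax described above can be pictured with the following simplified parser. It is not the project's `parse_inputfile()`; the function name `parse_input_lines`, the JSON decoding of values and the requirement of a space after '-G' are assumptions made for this sketch.

```python
import json


def parse_input_lines(lines):
    """Yield (url, options) pairs from input-file lines (simplified sketch)."""
    global_opts = {}   # '-G' options, applied to all following URLs
    local_opts = {}    # '-' options, applied to the next URL only
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("-"):
            if line.startswith("-G "):   # assumption: '-G' is followed by a space
                target, spec = global_opts, line[3:]
            else:
                target, spec = local_opts, line[1:]
            key, sep, value = spec.partition("=")
            if not sep:
                raise ValueError("expected 'key=value': " + line)
            # assumption: values are written as JSON
            target[key.strip()] = json.loads(value.strip())
        else:
            options = dict(global_opts)
            options.update(local_opts)
            yield line, options
            local_opts = {}   # local options are consumed by one URL
```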
Mike Fährmann
be3ea4425d
test archive-id creation and uniqueness
2018-02-12 23:02:09 +01:00
Mike Fährmann
b73b8b4f50
add OAuth unittests
2018-02-12 17:07:07 +01:00
Mike Fährmann
f5f2d29f56
[nijie] fix dojin extraction
...
- correctly extract artist_id
- set extension to "jpg" if it was empty and let filetype checks do
the rest
2018-02-09 22:06:26 +01:00
Mike Fährmann
7a412f5c32
implement generic manga-chapter extractor
2018-02-04 22:02:04 +01:00
Mike Fährmann
aa38eab2be
allow not-defined fields in format strings
...
... and replace them with "None", for now
2018-02-03 22:28:41 +01:00
Mike Fährmann
619387cbb1
update extractor unittest results
2018-01-28 18:29:05 +01:00
Mike Fährmann
f94e3706a8
use logging module for error messages during downloads
2018-01-26 18:11:13 +01:00
Mike Fährmann
0dd48d644f
update test results
...
nothing broke, but things got updated or changed
2018-01-23 21:38:29 +01:00
Mike Fährmann
1e93955170
[batoto] remove module
...
Site officially shut down on 2018.01.18
2018-01-23 21:37:32 +01:00
Mike Fährmann
f10ffc0839
update extractor blacklist to also allow classes
2018-01-14 18:47:22 +01:00
Mike Fährmann
35e09869d1
[mangapark] fix image URLs and use HTTPS
2018-01-12 14:59:49 +01:00
Mike Fährmann
4edb25346e
[slideshare] support mobile URLs (closes #67)
2018-01-10 14:15:00 +01:00
Mike Fährmann
b33efc99a4
[idolcomplex] add support for idol.sankakucomplex.com
2018-01-09 17:54:37 +01:00
Mike Fährmann
1a70857a12
update extractor-unittest capabilities
...
- "count" can now be a string defining a comparison in the form of
'<operator> <value>', for example: '> 12' or '!= 1'. If its value
is not a string, it is assumed to be a concrete integer as before.
- "keyword" can now be a dictionary defining tests for individual keys.
These tests can either be a type, a concrete value or a regex
starting with "re:". Dictionaries can be stacked inside each other.
An optional key is indicated by a "?" before its name.
For example:
"keyword": {
    "image_id": int,
    "gallery_id": 123,
    "name": "re:pattern",
    "user": {
        "id": 321,
    },
    "?optional": None,
}
2017-12-30 19:05:37 +01:00
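The "count" and "keyword" checks described above could be evaluated along these lines. The helper names `check_count` and `check_keywords` are hypothetical and the test suite's real implementation may differ, for example in how it reports failures.

```python
import operator
import re

OPERATORS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}


def check_count(actual, expected):
    """Compare against an int, or against a string like '> 12'."""
    if isinstance(expected, str):
        op, _, value = expected.partition(" ")
        return OPERATORS[op](actual, int(value))
    return actual == expected


def check_keywords(actual, tests):
    """Check a keyword dict against types, values, regexes and nested dicts."""
    for key, test in tests.items():
        if key.startswith("?"):          # optional key: skip if absent
            key = key[1:]
            if key not in actual:
                continue
        if key not in actual:
            return False
        value = actual[key]
        if isinstance(test, dict):
            if not isinstance(value, dict) or not check_keywords(value, test):
                return False
        elif isinstance(test, type):
            if not isinstance(value, test):
                return False
        elif isinstance(test, str) and test.startswith("re:"):
            if not isinstance(value, str) or not re.search(test[3:], value):
                return False
        elif value != test:
            return False
    return True
```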
Mike Fährmann
28cd78aae0
[kissmanga] extend chapter-string regex (closes #58)
2017-12-24 22:53:10 +01:00
Mike Fährmann
fc7d165c97
[deviantart] add support for OAuth2 authentication
...
Some user galleries [*] require you to be either logged in or
authenticated via OAuth2 to access their deviations.
[*] e.g. https://polinaegorussia.deviantart.com/gallery/
--------------
known issue:
A deviantart 'refresh_token' can only be used once and gets updated
whenever it is used to request a new 'access_token', so storing its
initial value in a config file and reusing it again and again is not
possible.
2017-12-18 01:16:46 +01:00
Mike Fährmann
0a9a07a6e1
[slideshare] improve metadata; flake8
...
- added 'views' and 'published' keywords
- fixed longer titles and descriptions
2017-12-13 21:16:49 +01:00
Mike Fährmann
291369eab2
various smaller changes/additions
2017-12-06 21:45:56 +01:00
Mike Fährmann
300346ecdf
[mangazuki] remove extractors
...
This site has been in "rebuild"-mode for a fairly long time and the
current extractor code isn't going to work for the new version either.
2017-12-04 13:36:04 +01:00
Mike Fährmann
93482a1f88
implement 'util.advance()'
2017-12-03 01:38:24 +01:00
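A function named `advance()` typically skips a number of items of an iterable. The sketch below uses the well-known "consume" recipe from the itertools documentation; whether the project's `util.advance()` has exactly this signature is an assumption.

```python
import itertools


def advance(iterable, num):
    """Skip the first 'num' items and return the remaining iterator."""
    iterator = iter(iterable)
    # an empty islice starting at 'num' consumes exactly 'num' items
    next(itertools.islice(iterator, num, num), None)
    return iterator
```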
Mike Fährmann
a718c6c6cd
implement 'util.parse_bytes()'
2017-12-02 01:24:49 +01:00
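A byte-size parser of this kind usually turns strings such as "500k" or "2.5M" into an integer number of bytes. The following is a guess at the general idea, assuming binary (1024-based) suffixes; the accepted suffixes and the error handling of the real `util.parse_bytes()` are not documented in this log.

```python
def parse_bytes(value, suffixes="bkmgtp"):
    """Convert a string like '500k' or '2.5M' to a number of bytes (sketch)."""
    value = value.strip().lower()
    if not value:
        raise ValueError("empty size string")
    last = value[-1]
    if last in suffixes:
        # 'b' -> 1024**0, 'k' -> 1024**1, 'm' -> 1024**2, ...
        multiplier = 1024 ** suffixes.index(last)
        number = value[:-1]
    else:
        multiplier = 1
        number = value
    return round(float(number) * multiplier)
```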
Mike Fährmann
214972bc9a
[gelbooru] use manual extraction
...
... to compensate for their disabled API.
(https://gelbooru.com/index.php?page=forum&s=view&id=3875)
This also adds an extractor for image-pools.
2017-11-29 20:48:17 +01:00
Mike Fährmann
b14de6ffc2
[tumblr] small improvements
...
- don't transform inline GIF URLs
- set 'type' parameter for API calls if there is only
one post type selected
2017-11-24 16:51:07 +01:00
Mike Fährmann
b8cdd42cab
[senmanga] fix extraction (again)
...
this is basically a re-revert of 2ace5c7
2017-11-18 17:23:32 +01:00
Mike Fährmann
6913eeaa40
[powermanga] replace manga extractor unit test
...
My Hero Academia is gone
2017-11-15 14:01:24 +01:00
Mike Fährmann
f72318e593
[seiga] support more than 200 images
...
Due to API restrictions and/or missing knowledge about and
documentation of API usage, it was only possible to retrieve the
latest 200 images of a niconico seiga user with said API.
The new approach manually visits each HTML page and gets its
information from there.
2017-11-13 20:46:24 +01:00
Mike Fährmann
2457b71633
skip tests on 5xx status codes
2017-11-12 20:51:12 +01:00
Mike Fährmann
305da540c3
[mangahere] fix metadata extraction
2017-11-03 14:54:46 +01:00
Mike Fährmann
035ef655f1
[imagefap] update unit tests
...
old gallery/image has been deleted
2017-10-27 12:22:16 +02:00
Mike Fährmann
caf26412dd
add option to set alternate location of .part files (#29)
...
Note: The path set for 'downloader.*.part-directory' needs to point to an
already existing directory.
2017-10-26 00:16:48 +02:00
Mike Fährmann
27c026543f
re-enable download unit tests
2017-10-25 12:55:36 +02:00
Mike Fährmann
b0353aa02d
rewrite download modules (#29)
...
- use '.part' files during file-download
- implement continuation of incomplete downloads
- check if file size matches the one reported by server
2017-10-24 12:53:03 +02:00
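The three points of the download rewrite ('.part' files, continuation, size check) can be sketched without any network code. The helpers `resume_headers` and `finalize` are hypothetical names; the actual downloader modules may organize this quite differently.

```python
import os


def resume_headers(part_path):
    """Return request headers asking only for the bytes still missing."""
    try:
        offset = os.path.getsize(part_path)
    except OSError:
        offset = 0  # no .part file yet: start from the beginning
    return {"Range": "bytes={}-".format(offset)} if offset else {}


def finalize(part_path, final_path, expected_size):
    """Rename the .part file once its size matches the server-reported size."""
    if os.path.getsize(part_path) != expected_size:
        return False  # incomplete or oversized: keep the .part file
    os.replace(part_path, final_path)
    return True
```

A downloader would send the headers from `resume_headers()` with its request, append the response body to the '.part' file, and call `finalize()` with the size reported by the server.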