Mike Fährmann
619387cbb1
update extractor unittest results
2018-01-28 18:29:05 +01:00
Mike Fährmann
364e335440
smaller adjustments and improvements
- requests and urllib3 version on 1 line
- close input file after reading from it
- use expand_path for unsupported-urls file
- remove unnecessary logging from options.py
2018-01-27 01:05:17 +01:00
Mike Fährmann
c9a9664a65
change --write-log behaviour
- log files now get truncated when opening them
(mode "w" instead of "a")
- log verbosity to file depends on -q/-v
(same as logging to stderr)
2018-01-27 00:51:40 +01:00
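The --write-log behaviour described in the commit above can be sketched with Python's standard logging module. This is a minimal illustration, not the project's actual code: the function name `setup_logfile` and the format string are assumptions made for this example.

```python
import logging


def setup_logfile(path, level=logging.INFO):
    """Attach a file handler to the root logger (hypothetical helper)."""
    # mode="w" truncates an existing file on open; mode="a" would append
    handler = logging.FileHandler(path, mode="w", encoding="utf-8")
    # use the same level as the stderr handler, i.e. whatever -q/-v selected
    handler.setLevel(level)
    # assumed format string, chosen only for this sketch
    handler.setFormatter(
        logging.Formatter("[%(name)s][%(levelname)s] %(message)s"))
    logging.getLogger().addHandler(handler)
    return handler
```

Passing the level already chosen for stderr output keeps both destinations equally verbose.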
Mike Fährmann
97f4f15ec0
add option to write logging output to a file
- '--write-log FILE' as cmdline argument
- 'output.logfile' as config file option
2018-01-26 18:51:51 +01:00
Mike Fährmann
f94e3706a8
use logging module for error messages during downloads
2018-01-26 18:11:13 +01:00
Mike Fährmann
db91cf871c
document message identifiers
2018-01-23 21:38:30 +01:00
Mike Fährmann
0dd48d644f
update test results
nothing broke, but things got updated or changed
2018-01-23 21:38:29 +01:00
Mike Fährmann
1e93955170
[batoto] remove module
Site officially shut down on 2018.01.18
2018-01-23 21:37:32 +01:00
Mike Fährmann
27fce6f600
fix UrlJob behavior
2018-01-23 15:42:26 +01:00
Mike Fährmann
76509a6d3c
[imgur] update test results
2018-01-20 18:49:29 +01:00
Mike Fährmann
9fccd7b783
[tumblr] provide fallback URLs ( #64 )
Each image now produces 3 URLs:
- amazonaws.com _raw (or _1280 for older images)
- amazonaws.com _500
- media.tumblr.com (URL returned by API)
2018-01-19 23:12:15 +01:00
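The fallback idea from the commit above (several candidate URLs per image, tried in order) might look roughly like this. It is a sketch under assumptions: `download_first_available` and the `fetch` callable are hypothetical names, and `fetch` is assumed to raise `OSError` when a download fails.

```python
def download_first_available(urls, fetch):
    """Try each URL in order; return (url, data) for the first success."""
    for url in urls:
        try:
            return url, fetch(url)
        except OSError:
            # this candidate failed; move on to the next fallback URL
            continue
    # every candidate failed
    return None, None
```

Ordering the list from best to worst quality means the first success is also the preferred result.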
Mike Fährmann
b837420291
fix minor urllist issues
2018-01-19 22:54:15 +01:00
Mike Fährmann
9d69401391
initial support for multiple URLs per image
2018-01-17 22:08:19 +01:00
Mike Fährmann
6174a5c4ef
[download] adjust filename extension on filetype mismatch
(closes #63 )
2018-01-17 18:37:06 +01:00
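One way to detect a filetype mismatch, as in the commit above, is to compare the first bytes of the downloaded file against well-known file signatures. The following is only a sketch: `adjust_extension`, `SIGNATURES` and the small set of formats are assumptions for illustration, not the project's implementation.

```python
import os

# leading bytes ("magic numbers") of a few common image formats
SIGNATURES = (
    (b"\xff\xd8\xff", "jpg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def adjust_extension(filename, head):
    """Return filename with its extension corrected to match the data.

    'head' holds the first bytes of the file content.
    """
    root, ext = os.path.splitext(filename)
    current = ext.lower().lstrip(".")
    if current == "jpeg":
        current = "jpg"  # treat both spellings as the same type
    for magic, real in SIGNATURES:
        if head.startswith(magic):
            if current != real:
                return root + "." + real
            break
    # unknown signature or matching extension: leave the name alone
    return filename
```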
Mike Fährmann
91ed147cef
[oauth] use custom key/secret values during oauth:…
2018-01-16 17:39:46 +01:00
Mike Fährmann
421a9740a3
[tumblr] add 'tumblr:' to force Tumblr extractor ( #71 )
2018-01-15 18:27:58 +01:00
Mike Fährmann
40d35c87bc
[paheal] add tag- and post-extractors ( closes #69 )
2018-01-15 16:39:05 +01:00
Mike Fährmann
cc0c2cca57
[reddit] add extractor for reddit-hosted images ( closes #68 )
2018-01-14 18:55:42 +01:00
Mike Fährmann
f10ffc0839
update extractor blacklist to also allow classes
2018-01-14 18:47:22 +01:00
Mike Fährmann
b6797032e3
release version 1.1.2
2018-01-12 15:09:18 +01:00
Mike Fährmann
35e09869d1
[mangapark] fix image URLs and use HTTPS
2018-01-12 14:59:49 +01:00
Mike Fährmann
9a049bdf51
[tumblr] add 'likes' extractor ( #65 )
2018-01-12 14:56:01 +01:00
Mike Fährmann
67d4462d26
[batoto] rudimentary Cloudflare bypass
2018-01-11 18:49:19 +01:00
Mike Fährmann
29d75fc3fa
[tumblr] add support for OAuth authentication ( #65 )
2018-01-11 14:11:37 +01:00
Mike Fährmann
4edb25346e
[slideshare] support mobile URLs ( closes #67 )
2018-01-10 14:15:00 +01:00
Mike Fährmann
e420a28bbc
fix cookie tests
2018-01-09 21:43:52 +01:00
Mike Fährmann
b33efc99a4
[idolcomplex] add support for idol.sankakucomplex.com
2018-01-09 17:54:37 +01:00
Mike Fährmann
75b2e84b6d
[tumblr] use s3.amazonaws.com for image URLs ( #64 )
2018-01-09 15:13:00 +01:00
Mike Fährmann
9a8e98f699
add gitter badge to README
2018-01-09 15:10:40 +01:00
Mike Fährmann
5b094328b5
[puremashiro] add chapter- and manga-extractor ( closes #66 )
Also adds support for region subtags in language codes (e.g. en-us)
2018-01-07 21:50:43 +01:00
Mike Fährmann
974e73bdbb
[booru] smaller code adjustments
2018-01-06 17:48:49 +01:00
Mike Fährmann
03b8a548cb
[tumblr] change reblogs default value to true ( #61 )
2018-01-06 15:52:08 +01:00
Mike Fährmann
d235f68f59
[tumblr] add option to filter reblogged posts ( #61 )
Reblogs are ignored by default, but can be included by setting
'extractor.tumblr.reblogs' to 'true'.
2018-01-05 13:05:57 +01:00
Mike Fährmann
a794fffc6d
[batoto] extend chapter-string regex ( closes #60 )
Non-numeric chapter indices exist after all ...
2018-01-05 12:53:50 +01:00
Mike Fährmann
1219ebb7f5
[danbooru] use alternate subdomains; support safebooru
2018-01-04 00:51:04 +01:00
Mike Fährmann
9e8a84ab6c
[booru] rewrite using Mixin classes ( #59 )
- improved code structure
- improved URL patterns
- better pagination to work around page limits on
  - Danbooru
  - e621
  - 3dbooru
2018-01-04 00:01:39 +01:00
Mike Fährmann
0876541e43
[seiga] update tests
2017-12-30 19:19:36 +01:00
Mike Fährmann
1a70857a12
update extractor-unittest capabilities
- "count" can now be a string defining a comparison in the form of
  '<operator> <value>', for example: '> 12' or '!= 1'. If its value
  is not a string, it is assumed to be a concrete integer as before.
- "keyword" can now be a dictionary defining tests for individual keys.
  These tests can either be a type, a concrete value or a regex
  starting with "re:". Dictionaries can be nested inside each other.
  Optional keys can be indicated with a "?" before their name.
  For example:
  "keyword": {
      "image_id": int,
      "gallery_id": 123,
      "name": "re:pattern",
      "user": {
          "id": 321,
      },
      "?optional": None,
  }
2017-12-30 19:05:37 +01:00
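The "count" and "keyword" checks described in the commit above could be evaluated along these lines. This is a hedged sketch, not the test suite's real code: `check_count`, `check_keywords` and the `OPERATORS` table are hypothetical, the set of supported operators is assumed, and using `re.search` for "re:" patterns is also an assumption.

```python
import operator
import re

# assumed operator set; the commit message only shows '>' and '!='
OPERATORS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}


def check_count(spec, actual):
    """spec is either '<operator> <value>' or a concrete integer."""
    if isinstance(spec, str):
        op, _, value = spec.partition(" ")
        return OPERATORS[op](actual, int(value))
    return actual == spec


def check_keywords(tests, data):
    """Recursively test the values in 'data' against 'tests'."""
    for key, test in tests.items():
        if key.startswith("?"):
            key = key[1:]
            if key not in data:
                continue  # optional key is absent: nothing to check
        elif key not in data:
            return False
        value = data[key]
        if isinstance(test, dict):
            # nested dictionary: descend one level
            if not isinstance(value, dict) or not check_keywords(test, value):
                return False
        elif isinstance(test, type):
            if not isinstance(value, test):
                return False
        elif isinstance(test, str) and test.startswith("re:"):
            if not re.search(test[3:], str(value)):
                return False
        elif value != test:
            return False
    return True
```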
Mike Fährmann
88bb0798fd
delay initialization of PathFormat objects
This allows the DeviantArt group-check to be moved inside the
Extractor.items() method which in turn allows for better exception
handling.
As a new general rule:
Never raise exceptions during extractor initialization.
2017-12-29 22:15:57 +01:00
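The rule from the commit above (never raise during extractor initialization) can be illustrated with a small sketch. The class `SketchExtractor` and its URL check are hypothetical; only the idea of deferring fallible work into `items()` comes from the commit message.

```python
class SketchExtractor:
    """Hypothetical extractor that defers all fallible work to items()."""

    def __init__(self, url):
        # only store arguments here; nothing in __init__ can fail
        self.url = url

    def items(self):
        # this is a generator, so the check below runs on first iteration,
        # where the caller's normal exception handling is already active
        if not self.url.startswith("https://"):
            raise ValueError("unsupported URL: " + self.url)
        yield self.url
```

Because construction never fails, a caller can create extractors freely and handle errors in one place, around the loop that consumes `items()`.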
Mike Fährmann
c24e0e70a7
[pixiv] simplify main loop
2017-12-28 14:13:39 +01:00
Mike Fährmann
c1e331edbb
[mangapark] replace manga test
2017-12-28 13:58:32 +01:00
Mike Fährmann
5488643fac
add requests and urllib3 versions to debug output
2017-12-27 22:12:40 +01:00
Mike Fährmann
9d73ed4772
fix issue with using 'skip()' when a filter is present
Calling skip() skipped over unfiltered items without applying the
filter expression to them, which is not the intended behavior.
2017-12-27 22:09:10 +01:00
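The intended ordering from the commit above (filter first, then skip) can be sketched as follows. This shows one way to get those semantics and is not necessarily how the fix was implemented; `filtered_then_skipped` and its parameters are hypothetical names.

```python
from itertools import islice


def filtered_then_skipped(items, predicate, num):
    """Apply the filter first, then skip 'num' of the matching items."""
    matching = (item for item in items if predicate(item))
    # islice drops the first 'num' items that passed the filter
    return islice(matching, num, None)
```

Skipping before filtering would instead drop `num` raw items, some of which the filter would have rejected anyway, so the two orders generally give different results.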
Mike Fährmann
28cd78aae0
[kissmanga] extend chapter-string regex ( closes #58 )
2017-12-24 22:53:10 +01:00
Mike Fährmann
0ba618dd1a
release version 1.1.1
2017-12-22 17:01:04 +01:00
Mike Fährmann
a3e9b51bea
[imgbox] update test results
Image URLs of older galleries have been updated to the new format.
https://i.imgbox.com/qHhw7lpG.png
-->
https://images3.imgbox.com/6d/9a/qHhw7lpG_o.png
2017-12-22 16:09:14 +01:00
Mike Fährmann
d241a0fb60
[util] replace '/' with '\' in base-directory paths
... on Windows to have consistent path separators.
2017-12-21 21:56:24 +01:00
Mike Fährmann
d0886f411e
[gelbooru] re-enable API use ( closes #56 )
Gelbooru's API allows access to all images and is not restricted
to the first 20000.
This also adds an option to select between API use and manual
information extraction in case their API gets disabled again.
2017-12-21 21:42:40 +01:00
Mike Fährmann
8102aae311
[mangahere] support ".cc" TLD and mobile URLs
2017-12-20 21:34:25 +01:00
Mike Fährmann
676602056c
[reddit] unescape output URLs
2017-12-19 22:22:43 +01:00