Commit Graph

1011 Commits

Author SHA1 Message Date
Mike Fährmann
619387cbb1 update extractor unittest results 2018-01-28 18:29:05 +01:00
Mike Fährmann
364e335440 smaller adjustments and improvements
- requests and urllib3 version on 1 line
- close input file after reading from it
- use expand_path for unsupported-urls file
- remove unnecessary logging from options.py
2018-01-27 01:05:17 +01:00
Mike Fährmann
c9a9664a65 change --write-log behaviour
- log files now get truncated when opening them
  (mode "w" instead of "a")
- log verbosity to file depends on -q/-v
  (same as logging to stderr)
2018-01-27 00:51:40 +01:00
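The two points of this commit can be sketched with the standard `logging` module. This is only an illustration: the helper names `level_from_flags` and `open_logfile`, the message format, and the exact mapping from -q/-v to log levels are assumptions, not the project's actual code.

```python
import logging


def level_from_flags(quiet=False, verbose=False):
    """Hypothetical mapping of -q/-v to a log level (assumed values)."""
    if quiet:
        return logging.ERROR
    if verbose:
        return logging.DEBUG
    return logging.INFO


def open_logfile(logger, path, level):
    """Attach a file handler to `logger` that truncates `path` on open."""
    # mode "w" truncates an existing file; mode "a" would append to it
    handler = logging.FileHandler(path, mode="w", encoding="utf-8")
    # the file handler gets the same level that stderr logging would use
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    return handler
```

With `quiet=True`, only messages at ERROR level or above reach the file, and any previous file content is discarded when the handler opens it.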
Mike Fährmann
97f4f15ec0 add option to write logging output to a file
- '--write-log FILE' as cmdline argument
- 'output.logfile' as config file option
2018-01-26 18:51:51 +01:00
Mike Fährmann
f94e3706a8 use logging module for error messages during downloads 2018-01-26 18:11:13 +01:00
Mike Fährmann
db91cf871c document message identifiers 2018-01-23 21:38:30 +01:00
Mike Fährmann
0dd48d644f update test results
nothing broke, but things got updated or changed
2018-01-23 21:38:29 +01:00
Mike Fährmann
1e93955170 [batoto] remove module
Site officially shut down on 2018.01.18
2018-01-23 21:37:32 +01:00
Mike Fährmann
27fce6f600 fix UrlJob behavior 2018-01-23 15:42:26 +01:00
Mike Fährmann
76509a6d3c [imgur] update test results 2018-01-20 18:49:29 +01:00
Mike Fährmann
9fccd7b783 [tumblr] provide fallback URLs (#64)
Each image now produces 3 URLs:
- amazonaws.com _raw (or _1280 for older images)
- amazonaws.com _500
- media.tumblr.com (URL returned by API)
2018-01-19 23:12:15 +01:00
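A fallback list like the one described in this commit is typically consumed by trying each URL in order until one succeeds. The sketch below assumes a hypothetical `fetch` callable that returns the data or raises `OSError` on failure; it does not show how the Tumblr extractor builds its three URLs.

```python
def download_first_available(urls, fetch):
    """Return (url, data) for the first URL that `fetch` can retrieve.

    `fetch` is a hypothetical callable: it takes a URL and either returns
    the downloaded data or raises OSError.
    """
    for url in urls:
        try:
            return url, fetch(url)
        except OSError:
            # this candidate failed; fall through to the next URL
            continue
    # every candidate failed
    return None, None
```

The order of `urls` encodes the preference: the first entry is the best candidate, later entries are only used when earlier ones fail.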
Mike Fährmann
b837420291 fix minor urllist issues 2018-01-19 22:54:15 +01:00
Mike Fährmann
9d69401391 initial support for multiple URLs per image 2018-01-17 22:08:19 +01:00
Mike Fährmann
6174a5c4ef [download] adjust filename extension on filetype mismatch
(closes #63)
2018-01-17 18:37:06 +01:00
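One common way to detect a filetype mismatch is to compare the filename extension against the file's leading "magic" bytes. The sketch below is an assumption about the general technique, not the downloader's actual implementation; `adjust_extension` and the `SIGNATURES` table are hypothetical names, and only three well-known image signatures are covered.

```python
import os

# (leading bytes, accepted extensions); the first extension is the one
# used when the filename has to be corrected
SIGNATURES = (
    (b"\xff\xd8\xff", ("jpg", "jpeg")),
    (b"\x89PNG\r\n\x1a\n", ("png",)),
    (b"GIF87a", ("gif",)),
    (b"GIF89a", ("gif",)),
)


def adjust_extension(filename, head):
    """Return `filename` with its extension corrected to match `head`.

    `head` holds the first bytes of the downloaded file. Unknown
    signatures leave the filename untouched.
    """
    root, ext = os.path.splitext(filename)
    ext = ext.lstrip(".").lower()
    for magic, extensions in SIGNATURES:
        if head.startswith(magic):
            if ext in extensions:
                return filename
            return root + "." + extensions[0]
    return filename
```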
Mike Fährmann
91ed147cef [oauth] use custom key/secret values during oauth:… 2018-01-16 17:39:46 +01:00
Mike Fährmann
421a9740a3 [tumblr] add 'tumblr:' to force Tumblr extractor (#71) 2018-01-15 18:27:58 +01:00
Mike Fährmann
40d35c87bc [paheal] add tag- and post-extractors (closes #69) 2018-01-15 16:39:05 +01:00
Mike Fährmann
cc0c2cca57 [reddit] add extractor for reddit-hosted images (closes #68) 2018-01-14 18:55:42 +01:00
Mike Fährmann
f10ffc0839 update extractor blacklist to also allow classes 2018-01-14 18:47:22 +01:00
Mike Fährmann
b6797032e3 release version 1.1.2 2018-01-12 15:09:18 +01:00
Mike Fährmann
35e09869d1 [mangapark] fix image URLs and use HTTPS 2018-01-12 14:59:49 +01:00
Mike Fährmann
9a049bdf51 [tumblr] add 'likes' extractor (#65) 2018-01-12 14:56:01 +01:00
Mike Fährmann
67d4462d26 [batoto] rudimentary Cloudflare bypass 2018-01-11 18:49:19 +01:00
Mike Fährmann
29d75fc3fa [tumblr] add support for OAuth authentication (#65) 2018-01-11 14:11:37 +01:00
Mike Fährmann
4edb25346e [slideshare] support mobile URLs (closes #67) 2018-01-10 14:15:00 +01:00
Mike Fährmann
e420a28bbc fix cookie tests 2018-01-09 21:43:52 +01:00
Mike Fährmann
b33efc99a4 [idolcomplex] add support for idol.sankakucomplex.com 2018-01-09 17:54:37 +01:00
Mike Fährmann
75b2e84b6d [tumblr] use s3.amazonaws.com for image URLs (#64) 2018-01-09 15:13:00 +01:00
Mike Fährmann
9a8e98f699 add gitter badge to README 2018-01-09 15:10:40 +01:00
Mike Fährmann
5b094328b5 [puremashiro] add chapter- and manga-extractor (closes #66)
Also adds support for region subtags in language codes (e.g. en-us)
2018-01-07 21:50:43 +01:00
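A language code with a region subtag such as "en-us" can be split into its two parts as sketched below. The function name `split_language_code` is hypothetical and the project's own handling may differ.

```python
def split_language_code(code):
    """Split a code like "en-us" into (language, region).

    The region is None when the code has no subtag, e.g. "en".
    """
    language, _, region = code.lower().partition("-")
    return language, region or None
```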
Mike Fährmann
974e73bdbb [booru] smaller code adjustments 2018-01-06 17:48:49 +01:00
Mike Fährmann
03b8a548cb [tumblr] change reblogs default value to true (#61) 2018-01-06 15:52:08 +01:00
Mike Fährmann
d235f68f59 [tumblr] add option to filter reblogged posts (#61)
Reblogs are ignored by default, but can be included by setting
'extractor.tumblr.reblogs' to 'true'.
2018-01-05 13:05:57 +01:00
Mike Fährmann
a794fffc6d [batoto] extend chapter-string regex (closes #60)
Non-numeric chapter indices exist after all ...
2018-01-05 12:53:50 +01:00
Mike Fährmann
1219ebb7f5 [danbooru] use alternate subdomains; support safebooru 2018-01-04 00:51:04 +01:00
Mike Fährmann
9e8a84ab6c [booru] rewrite using Mixin classes (#59)
- improved code structure
- improved URL patterns
- better pagination to work around page limits on
  - Danbooru
  - e621
  - 3dbooru
2018-01-04 00:01:39 +01:00
Mike Fährmann
0876541e43 [seiga] update tests 2017-12-30 19:19:36 +01:00
Mike Fährmann
1a70857a12 update extractor-unittest capabilities
- "count" can now be a string defining a comparison in the form of
  '<operator> <value>', for example: '> 12' or '!= 1'. If its value
  is not a string, it is assumed to be a concrete integer as before.

- "keyword" can now be a dictionary defining tests for individual keys.
  These tests can either be a type, a concrete value or a regex
  starting with "re:". Dictionaries can be stacked inside each other.
  Optional keys can be indicated with a "?" before their names.

  For example:
      "keyword": {
          "image_id": int,
          "gallery_id": 123,
          "name": "re:pattern",
          "user": {
              "id": 321,
          },
          "?optional": None,
      }
2017-12-30 19:05:37 +01:00
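The "count" and "keyword" rules described in this commit can be sketched as two small checker functions. This is an illustrative reading of the commit message, not the test suite's actual code: the names `check_count` and `check_keywords` and the set of supported operators are assumptions.

```python
import operator
import re

# assumed operator set; the commit message only shows '>' and '!='
OPERATORS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}


def check_count(expected, actual):
    """`expected` is either '<operator> <value>' or a concrete integer."""
    if isinstance(expected, str):
        op, value = expected.split()
        return OPERATORS[op](actual, int(value))
    return actual == expected


def check_keywords(tests, data):
    """Check `data` against a (possibly nested) dictionary of tests."""
    for key, test in tests.items():
        if key.startswith("?"):
            # optional key: only tested when present
            key = key[1:]
            if key not in data:
                continue
        elif key not in data:
            return False
        value = data[key]
        if isinstance(test, dict):
            # nested dictionaries are checked recursively
            if not isinstance(value, dict) or not check_keywords(test, value):
                return False
        elif isinstance(test, type):
            if not isinstance(value, test):
                return False
        elif isinstance(test, str) and test.startswith("re:"):
            if not isinstance(value, str) or not re.search(test[3:], value):
                return False
        elif value != test:
            return False
    return True
```

For instance, `check_count("> 12", 13)` holds, while `check_count("!= 1", 1)` does not.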
Mike Fährmann
88bb0798fd delay initialization of PathFormat objects
This allows the DeviantArt group-check to be moved inside the
Extractor.items() method which in turn allows for better exception
handling.

As a new general rule:
Never raise exceptions during extractor initialization.
2017-12-29 22:15:57 +01:00
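The rule "never raise exceptions during extractor initialization" can be illustrated with a toy class. `SketchExtractor`, `GroupNotFound`, and the URL check are hypothetical stand-ins, not the project's classes: the constructor only stores its arguments, and any check that can fail runs inside `items()`, where the caller iterating over the results can handle it.

```python
class GroupNotFound(Exception):
    """Hypothetical error type standing in for a failed group check."""


class SketchExtractor:
    def __init__(self, url):
        # only store arguments here; nothing in __init__ can fail
        self.url = url

    def items(self):
        # checks that may fail are deferred until iteration starts
        if not self.url.startswith("https://"):
            raise GroupNotFound(self.url)
        yield self.url
```

Because `items()` is a generator, the check runs when the first item is requested, not when the extractor object is created.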
Mike Fährmann
c24e0e70a7 [pixiv] simplify main loop 2017-12-28 14:13:39 +01:00
Mike Fährmann
c1e331edbb [mangapark] replace manga test 2017-12-28 13:58:32 +01:00
Mike Fährmann
5488643fac add requests and urllib3 versions to debug output 2017-12-27 22:12:40 +01:00
Mike Fährmann
9d73ed4772 fix issue with using 'skip()' when a filter is present
calling skip() skips over unfiltered items and does not apply
the filter expression to them, which is not what should happen
2017-12-27 22:09:10 +01:00
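The problem described here is one of ordering: skipping first and filtering afterwards drops items that the filter would have rejected anyway. The sketch below shows one way to get the intended semantics, counting only items that pass the filter towards the skip total. `skip_filtered` is a hypothetical helper, and the actual fix in this commit may take a different approach.

```python
def skip_filtered(items, predicate, num):
    """Yield items that pass `predicate`, dropping the first `num` of them."""
    for item in items:
        if not predicate(item):
            # rejected items never count towards the skip total
            continue
        if num > 0:
            num -= 1
            continue
        yield item
```

Applied to `range(10)` with an "is even" predicate and `num=2`, this yields 4, 6, 8, whereas skipping the first two raw items and filtering afterwards would yield 2, 4, 6, 8.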
Mike Fährmann
28cd78aae0 [kissmanga] extend chapter-string regex (closes #58) 2017-12-24 22:53:10 +01:00
Mike Fährmann
0ba618dd1a release version 1.1.1 2017-12-22 17:01:04 +01:00
Mike Fährmann
a3e9b51bea [imgbox] update test results
Image URLs of older galleries have been updated to the new format.

https://i.imgbox.com/qHhw7lpG.png
 -->
https://images3.imgbox.com/6d/9a/qHhw7lpG_o.png
2017-12-22 16:09:14 +01:00
Mike Fährmann
d241a0fb60 [util] replace '/' with '\' in base-directory paths
... on Windows to have consistent path separators.
2017-12-21 21:56:24 +01:00
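The separator replacement described in this commit amounts to a conditional string replacement. In the sketch below, the function name `normalize_base_directory` and its `windows` parameter (added so the behaviour can be tried on any platform) are assumptions.

```python
import os


def normalize_base_directory(path, windows=None):
    """Replace '/' with '\\' in `path`, but only on Windows."""
    if windows is None:
        # os.name is "nt" on Windows
        windows = os.name == "nt"
    return path.replace("/", "\\") if windows else path
```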
Mike Fährmann
d0886f411e [gelbooru] re-enable API use (closes #56)
Gelbooru's API allows access to all images and is not restricted
to the first 20000.

This also adds an option to select between API use and manual
information extraction in case their API gets disabled again.
2017-12-21 21:42:40 +01:00
Mike Fährmann
8102aae311 [mangahere] support ".cc" TLD and mobile URLs 2017-12-20 21:34:25 +01:00
Mike Fährmann
676602056c [reddit] unescape output URLs 2017-12-19 22:22:43 +01:00