
978 Commits

Author SHA1 Message Date
Mike Fährmann
a794fffc6d [batoto] extend chapter-string regex (closes #60)
Non-numeric chapter indices exist after all ...
2018-01-05 12:53:50 +01:00
Mike Fährmann
1219ebb7f5 [danbooru] use alternate subdomains; support safebooru 2018-01-04 00:51:04 +01:00
Mike Fährmann
9e8a84ab6c [booru] rewrite using Mixin classes (#59)
- improved code structure
- improved URL patterns
- better pagination to work around page limits on
  - Danbooru
  - e621
  - 3dbooru
2018-01-04 00:01:39 +01:00
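The Mixin-based booru rewrite above could be structured roughly as in the following minimal sketch. The class names (`BooruExtractor`, `TagMixin`, `BeforeIdMixin`) and the injected `fetch` callable are hypothetical stand-ins, not gallery-dl's actual classes. The pagination mixin shows one way to work around page limits: it requests posts older than the last ID seen (Danbooru-style `b<id>` page values) instead of ever-increasing page numbers. The e621 and 3dbooru variants are not shown.

```python
class BooruExtractor:
    """Shared logic: keep requesting pages until a short page arrives."""

    per_page = 100

    def __init__(self, fetch):
        # 'fetch(params)' is a hypothetical callable returning a list of
        # post dicts; a real extractor would perform an HTTP API request.
        self.fetch = fetch

    def posts(self):
        params = dict(self.query(), limit=self.per_page)
        while True:
            posts = self.fetch(params)
            yield from posts
            if len(posts) < self.per_page:
                return
            self.update_params(params, posts)

    def update_params(self, params, posts):
        # Default: plain page numbers, which some sites cap.
        params["page"] = params.get("page", 1) + 1


class TagMixin:
    """Adds a tag-search query to the request parameters."""

    def __init__(self, fetch, tags):
        super().__init__(fetch)
        self.tags = tags

    def query(self):
        return {"tags": self.tags}


class BeforeIdMixin:
    """Paginate by 'posts older than the last one seen' to avoid page caps."""

    def update_params(self, params, posts):
        params["page"] = "b{}".format(posts[-1]["id"])


class DanbooruTagExtractor(TagMixin, BeforeIdMixin, BooruExtractor):
    """Combines tag search with ID-based pagination via the MRO."""
```

Because `BeforeIdMixin` precedes `BooruExtractor` in the method resolution order, its `update_params()` overrides the page-number default without touching the shared loop.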
Mike Fährmann
0876541e43 [seiga] update tests 2017-12-30 19:19:36 +01:00
Mike Fährmann
1a70857a12 update extractor-unittest capabilities
- "count" can now be a string defining a comparison in the form of
  '<operator> <value>', for example: '> 12' or '!= 1'. If its value
  is not a string, it is assumed to be a concrete integer as before.

- "keyword" can now be a dictionary defining tests for individual keys.
  These tests can either be a type, a concrete value or a regex
  starting with "re:". Dictionaries can be stacked inside each other.
  Optional keys can be indicated with a "?" before their name.

  For example:
      "keyword": {
          "image_id": int,
          "gallery_id": 123,
          "name": "re:pattern",
          "user": {
              "id": 321,
          },
          "?optional": None,
      }
2017-12-30 19:05:37 +01:00
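The "count" and "keyword" checks described in this commit could be evaluated roughly as follows. This is a minimal sketch, not gallery-dl's test code: `check_count` and `check_keywords` are hypothetical names, and anchoring "re:" patterns with `re.match` is an assumption.

```python
import operator
import re

# Comparison operators allowed in a "count" string like "> 12".
_OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}


def check_count(expected, actual):
    """Test 'actual' against '<operator> <value>' or a concrete integer."""
    if isinstance(expected, str):
        op, _, value = expected.partition(" ")
        return _OPS[op](actual, int(value))
    return actual == expected


def check_keywords(tests, kwdict):
    """Recursively test 'kwdict' against a dict of types/values/regexes."""
    for key, test in tests.items():
        if key.startswith("?"):
            # Optional key: only test it if it is present.
            key = key[1:]
            if key not in kwdict:
                continue
        value = kwdict[key]
        if isinstance(test, dict):
            if not check_keywords(test, value):   # nested dictionary
                return False
        elif isinstance(test, type):
            if not isinstance(value, test):       # type test, e.g. int
                return False
        elif isinstance(test, str) and test.startswith("re:"):
            if not re.match(test[3:], str(value)):  # regex test
                return False
        elif value != test:                       # concrete value
            return False
    return True
```

For example, `check_count("> 12", 20)` is true, and the dictionary from the commit message passes against `{"image_id": 7, "gallery_id": 123, "name": "pattern", "user": {"id": 321}}`, since the optional key is simply absent.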
Mike Fährmann
88bb0798fd delay initialization of PathFormat objects
This allows the DeviantArt group-check to be moved inside the
Extractor.items() method which in turn allows for better exception
handling.

As a new general rule:
Never raise exceptions during extractor initialization.
2017-12-29 22:15:57 +01:00
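The rule "never raise exceptions during extractor initialization" can be illustrated with a minimal sketch. All names here (`LazyExtractor`, `GroupError`, `run`, the `is_group` callable) are hypothetical and only mirror the idea of moving a check like DeviantArt's group-check from `__init__` into `items()`, where the caller's exception handling already applies.

```python
class GroupError(Exception):
    """Hypothetical error: the given name belongs to a group, not a user."""


class LazyExtractor:
    """Sketch: __init__ only stores its input; anything that can fail
    (remote checks, PathFormat setup) is deferred to items()."""

    def __init__(self, url, is_group):
        self.url = url
        self.is_group = is_group  # stand-in for a remote group-check
        self.pathfmt = None       # set up later, not during __init__

    def items(self):
        # items() is a generator, so this check only runs once the
        # caller starts iterating, i.e. inside its try/except block.
        if self.is_group(self.url):
            raise GroupError(self.url)
        self.pathfmt = "{category}/{user}"  # placeholder for a PathFormat
        yield ("url", self.url)


def run(extractor):
    """Hypothetical job runner with central exception handling."""
    try:
        return list(extractor.items())
    except GroupError as exc:
        print("Skipping group:", exc)
        return []
```

Constructing a `LazyExtractor` never fails; a group URL only surfaces as a `GroupError` inside `run()`, where it can be reported and skipped.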
Mike Fährmann
c24e0e70a7 [pixiv] simplify main loop 2017-12-28 14:13:39 +01:00
Mike Fährmann
c1e331edbb [mangapark] replace manga test 2017-12-28 13:58:32 +01:00
Mike Fährmann
5488643fac add requests and urllib3 versions to debug output 2017-12-27 22:12:40 +01:00
Mike Fährmann
9d73ed4772 fix issue with using 'skip()' when a filter is present
Calling skip() skips over items without applying the filter
expression to them, which is not what should happen.
2017-12-27 22:09:10 +01:00
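The ordering problem can be shown with a minimal sketch: the filter has to be applied before counting skipped items, not after. `apply_skip` is a hypothetical helper, and this is one way to get the intended order, not necessarily how the commit fixed it.

```python
import itertools


def apply_skip(items, num, predicate=None):
    """Skip the first 'num' items, counting only those that pass the filter."""
    if predicate is not None:
        # Filter first, so the skipped items are ones that would
        # otherwise have been downloaded.
        items = filter(predicate, items)
    return itertools.islice(items, num, None)
```

With `range(10)`, `num=2` and an "even numbers" filter, `apply_skip` yields `[4, 6, 8]`. Skipping first and filtering afterwards (`filter(pred, islice(range(10), 2, None))`) would yield `[2, 4, 6, 8]` instead, because the two skipped items included an odd one.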
Mike Fährmann
28cd78aae0 [kissmanga] extend chapter-string regex (closes #58) 2017-12-24 22:53:10 +01:00
Mike Fährmann
0ba618dd1a release version 1.1.1 2017-12-22 17:01:04 +01:00
Mike Fährmann
a3e9b51bea [imgbox] update test results
Image URLs of older galleries have been updated to the new format.

https://i.imgbox.com/qHhw7lpG.png
 -->
https://images3.imgbox.com/6d/9a/qHhw7lpG_o.png
2017-12-22 16:09:14 +01:00
Mike Fährmann
d241a0fb60 [util] replace '/' with '\' in base-directory paths
... on Windows to have consistent path separators.
2017-12-21 21:56:24 +01:00
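A minimal sketch of the separator normalization, assuming it only needs to act when the platform separator is a backslash. `normalize_base_directory` is a hypothetical name, and the `sep` parameter exists only to make the behavior testable on any OS.

```python
import os


def normalize_base_directory(path, sep=os.sep):
    """Use consistent separators in a base-directory path.

    On Windows (os.sep == "\\"), forward slashes, e.g. from a config
    file, are replaced with backslashes; elsewhere the path is unchanged.
    """
    if sep == "\\":
        path = path.replace("/", "\\")
    return path
```

On Windows, a configured base directory like `C:/downloads/gallery-dl/` becomes `C:\downloads\gallery-dl\`, matching the backslashes used by the rest of the generated path.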
Mike Fährmann
d0886f411e [gelbooru] re-enable API use (closes #56)
Gelbooru's API allows access to all images and is not restricted
to the first 20000.

This also adds an option to select between API use and manual
information extraction in case their API gets disabled again.
2017-12-21 21:42:40 +01:00
Mike Fährmann
8102aae311 [mangahere] support ".cc" TLD and mobile URLs 2017-12-20 21:34:25 +01:00
Mike Fährmann
676602056c [reddit] unescape output URLs 2017-12-19 22:22:43 +01:00
Mike Fährmann
2eedbaaaf9 [deviantart] use cache to store new refresh_tokens
The 'refresh_token' set in a user's config file is used only once, to
get a new 'access_token' and 'refresh_token'. The new 'refresh_token'
is then stored in gallery-dl's cache and used the next time the
'access_token' needs to be refreshed.

This means deleting the cache file invalidates the refresh_token
chain and requires the user to re-authenticate.
2017-12-18 13:23:18 +01:00
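The refresh-token rotation described above can be sketched as follows. This is not gallery-dl's code: `cache` is any persistent dict-like store, and `exchange` is a hypothetical callable standing in for DeviantArt's OAuth2 token endpoint. Keying the cache by the config-file token is an assumption made here so that a new token in the config starts a new chain.

```python
def get_access_token(config_token, cache, exchange):
    """Return a fresh access token and rotate the stored refresh token.

    'exchange(refresh_token)' must return (access_token, refresh_token);
    the refresh token it is given cannot be used again afterwards.
    """
    # Use the latest token of this chain if one is cached; the value
    # from the config file itself is only used the very first time.
    refresh_token = cache.get(config_token, config_token)
    access_token, new_refresh_token = exchange(refresh_token)
    # The old refresh token is now spent, so remember its successor.
    cache[config_token] = new_refresh_token
    return access_token
```

If the cache is deleted, the next call falls back to the spent config-file token, and the exchange fails, which is the re-authentication requirement mentioned in the commit.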
Mike Fährmann
fc7d165c97 [deviantart] add support for OAuth2 authentication
Some user galleries [*] require you to be either logged in or
authenticated via OAuth2 to access their deviations.

[*] e.g. https://polinaegorussia.deviantart.com/gallery/

--------------

known issue:
A DeviantArt 'refresh_token' can only be used once and gets updated
whenever it is used to request a new 'access_token', so storing its
initial value in a config file and reusing it again and again is not
possible.
2017-12-18 01:16:46 +01:00
Mike Fährmann
91c2aed077 [nhentai] fix JSON extraction 2017-12-17 17:39:15 +01:00
Mike Fährmann
444008a14a [khinsider] use urljoin() to complete page URLs 2017-12-17 16:21:05 +01:00
Mike Fährmann
263741d243 [luscious] update URL pattern (closes #55) 2017-12-14 14:15:01 +01:00
Mike Fährmann
0a9a07a6e1 [slideshare] improve metadata; flake8
- added 'views' and 'published' keywords
- fixed longer titles and descriptions
2017-12-13 21:16:49 +01:00
Leonardo Taccari
a8d2dde8b2 [slideshare] Add a new extractor for slideshare.net (#54) 2017-12-13 17:38:29 +01:00
Mike Fährmann
19a6ae57b2 [sankaku] add pool extractor 2017-12-12 19:45:10 +01:00
Mike Fährmann
e52f0cc1ed [sankaku] add post extractor 2017-12-12 18:20:15 +01:00
Mike Fährmann
595593a35e [sankaku] rewrite
- better code structure and extensibility
- better metadata
2017-12-12 18:09:45 +01:00
Mike Fährmann
e96e1fea5d release version 1.1.0 2017-12-08 17:15:26 +01:00
Mike Fährmann
a3924d2072 [sankaku] fix swf extraction (closes #52) 2017-12-07 15:45:43 +01:00
Mike Fährmann
ebe9b0a04c another attempt at downloader retry behavior
This commit changes the general behavior from
'Retry on every exception and abort on DownloadError' to
'Only retry on DownloadRetry exceptions and abort on every other one'

The previous version would retry in several situations that have
no chance of ever succeeding (invalid URLs, etc.).
2017-12-07 15:31:14 +01:00
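The new rule ("only retry on DownloadRetry exceptions and abort on every other one") can be sketched as a small loop. `DownloadRetry` is the exception name from the commit, but its definition here, the `download_with_retries` helper, and its `retries`/`delay` parameters are hypothetical.

```python
import time


class DownloadRetry(Exception):
    """Raised for transient failures that may succeed on another try."""


def download_with_retries(download, url, retries=5, delay=1.0):
    """Call 'download(url)', retrying only on DownloadRetry.

    Any other exception (e.g. for an invalid URL) propagates at once,
    since repeating the attempt could never succeed.
    """
    for attempt in range(retries + 1):
        try:
            return download(url)
        except DownloadRetry:
            if attempt == retries:
                raise  # out of attempts: give up
            time.sleep(delay)
```

Whitelisting the retryable case, instead of blacklisting fatal ones, means an unexpected error type aborts immediately rather than being retried pointlessly.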
Mike Fährmann
291369eab2 various smaller changes/additions 2017-12-06 21:45:56 +01:00
Mike Fährmann
4fb6803fa6 add option to sleep before each download 2017-12-04 17:33:10 +01:00
Mike Fährmann
300346ecdf [mangazuki] remove extractors
This site has been in "rebuild"-mode for a fairly long time and the
current extractor code isn't going to work for the new version either.
2017-12-04 13:36:04 +01:00
Mike Fährmann
d275b1d9a3 [khinsider] fix extraction
... again
2017-12-04 12:42:06 +01:00
Mike Fährmann
6b8e3003df [hentai2read] ensure consistent extraction results 2017-12-03 02:34:35 +01:00
Mike Fährmann
a1980b16f3 [gelbooru] various improvements
- better metadata for pools
- map ratings to s/q/e like other boorus do
- skip() support
2017-12-03 01:41:30 +01:00
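Two of the Gelbooru changes above can be sketched briefly. Both helpers are hypothetical: `normalize_rating` assumes the site reports full words such as "Safe", "Questionable" and "Explicit", and `skip_pages` assumes skip() can only jump over whole result pages, returning how many items were actually skipped.

```python
def normalize_rating(rating):
    """Map 'safe'/'questionable'/'explicit' to 's'/'q'/'e'."""
    first = rating[:1].lower()
    # Unknown values are passed through unchanged.
    return first if first in ("s", "q", "e") else rating


def skip_pages(num, per_page):
    """Return (pages_to_skip, items_actually_skipped) for skip(num).

    Only whole pages can be skipped via the page parameter; any
    remainder has to be skipped item by item by the caller.
    """
    pages = num // per_page
    return pages, pages * per_page
```

For example, `skip_pages(130, 42)` returns `(3, 126)`: three full pages are skipped on the server side and the remaining 4 items locally.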
Mike Fährmann
93482a1f88 implement 'util.advance()' 2017-12-03 01:38:24 +01:00
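A helper like `util.advance()` could look like the following sketch, which is based on the `consume()` recipe from Python's `itertools` documentation. The signature `advance(iterable, num)` is an assumption, not confirmed by the commit title.

```python
import itertools


def advance(iterable, num):
    """Advance 'iterable' by 'num' steps and return the iterator."""
    iterator = iter(iterable)
    # islice(iterator, num, num) consumes 'num' items and yields nothing;
    # next() with a default drives it without raising StopIteration.
    next(itertools.islice(iterator, num, num), None)
    return iterator
```

`list(advance(range(10), 3))` gives `[3, 4, 5, 6, 7, 8, 9]`; advancing past the end simply leaves an exhausted iterator.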
Mike Fährmann
0e5057b15d remove deprecated options 2017-12-02 15:31:57 +01:00
Mike Fährmann
8f518e03f8 add options to set maximum download rate
- -r/--limit-rate as cmdline option
- downloader.http.rate as config option

This implementation loosely follows the idea of the token bucket
algorithm [1] and takes Wget's approach [2] as its main inspiration.

[1] https://en.wikipedia.org/wiki/Token_bucket
[2] http://git.savannah.gnu.org/cgit/wget.git/tree/src/retr.c?h=v1.19.2&id=ba6b44f6745b14dce414761a8e4b35d31b176bba#n111
2017-12-02 01:47:26 +01:00
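A minimal sketch of the throttling idea: after each chunk, compare how long the transferred bytes should have taken at the target rate with how long actually passed, and sleep for the difference. `RateLimiter` is a hypothetical class, not gallery-dl's or Wget's code; the injectable `clock` and `sleep` exist only for testing.

```python
import time


class RateLimiter:
    """Throttle a byte stream to roughly 'rate' bytes per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.clock = clock
        self.sleep = sleep
        self.start = clock()
        self.received = 0

    def update(self, nbytes):
        """Call after each received chunk of 'nbytes' bytes."""
        self.received += nbytes
        expected = self.received / self.rate   # time the data should take
        elapsed = self.clock() - self.start     # time actually taken
        if expected > elapsed:
            self.sleep(expected - elapsed)      # ahead of schedule: wait
```

Unlike a textbook token bucket, this sketch has no burst cap: after a long stall it lets a catch-up burst through without sleeping. A real implementation would likely bound that, for example by periodically resetting `start` and `received`.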
Mike Fährmann
a718c6c6cd implement 'util.parse_bytes()' 2017-12-02 01:24:49 +01:00
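A parser for `-r/--limit-rate` values could look like this sketch. The suffix set ("b", "k", "m", ...) and the use of powers of 1024 are assumptions about what `util.parse_bytes()` accepts, not confirmed by the commit.

```python
def parse_bytes(value, default=0, suffixes="bkmgtp"):
    """Convert a size string like "500k" or "2.5M" to a number of bytes.

    Suffixes are case-insensitive powers of 1024 ('b' = 1, 'k' = 1024,
    ...); a plain number is taken as bytes. Invalid input returns
    'default'.
    """
    value = value.strip()
    if not value:
        return default
    last = value[-1].lower()
    if last in suffixes:
        multiplier = 1024 ** suffixes.index(last)
        value = value[:-1]
    else:
        multiplier = 1
    try:
        return round(float(value) * multiplier)
    except ValueError:
        return default
```

With these assumptions, `parse_bytes("500k")` is 512000 and `parse_bytes("2.5M")` is 2621440.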
Mike Fährmann
038e3b3369 [kissmanga] handle "AreYouHuman" redirects (#51) 2017-12-01 15:22:50 +01:00
Mike Fährmann
2b9a783fc7 [khinsider] fix extraction 2017-12-01 14:00:37 +01:00
Mike Fährmann
3dc1169736 use own mapping before relying on the 'mimetypes' module 2017-12-01 13:50:31 +01:00
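Preferring an own mapping over the standard `mimetypes` module could be sketched as follows. The dictionary contents, the `extension_for` name and the `"txt"` default are hypothetical; `mimetypes.guess_extension()` is the real standard-library function used as the fallback. Its answer for a given type can depend on the Python version and the system's `mime.types` files, which is a reason to pin the common cases.

```python
import mimetypes

# Preferred extensions for common types, checked before 'mimetypes'.
MIME_TYPES = {
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/gif": "gif",
    "video/webm": "webm",
}


def extension_for(mtype, default="txt"):
    """Return a file extension (without dot) for a Content-Type value."""
    # Drop parameters like "; charset=..." and normalize case.
    mtype = mtype.partition(";")[0].strip().lower()
    if mtype in MIME_TYPES:
        return MIME_TYPES[mtype]
    ext = mimetypes.guess_extension(mtype)
    return ext[1:] if ext else default
```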
Mike Fährmann
214972bc9a [gelbooru] use manual extraction
... to compensate for their disabled API.
(https://gelbooru.com/index.php?page=forum&s=view&id=3875)

This also adds an extractor for image-pools.
2017-11-29 20:48:17 +01:00
Mike Fährmann
55c64cad4b [khinsider] fix filename extension and test-pattern 2017-11-28 19:35:47 +01:00
Mike Fährmann
c0bcf8e343 release version 1.0.2 2017-11-24 17:24:39 +01:00
Mike Fährmann
28bf25f37d update CHANGELOG 2017-11-24 17:00:45 +01:00
Mike Fährmann
b14de6ffc2 [tumblr] small improvements
- don't transform inline GIF URLs
- set 'type' parameter for API calls if there is only
  one post type selected
2017-11-24 16:51:07 +01:00
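Setting the 'type' parameter only when a single post type is selected could be sketched like this. `build_params` is a hypothetical helper, and it assumes the Tumblr API v2 `/posts` endpoint's `offset` and `type` query parameters. With several types selected, the API cannot filter for all of them at once, so posts are fetched unfiltered and sorted out locally.

```python
POST_TYPES = ("text", "quote", "link", "answer",
              "video", "audio", "photo", "chat")


def build_params(selected_types, offset=0):
    """Build query parameters for a Tumblr API /posts request."""
    params = {"offset": offset}
    types = [t for t in selected_types if t in POST_TYPES]
    if len(types) == 1:
        # Let the API do the filtering instead of discarding
        # non-matching posts after downloading them.
        params["type"] = types[0]
    return params
```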
Mike Fährmann
9296a26eae [tumblr] add warning messages 2017-11-23 16:12:07 +01:00
Mike Fährmann
65c1c53eb8 [khinsider] fix extraction 2017-11-23 15:33:49 +01:00