37 Commits

Author SHA1 Message Date
Mike Fährmann
9b21d3f13c add '--filter' command-line option
This allows filtering images with Python expressions that operate on the
same metadata that is used to build filenames (see --list-keywords).

The usually shunned eval() function is used to evaluate
filter-expressions, but it seemed quite appropriate in this case and
shouldn't introduce any new security issues, as any attacker that could do
> gallery-dl --filter "delete-everything()" ...
could as well do
> python -c "delete-everything()"
2017-09-08 17:52:00 +02:00
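As a rough illustration of the idea in the commit above, an expression filter can be sketched as a compiled expression evaluated against each metadata dictionary. The helper name `build_filter` and the sample keys `width` and `extension` are assumptions made for this sketch, not gallery-dl's actual code or keyword names.

```python
def build_filter(expression):
    """Compile a Python expression once and return a predicate over metadata dicts."""
    code = compile(expression, "<filter>", "eval")

    def predicate(metadata):
        # The metadata keys become the names visible to the expression.
        # eval() is unrestricted here, as the commit message acknowledges.
        return bool(eval(code, {}, dict(metadata)))

    return predicate


# Hypothetical metadata, for illustration only.
images = [
    {"width": 800, "extension": "png"},
    {"width": 200, "extension": "jpg"},
]
keep = build_filter("width >= 500 and extension == 'png'")
selected = [meta for meta in images if keep(meta)]
```

Compiling once avoids re-parsing the expression for every image; an expression that references a key missing from the metadata raises NameError, which a real implementation would have to handle.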
Mike Fährmann
268cfa3cfe filter duplicate URLs (#36)
Duplicate URLs might occur if, for example, an artist adds another
image to their gallery while an extractor is running and images are being
downloaded on sites like pixiv/nijie/hentaifoundry.
The first image on the next page will then already have been downloaded,
which causes a premature end if '--abort-on-skip' is being used.
2017-09-06 17:08:50 +02:00
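One simple way to drop such duplicates is to remember every URL already seen and skip repeats. The generator below is a minimal sketch under that assumption; `unique_urls` is a hypothetical name and the commit's actual implementation may differ.

```python
def unique_urls(urls):
    """Yield each URL once, in first-seen order."""
    seen = set()
    for url in urls:
        if url in seen:
            # Already yielded earlier, e.g. because the gallery shifted
            # by one image between two page requests.
            continue
        seen.add(url)
        yield url


# Page 2 repeats the last image of page 1 after a new upload.
pages = ["img3", "img2", "img1", "img1", "img0"]
result = list(unique_urls(pages))
```

Filtering before the download step means the repeated URL never reaches the skip check, so an abort-on-skip option is not triggered by it.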
Mike Fährmann
47bcf53ec1 implement support for additional unit test result types
- "pattern" matches all resulting URLs against the given regex
- "count" specifies the expected number of returned URLs
2017-08-25 22:01:14 +02:00
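The two result types could be checked roughly as follows. This is a sketch only: `check_results` and its return convention (a list of failure messages) are assumptions for illustration, not the project's test harness.

```python
import re


def check_results(urls, pattern=None, count=None):
    """Return a list of failure messages; an empty list means the test passed."""
    failures = []
    if pattern is not None:
        regex = re.compile(pattern)
        # "pattern": every resulting URL has to match the regex.
        failures.extend(
            "URL does not match pattern: " + url
            for url in urls
            if not regex.match(url)
        )
    if count is not None and len(urls) != count:
        # "count": the number of returned URLs has to be exact.
        failures.append("expected %d URLs, got %d" % (count, len(urls)))
    return failures


urls = ["https://example.org/1.png", "https://example.org/2.png"]
ok = check_results(urls, pattern=r"https://example\.org/\d+\.png$", count=2)
bad = check_results(urls, pattern=r"https://example\.org/\d+\.jpg$", count=3)
```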
Mike Fährmann
ae2d61e5b3 handle format string exceptions separately 2017-08-11 21:48:37 +02:00
Mike Fährmann
3c9f190757 extend output of --list-keywords 2017-08-10 17:36:21 +02:00
Mike Fährmann
cfa479fab5 update error message for unspecified exceptions
- ask user to report unexpected errors, which usually indicate
  extractor failure
- handle OSErrors separately (permissions, disk full, etc.)
- revert 30eef52
2017-08-10 16:35:46 +02:00
Mike Fährmann
915a0137de improve 'extractor.request'
- add 'fatal' argument
- improve internal logic and flow
- raise known exception on error
- update exception hierarchy
2017-08-05 16:11:46 +02:00
Mike Fährmann
58e95a7487 share extractor and downloader sessions
There was never any "good" reason for the strict separation
between extractors and downloaders. This change allows for
reduced resource usage (probably unnoticeable) and fewer lines
of code at the "cost" of tighter coupling.
2017-06-30 19:38:14 +02:00
Mike Fährmann
c921b4f32a code cleanup and fixing tests 2017-06-02 09:10:58 +02:00
Mike Fährmann
25bcdc8aa9 add --write-unsupported option (#15) 2017-05-27 16:16:57 +02:00
Mike Fährmann
99b72130ee [reddit] enable recursion (#15)
reddit extractors now recursively visit other submissions/posts
linked to in the initial set of submissions.
This behaviour can be configured via the 'extractor.reddit.recursion'
key in the configuration file or by `-o recursion=<value>`.

Example:
{"extractor": {
  "reddit": {
   "recursion": <value>
}}}

Possible values:
* -1 - infinite recursion (don't do this)
*  0 - recursion is disabled (default)
*  1 and higher - maximum recursion level
2017-05-26 17:01:27 +02:00
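The meaning of the three kinds of values can be illustrated with a depth-limited traversal. In this sketch the link graph is a plain dict and `collect` is a hypothetical helper; the cycle guard via `seen` is an assumption of the sketch, not something the commit message states.

```python
from collections import deque


def collect(start, links, recursion=0):
    """Return submissions reachable from `start` within the recursion limit."""
    seen = []
    queue = deque((item, 0) for item in start)
    while queue:
        item, depth = queue.popleft()
        if item in seen:
            continue  # guards against cycles when recursion is -1
        seen.append(item)
        # -1 means no limit; otherwise follow links only below the limit.
        if recursion == -1 or depth < recursion:
            queue.extend((linked, depth + 1) for linked in links.get(item, ()))
    return seen


# Hypothetical link graph: a links to b, b to c, c back to a.
links = {"a": ["b"], "b": ["c"], "c": ["a"]}
disabled = collect(["a"], links, recursion=0)
one_level = collect(["a"], links, recursion=1)
unlimited = collect(["a"], links, recursion=-1)
```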
Mike Fährmann
ae686c4c08 run queue items immediately 2017-05-24 15:15:06 +02:00
Mike Fährmann
30eef527d8 update output logic on error
[ci skip]
2017-05-23 20:12:57 +02:00
Mike Fährmann
e425243b1e [reddit] some small fixes
- filter or complete some URLs
- remove the 'nofollow:' scheme before printing URLs
- (#15)
2017-05-23 11:48:00 +02:00
Mike Fährmann
a90c6acc9c code cleanup + fixes 2017-05-18 15:18:18 +02:00
Mike Fährmann
4c88c0d496 rework the output format for --list-keywords 2017-05-15 18:30:47 +02:00
Mike Fährmann
13dc5d72bc update some extractors to use https 2017-04-20 13:32:40 +02:00
Mike Fährmann
5af35ea150 add -v/--verbose option and reduce error verbosity
(#12)
2017-04-18 11:38:48 +02:00
Mike Fährmann
b43cd88101 add '-j/--dump-json' option
this outputs the extractor results in JSON format rather than
downloading files
2017-04-12 18:43:41 +02:00
Mike Fährmann
841fd50242 move code into util.py 2017-03-28 13:12:44 +02:00
Mike Fährmann
ed94d9b92d fix/improve various things 2017-03-17 09:39:46 +01:00
Mike Fährmann
27ae152f57 use logging to report errors 2017-03-11 01:47:57 +01:00
Mike Fährmann
7a9d66fbce implement basic way to tell extractors to skip ahead 2017-03-03 17:26:50 +01:00
Mike Fährmann
2fa575b273 restore exception-testing to its old form 2017-02-27 23:05:08 +01:00
Mike Fährmann
40be4933b8 fix exception based tests 2017-02-26 02:06:56 +01:00
Mike Fährmann
24f41e13b3 move some exception handling code 2017-02-25 23:53:31 +01:00
Mike Fährmann
6208d9dd79 implement '--images' and '--chapters' options
- the former '--items' has been renamed to '--chapters'
- #6
2017-02-23 21:51:29 +01:00
Mike Fährmann
2a32b12043 add '--items' option
this allows specifying which manga chapters/comic issues to download
when using gallery-dl on a manga/comic URL
2017-02-20 22:02:49 +01:00
Mike Fährmann
3bca866185 rework the '-g' cmdline option
the number of times the -g option is given now determines up to
which level URLs are resolved.

example:

$ gallery-dl -g http://kissmanga.com/Manga/Dropout
http://kissmanga.com/Manga/Dropout/Ch-000---Oneshot-?id=145847

- when applied to a manga-extractor, specifying the -g option once will
  now print a list of all chapter URLs

$ gallery-dl -gg http://kissmanga.com/Manga/Dropout
http://2.bp.blogspot.com/.../000.png
http://2.bp.blogspot.com/.../001.png
...

- specifying it twice (or even more often) will go a level deeper and
  print the image URLs found in those chapters
2017-02-17 22:18:16 +01:00
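A repeatable flag of this kind can be sketched with argparse's "count" action, followed by one resolution pass per occurrence. The `dest` name, the `resolve` helper and the `children` mapping are hypothetical stand-ins for the extractors; only the `-g` flag itself comes from the commit.

```python
import argparse

parser = argparse.ArgumentParser(prog="sketch")
# Each occurrence of -g increments the counter: -g gives 1, -gg gives 2.
parser.add_argument("-g", dest="level", action="count", default=0)
parser.add_argument("url")


def resolve(urls, level, children):
    """Resolve URLs `level` times; URLs without children are kept as they are."""
    for _ in range(level):
        urls = [child for url in urls for child in children.get(url, [url])]
    return urls


# Hypothetical hierarchy: manga -> chapters -> images.
children = {
    "manga": ["chapter-1", "chapter-2"],
    "chapter-1": ["000.png", "001.png"],
    "chapter-2": ["002.png"],
}
args = parser.parse_args(["-gg", "manga"])
resolved = resolve([args.url], args.level, children)
```

Because leaf URLs are passed through unchanged, giving the flag more often than there are levels yields the same image URLs, matching the "or even more often" remark in the commit message.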
Mike Fährmann
4f123b8513 code adjustments according to pep8 2017-01-30 19:40:15 +01:00
Mike Fährmann
29692c5784 get extension from Content-Type header if not provided 2016-09-30 12:32:48 +02:00
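A fallback of this kind might look like the sketch below: use the extension from the URL path when there is one, otherwise derive it from the Content-Type header. The function name and the small MIME type table are assumptions for illustration, not the commit's actual code.

```python
import os.path
from urllib.parse import urlsplit

# Hypothetical, deliberately small mapping of MIME types to extensions.
MIMETYPE_EXTENSIONS = {
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/gif": "gif",
}


def guess_extension(url, content_type):
    """Return the URL's file extension, falling back to the Content-Type header."""
    ext = os.path.splitext(urlsplit(url).path)[1]
    if ext:
        return ext[1:].lower()
    # Drop parameters such as "; charset=..." before the lookup.
    mimetype = content_type.partition(";")[0].strip().lower()
    return MIMETYPE_EXTENSIONS.get(mimetype, "")


from_url = guess_extension("http://example.org/a/001.PNG", "text/html")
from_header = guess_extension("http://example.org/image?id=5", "image/jpeg; charset=binary")
```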
Mike Fährmann
1134339c1f Merge branch 'category' 2016-09-25 17:52:55 +02:00
Mike Fährmann
f32cf28758 enable long pathnames on windows (#4) 2016-09-25 09:30:06 +02:00
Mike Fährmann
581daebc4b remove trailing spaces from path segments (#4) 2016-09-24 11:29:25 +02:00
Mike Fährmann
a347d50ef5 add (sub)category keyword automatically 2016-09-24 10:45:11 +02:00
Mike Fährmann
406add217c print urls recursively 2016-08-11 13:20:21 +02:00
Mike Fährmann
6f7f29d684 rename a few files 2016-07-14 14:25:56 +02:00