Commit Graph

72 Commits

Author SHA1 Message Date
Mike Fährmann
973cf98e88 fix download skip for files without extension 2018-06-27 17:16:07 +02:00
Mike Fährmann
2403c405e3 Merge branch 'postprocessor' 2018-06-08 17:43:11 +02:00
Mike Fährmann
baccf8a958 improve postprocessor handling
- add pathfmt argument for __init__()
- add finalization step
- add option to keep or delete zipped files
2018-06-08 17:39:02 +02:00
Mike Fährmann
7646bdbcfd improve postprocessor initialization code 2018-06-07 22:29:54 +02:00
Mike Fährmann
821535b458 adjust PathFormat class 2018-06-06 20:17:17 +02:00
Mike Fährmann
2df1a15fb8 add '-s/--simulate' to run data extraction without download
Useful for quick testing (even though -g and -j kind of do the same)
and to fill a download archive without actually downloading the files.

-s does the same as the default behaviour, except downloading stuff.
Maybe it should get a more fitting name, as it does actually write to
disk (cache, archive)?
2018-05-25 16:07:18 +02:00
Mike Fährmann
76c32d58e5 [postprocessor] initial code 2018-05-22 14:59:22 +02:00
Mike Fährmann
8bf3cdd82b implement logging options
Standard logging to stderr, logfiles, and unsupported URL files (which
are now handled through the logging module) can now be configured by
setting their respective option keys (log, logfile, unsupportedfile)
to a dict and specifying the following options:

- format:
    format string for logging messages
    available keys: see [1]
    default: "[{name}][{levelname}] {message}"
- format-date:
    format string for {asctime} fields in logging messages
    available keys: see [2]
    default: "%Y-%m-%d %H:%M:%S"
- level:
    the lowercase name of the minimum level at which messages are logged;
    available levels are debug, info, warning, error, exception
    default: "info"
- path:
    path of the file to be written to
- mode:
    'mode' argument when opening the specified file
    can be either "w" to truncate the file or "a" to append to it (see [3])

If 'output.log', '.logfile', or '.unsupportedfile' is a string, it is
interpreted as before: as the file path
(or, for .log, as the format string).

[1] https://docs.python.org/3/library/logging.html#logrecord-attributes
[2] https://docs.python.org/3/library/time.html#time.strftime
[3] https://docs.python.org/3/library/functions.html#open
2018-05-01 17:54:52 +02:00
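The option dict described in 8bf3cdd82b could be turned into a logging handler roughly as follows. This is a minimal sketch, not the project's actual code: `build_handler` and `LEVELS` are hypothetical names, the default file mode of "w" is an assumption, and the "exception" level is left out because the commit message does not say how it maps onto the logging module.

```python
import io
import logging

# Assumed mapping from lowercase option values to logging levels.
# "exception" from the commit message is intentionally not covered here.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def build_handler(opts, stream=None):
    """Build a logging handler from an option dict (hypothetical helper)."""
    if isinstance(opts, str):
        # Legacy form: a plain string is treated as the format string here.
        opts = {"format": opts}

    fmt = opts.get("format", "[{name}][{levelname}] {message}")
    datefmt = opts.get("format-date", "%Y-%m-%d %H:%M:%S")
    level = LEVELS[opts.get("level", "info")]

    if "path" in opts:
        # The default mode "w" is an assumption, not taken from the commit.
        handler = logging.FileHandler(opts["path"], mode=opts.get("mode", "w"))
    else:
        handler = logging.StreamHandler(stream)

    # style="{" selects str.format-style placeholders such as {message}.
    handler.setFormatter(logging.Formatter(fmt, datefmt, style="{"))
    handler.setLevel(level)
    return handler
```

Attaching the returned handler to a logger whose own level is set to debug leaves the filtering to the handler's "level" option.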
Mike Fährmann
9fb82e6b43 apply expand_path() to archive paths 2018-03-08 18:06:39 +01:00
Mike Fährmann
f970a8f13c fix adding keys to download archive when using skip=false 2018-02-13 23:45:30 +01:00
Mike Fährmann
be3ea4425d test archive-id creation and uniqueness 2018-02-12 23:02:09 +01:00
Mike Fährmann
3cec533c28 Merge branch 'archive' 2018-02-12 18:07:58 +01:00
Mike Fährmann
4d2fadfb6f restore skip actions with download archive 2018-02-12 16:56:45 +01:00
Mike Fährmann
7f7c16ae37 add option to specify additional key-value pairs 2018-02-08 23:10:58 +01:00
Mike Fährmann
8c3b713362 rework DownloadJob.handle_url(); include archive functionality
todo:
"abort" and "exit" skip modes if download is skipped because of archive
2018-02-01 20:49:41 +01:00
Mike Fährmann
db7f04dd97 emit log messages on download failure
and when retrying with fallback URLs
2018-01-28 18:44:10 +01:00
Mike Fährmann
27fce6f600 fix UrlJob behavior 2018-01-23 15:42:26 +01:00
Mike Fährmann
b837420291 fix minor urllist issues 2018-01-19 22:54:15 +01:00
Mike Fährmann
9d69401391 initial support for multiple URLs per image 2018-01-17 22:08:19 +01:00
Mike Fährmann
6174a5c4ef [download] adjust filename extension on filetype mismatch
(closes #63)
2018-01-17 18:37:06 +01:00
Mike Fährmann
1a70857a12 update extractor-unittest capabilities
- "count" can now be a string defining a comparison in the form of
  '<operator> <value>', for example: '> 12' or '!= 1'. If its value
  is not a string, it is assumed to be a concrete integer as before.

- "keyword" can now be a dictionary defining tests for individual keys.
  These tests can either be a type, a concrete value or a regex
  starting with "re:". Dictionaries can be stacked inside each other.
  An optional key is indicated with a "?" before its name.

  For example:
      "keyword": {
          "image_id": int,
          "gallery_id": 123,
          "name": "re:pattern",
          "user": {
              "id": 321,
          },
          "?optional": None,
      }
2017-12-30 19:05:37 +01:00
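The "count" and "keyword" checks described in 1a70857a12 can be sketched as below. This is an illustration under assumptions, not the test suite's real implementation: `check_count` and `check_keywords` are hypothetical names, the regex is applied with `re.search`, and an optional key that is present is still compared against its test value.

```python
import operator
import re

# Comparison operators accepted in a "count" string such as "> 12".
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def check_count(actual, expected):
    """Compare a result count against an integer or '<operator> <value>'."""
    if isinstance(expected, str):
        op, _, value = expected.partition(" ")
        return OPERATORS[op](actual, int(value))
    return actual == expected


def check_keywords(data, tests):
    """Check a metadata dict against type, value, regex, or nested tests."""
    for key, test in tests.items():
        if key.startswith("?"):
            key = key[1:]
            if key not in data:
                continue  # optional key is allowed to be missing
        elif key not in data:
            return False

        value = data[key]
        if isinstance(test, dict):
            # Nested dictionaries are checked recursively.
            if not isinstance(value, dict) or not check_keywords(value, test):
                return False
        elif isinstance(test, type):
            if not isinstance(value, test):
                return False
        elif isinstance(test, str) and test.startswith("re:"):
            if not isinstance(value, str) or not re.search(test[3:], value):
                return False
        elif value != test:
            return False
    return True
```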
Mike Fährmann
88bb0798fd delay initialization of PathFormat objects
This allows the DeviantArt group-check to be moved inside the
Extractor.items() method which in turn allows for better exception
handling.

As a new general rule:
Never raise exceptions during extractor initialization.
2017-12-29 22:15:57 +01:00
Mike Fährmann
9d73ed4772 fix issue with using 'skip()' when a filter is present
calling skip() skips over unfiltered items and does not apply
the filter expression to them, which is not what should happen
2017-12-27 22:09:10 +01:00
Mike Fährmann
291369eab2 various smaller changes/additions 2017-12-06 21:45:56 +01:00
Mike Fährmann
4fb6803fa6 add option to sleep before each download 2017-12-04 17:33:10 +01:00
Mike Fährmann
6c9da67581 apply selection options (filter, range) when using '-j' 2017-11-18 17:35:57 +01:00
Mike Fährmann
27c026543f re-enable download unit tests 2017-10-25 12:55:36 +02:00
Mike Fährmann
2e982f56af use 'Content-Length' to determine incomplete downloads (#29) 2017-10-20 18:56:18 +02:00
Mike Fährmann
2ef3c35c98 smaller textual changes
- swapped doc for deviantart.mature and .original
- updated gallery-dl.conf
- "transferred" -> "delegated"
2017-10-09 23:23:19 +02:00
Mike Fährmann
0386503c80 fix (sub)category-transfer for DownloadJob instances (#41)
... and extend "parent" parameters to TestJob- and DataJob-classes
as well.
2017-10-06 15:38:35 +02:00
Mike Fährmann
b319f4bab3 smaller code and text changes 2017-10-01 18:23:40 +02:00
Mike Fährmann
26a866e7d8 implement (sub)category-transfer between extractors (#41)
ImageFap- and all Manga-Extractors will transfer their (sub)category
values to other extractors instantiated by them, which will in turn
allow those to use options set for their parents.

Example:
ImagefapGalleryExtractors will use options set under
extractor.imagefap.user, if (and only if) they have been instantiated by
an ImagefapUserExtractor; and options from extractor.imagefap.gallery
otherwise.
2017-09-26 21:05:11 +02:00
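The (sub)category transfer from 26a866e7d8 can be pictured with the following sketch. All class names, the `parent` argument, the `option` method, and the "example-option" key are assumptions made for illustration. The commit message only states the resulting behaviour, not how it is implemented.

```python
class Extractor:
    """Hypothetical base class; options live under extractor.<cat>.<subcat>."""
    category = ""
    subcategory = ""

    def __init__(self, parent=None):
        if parent is not None:
            # Take over the parent's (sub)category so that option lookups
            # resolve to the section configured for the parent.
            self.category = parent.category
            self.subcategory = parent.subcategory

    def option(self, config, key, default=None):
        section = (
            config.get("extractor", {})
            .get(self.category, {})
            .get(self.subcategory, {})
        )
        return section.get(key, default)


class UserExtractor(Extractor):
    category = "imagefap"
    subcategory = "user"


class GalleryExtractor(Extractor):
    category = "imagefap"
    subcategory = "gallery"


config = {"extractor": {"imagefap": {
    "user": {"example-option": "from-user"},
    "gallery": {"example-option": "from-gallery"},
}}}

standalone = GalleryExtractor()
delegated = GalleryExtractor(parent=UserExtractor())
```

Here `standalone` reads its options from the gallery section, while `delegated` reads them from the user section because it was instantiated by a user extractor.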
Mike Fährmann
9c138dfc1f [common] detect empty HTTP response bodies 2017-09-26 16:49:58 +02:00
Mike Fährmann
0dedbe759c enable '--chapter-filter'
The same filter infrastructure that can be applied to image URLs now
also works for manga chapters and other delegated URLs.

TODO: actually provide any metadata (currently only deviantart and
imagefap are supported).
2017-09-12 16:19:00 +02:00
Mike Fährmann
5704c709fa apply filter before range 2017-09-09 14:51:31 +02:00
Mike Fährmann
9b21d3f13c add '--filter' command-line option
This allows for image filtering via Python expressions by the same
metadata that is also used to build filenames (--list-keywords).

The usually shunned eval() function is used to evaluate
filter-expressions, but it seemed quite appropriate in this case and
shouldn't introduce any new security issues, as any attacker that could do
> gallery-dl --filter "delete-everything()" ...
could as well do
> python -c "delete-everything()"
2017-09-08 17:52:00 +02:00
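The eval()-based filtering described in 9b21d3f13c can be sketched like this. `build_filter` is a hypothetical helper and the metadata keys `width` and `extension` are made up for the example. The actual option handling in the project may differ.

```python
def build_filter(expression):
    """Compile a Python expression into a predicate over metadata dicts."""
    # Compiling once avoids re-parsing the expression for every image.
    code = compile(expression, "<filter>", "eval")

    def predicate(metadata):
        # Metadata keys become variable names inside the expression.
        # A copy is passed because eval() adds '__builtins__' to the dict.
        return bool(eval(code, dict(metadata)))

    return predicate


images = [
    {"width": 1200, "extension": "png"},
    {"width": 800, "extension": "png"},
    {"width": 1600, "extension": "jpg"},
]
keep = build_filter("width >= 1000 and extension == 'png'")
selected = [image for image in images if keep(image)]
```

As the commit message notes, the expression runs as ordinary Python code, so it should only ever come from the user running the program.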
Mike Fährmann
268cfa3cfe filter duplicate URLs (#36)
Duplicate URLs might occur if, for example, an artist adds another
image to his gallery while an extractor is running and images are being
downloaded on sites like pixiv/nijie/hentaifoundry.
The next image on the next page will have already been downloaded and
will cause a premature end if '--abort-on-skip' is being used.
2017-09-06 17:08:50 +02:00
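One simple way to drop duplicate URLs, as motivated in 268cfa3cfe, is to remember every URL already seen. This is a minimal sketch with a hypothetical `unique_urls` generator. The commit itself may track duplicates differently, for example over a limited window only.

```python
def unique_urls(urls):
    """Yield each URL once, in order of first appearance."""
    seen = set()
    for url in urls:
        if url in seen:
            # Already yielded once; skipping it here means it never reaches
            # the download step and cannot trigger '--abort-on-skip'.
            continue
        seen.add(url)
        yield url
```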
Mike Fährmann
47bcf53ec1 implement support for additional unit test result types
- "pattern" matches all resulting URLs against the given regex
- "count" specifies the expected number of returned URLs
2017-08-25 22:01:14 +02:00
Mike Fährmann
ae2d61e5b3 handle format string exceptions separately 2017-08-11 21:48:37 +02:00
Mike Fährmann
3c9f190757 extend output of --list-keywords 2017-08-10 17:36:21 +02:00
Mike Fährmann
cfa479fab5 update error message for unspecified exceptions
- ask user to report unexpected errors, which usually indicate
  extractor failure
- handle OSErrors separately (permissions, disk full, etc)
- revert 30eef52
2017-08-10 16:35:46 +02:00
Mike Fährmann
915a0137de improve 'extractor.request'
- add 'fatal' argument
- improve internal logic and flow
- raise known exception on error
- update exception hierarchy
2017-08-05 16:11:46 +02:00
Mike Fährmann
58e95a7487 share extractor and downloader sessions
There was never any "good" reason for the strict separation
between extractors and downloaders. This change allows for
reduced resource usage (probably unnoticeable) and fewer lines
of code at the "cost" of tighter coupling.
2017-06-30 19:38:14 +02:00
Mike Fährmann
c921b4f32a code cleanup and fixing tests 2017-06-02 09:10:58 +02:00
Mike Fährmann
25bcdc8aa9 add --write-unsupported option (#15) 2017-05-27 16:16:57 +02:00
Mike Fährmann
99b72130ee [reddit] enable recursion (#15)
reddit extractors now recursively visit other submissions/posts
linked to in the initial set of submissions.
This behaviour can be configured via the 'extractor.reddit.recursion'
key in the configuration file or by `-o recursion=<value>`.

Example:
{"extractor": {
  "reddit": {
   "recursion": <value>
}}}

Possible values:
* -1 - infinite recursion (don't do this)
*  0 - recursion is disabled (default)
*  1 and higher - maximum recursion level
2017-05-26 17:01:27 +02:00
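The meaning of the 'recursion' values listed in 99b72130ee can be illustrated with a depth-limited traversal. This is a sketch only: `visit` and the `links` mapping (submission id to linked submission ids) are hypothetical, and the `seen` set that guards against link cycles is an addition of this example, not something the commit message mentions.

```python
from collections import deque


def visit(start, links, recursion=0):
    """Return submissions reachable from 'start' within the recursion limit.

    recursion < 0 means no limit, 0 disables following links,
    and n >= 1 follows links up to n levels deep.
    """
    visited = []
    seen = set()
    queue = deque((submission, 0) for submission in start)

    while queue:
        submission, depth = queue.popleft()
        if submission in seen:
            continue
        seen.add(submission)
        visited.append(submission)

        if recursion < 0 or depth < recursion:
            for linked in links.get(submission, ()):
                queue.append((linked, depth + 1))

    return visited
```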
Mike Fährmann
ae686c4c08 run queue items immediately 2017-05-24 15:15:06 +02:00
Mike Fährmann
30eef527d8 update output logic on error
[ci skip]
2017-05-23 20:12:57 +02:00
Mike Fährmann
e425243b1e [reddit] some small fixes
- filter or complete some URLs
- remove the 'nofollow:' scheme before printing URLs
- (#15)
2017-05-23 11:48:00 +02:00
Mike Fährmann
a90c6acc9c code cleanup + fixes 2017-05-18 15:18:18 +02:00