Commit Graph

112 Commits

Author SHA1 Message Date
Mike Fährmann
c887493a80 overhaul exception stuff 2019-10-27 23:53:37 +01:00
Mike Fährmann
389d2d7e38 implement 'cookies-update' option (#445) 2019-10-19 15:23:55 +02:00
Mike Fährmann
03bc8adfc7 [postprocessor:exec] run after file moved to target location
(#421)
2019-10-06 23:12:22 +02:00
Mike Fährmann
776e9e073f close archive on job completion (#417) 2019-09-10 22:43:51 +02:00
Mike Fährmann
9178b54eae handle errors when opening download archive file (#417) 2019-09-10 16:44:47 +02:00
Mike Fährmann
682105b8ee prevent crash when loading unavailable downloader (#405) 2019-08-31 21:58:33 +02:00
Mike Fährmann
5f8621b29d improve output of active post processor modules 2019-08-15 13:31:04 +02:00
Mike Fährmann
0bb873757a update PathFormat class
- change 'has_extension' from a simple flag/bool to a field that
  contains the original filename extension
- rename 'keywords' to 'kwdict' and some other stuff as well
- inline 'adjust_path()'
- put enumeration index before filename extension (#306)
2019-08-12 21:40:37 +02:00
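A minimal sketch of placing an enumeration index before the filename extension rather than after it. The helper name `enumerate_path` and the "." separator are assumptions for illustration, not taken from the PathFormat class itself.

```python
import os


def enumerate_path(path: str, index: int) -> str:
    """Insert an enumeration index between the file name and its extension.

    Hypothetical helper; the "." separator is an assumption.
    """
    root, ext = os.path.splitext(path)  # "img/photo.jpg" -> ("img/photo", ".jpg")
    return f"{root}.{index}{ext}"       # files without extension get no trailing dot
```

With this ordering, `photo.jpg` becomes `photo.2.jpg` and keeps its real extension. Appending the index after the extension would break that.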
Mike Fährmann
8dc42bb178 implement 'enumerate' for 'extractor.skip' (#306)
[ci skip]
2019-08-08 18:37:54 +02:00
Mike Fährmann
20f7b07312 ensure postproc finalize() is called during C-c or crash (#355) 2019-07-27 11:14:52 +02:00
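One common way to guarantee a cleanup hook runs on Ctrl-C or on an unexpected exception is a `try`/`finally` around the main loop. The sketch below shows that pattern. The `PostProcessor` class and `run_job()` function are hypothetical stand-ins, not the project's actual job code.

```python
class PostProcessor:
    """Hypothetical post-processor with a finalize() hook."""

    def __init__(self):
        self.finalized = False

    def run(self, item):
        pass  # per-file work would go here

    def finalize(self):
        self.finalized = True


def run_job(items, postprocessors):
    try:
        for item in items:
            for pp in postprocessors:
                pp.run(item)
    finally:
        # Runs on normal completion, on any exception,
        # and on KeyboardInterrupt (Ctrl-C) as well
        for pp in postprocessors:
            pp.finalize()
```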
Mike Fährmann
7b77ecc35a fix paths for files without extension (#220) 2019-07-15 16:39:03 +02:00
Mike Fährmann
62097284fe add 'download' option (#220) 2019-07-14 18:48:18 +02:00
Mike Fährmann
fe7805de7c improve attribute access in DownloadJob.handle_url()
Storing a value in a local variable and accessing it that way is faster
than going through 'self' if it is accessed more than once.
2019-07-13 21:42:07 +02:00
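The difference is easiest to see side by side. In the sketch below (a hypothetical `Job` class, not `DownloadJob`), `via_self` repeats the attribute lookup on every iteration, while `via_local` performs it once and then reads a local variable. Both return the same result. Only the lookup cost differs.

```python
class Job:
    def __init__(self):
        self.values = list(range(100))

    def via_self(self):
        total = 0
        for _ in range(1000):
            total += len(self.values)  # attribute lookup on every iteration
        return total

    def via_local(self):
        values = self.values  # look the attribute up once ...
        total = 0
        for _ in range(1000):
            total += len(values)  # ... then reuse the cheaper local name
        return total
```

The speedup can be measured with `timeit.timeit(job.via_local, number=1000)` against the same call for `via_self`. The actual numbers depend on the interpreter version.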
Mike Fährmann
f2000a69aa implement 'image-unique' and 'chapter-unique' options (#303)
The default value for both is 'false', i.e. duplicate URLs are NOT
ignored.

The previous behavior was to always ignore duplicate URLs to make
'--abort-on-skip' work properly when new images were added to the
beginning of a collection while gallery-dl is running.
2019-06-29 22:50:17 +02:00
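A minimal sketch of an opt-in duplicate filter with the same default as described above: duplicates pass through unless `unique` is enabled. The function `iter_urls` is hypothetical and only illustrates the idea.

```python
def iter_urls(urls, unique=False):
    """Yield URLs, optionally dropping ones already seen (hypothetical helper)."""
    seen = set()
    for url in urls:
        if unique:
            if url in seen:
                continue  # duplicate: skip it
            seen.add(url)
        yield url
```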
Mike Fährmann
ee4d7c3d89 update downloader.find() and related code
Instead of replacing 'https' with 'http' for every URL in
'get_downloader()', this now only happens once during downloader
initialization. Also adds unit tests.
2019-06-20 16:59:44 +02:00
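The idea of normalizing the scheme once at initialization, instead of rewriting every URL, can be sketched as an alias that is added to a lookup table up front. `DownloaderRegistry` and its string values are hypothetical. This is not the real `downloader.find()` implementation.

```python
class DownloaderRegistry:
    """Hypothetical registry mapping URL schemes to downloader names."""

    def __init__(self, modules):
        self._by_scheme = dict(modules)
        # Normalize once at init: 'https' reuses the 'http' downloader
        if "http" in self._by_scheme:
            self._by_scheme.setdefault("https", self._by_scheme["http"])

    def find(self, url):
        scheme, sep, _ = url.partition(":")
        if not sep:
            return None  # no scheme at all
        return self._by_scheme.get(scheme)  # per-URL work is a single dict lookup
```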
Mike Fährmann
523ebc9b0b Fix serialization of 'datetime' objects in '--write-metadata'
A simple catch-all for serializing otherwise unsupported objects with
json.dump() is to pass 'default=str'. DataJob.run() already did this
for -j/--dump-json, but the 'metadata' post-processor did not.

This commit introduces util.dump_json() that (more or less) unifies the
JSON output procedure of both --write-metadata and --dump-json.

(#251, #252)
2019-05-09 16:49:22 +02:00
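The `default=str` technique is a standard `json.dump()` parameter: any object the encoder cannot handle, such as a `datetime`, is passed to `str()` instead of raising `TypeError`. Below is a minimal sketch of a shared writer in the spirit of `util.dump_json()`. Its signature and the `indent`/`ensure_ascii` defaults are assumptions.

```python
import datetime
import json


def dump_json(obj, fp, ensure_ascii=True, indent=4):
    """Write obj as JSON; parameters other than default=str are assumptions."""
    # default=str turns unsupported objects (e.g. datetime) into strings
    json.dump(obj, fp, ensure_ascii=ensure_ascii, indent=indent, default=str)
    fp.write("\n")


if __name__ == "__main__":
    import sys
    dump_json({"date": datetime.datetime(2019, 5, 9, 16, 49, 22)}, sys.stdout)
```

Because both output paths go through one function, `--write-metadata` and `--dump-json` can no longer drift apart in how they serialize values.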
Mike Fährmann
b09a8184ca move TestJob into test module; test _extractor values 2019-02-17 18:18:31 +01:00
Mike Fährmann
ae353ed3b0 provide "extractor" and "job" keys for logging output
This allows for stuff like "{extractor.url}" and "{extractor.category}"
in logging format strings.
Accessing 'extractor' and 'job' in any way will return "None" if those
fields aren't defined, i.e. in general logging messages.
2019-02-14 11:09:58 +01:00
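A sketch of one way to make "{extractor.url}" safe in a logging format string when no extractor is attached. It uses a `logging.Filter` that inserts a placeholder object whose attributes all resolve to `None`. The class names and the filter-based approach are assumptions for illustration.

```python
import logging


class _NoneAttrs:
    """Placeholder: any attribute access returns None."""

    def __getattr__(self, name):
        return None

    def __str__(self):
        return "None"


_NONE = _NoneAttrs()


class ContextFilter(logging.Filter):
    """Hypothetical filter ensuring 'extractor' and 'job' exist on every record."""

    def filter(self, record):
        for key in ("extractor", "job"):
            if not hasattr(record, key):
                setattr(record, key, _NONE)  # general messages get the placeholder
        return True
```

Records that do carry an extractor (for example via `logger.info(msg, extra={"extractor": ...})`) are left untouched.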
Mike Fährmann
89ee8cd7e4 filter "private" kwdict entries 2019-02-13 13:22:11 +01:00
Mike Fährmann
61741d7333 provide type information for Queue messages
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.
2019-02-12 21:32:32 +01:00
Mike Fährmann
277b52101a add 'category-transfer' option
[ci skip]
2019-01-19 20:28:19 +01:00
Mike Fährmann
5f38ac9609 [postprocessor:exec] add a better error message (#155) 2019-01-13 13:59:11 +01:00
Mike Fährmann
0225d90078 add exception name and traceback for OSErrors 2018-12-04 19:24:50 +01:00
Mike Fährmann
fb53b5dd55 fix control+c during -j and range tests 2018-11-25 18:54:05 +01:00
Mike Fährmann
13cb270326 set target directory before postprocessor init (fixes #126) 2018-11-21 22:21:26 +01:00
Mike Fährmann
b828473aa3 retry HTTP requests for more exception classes 2018-11-19 15:49:13 +01:00
Mike Fährmann
c47482b110 smaller changes, missing docs, etc.
- make 'netrc' extractor-specific
- rename 'downloader.enable' to 'enabled'
- document 'downloader.ytdl.format'
- consistent newlines in configuration.rst
2018-11-16 18:18:07 +01:00
Mike Fährmann
3c25fa2dad update build_testresult_db.py script 2018-11-15 22:58:14 +01:00
Mike Fährmann
8ef84a6823 add option to enable/disable specific downloader modules
... and write URLs with no (active) downloader to unsupported-file
2018-11-13 18:06:36 +01:00
Mike Fährmann
d3d7f01543 add 'prepare()' step for post-processors
This allows post-processors to modify the destination path before
checking if a file already exists.
2018-10-18 22:32:03 +02:00
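The ordering matters: each post-processor's `prepare()` runs first, and the "already exists?" check happens on the final path. The sketch below illustrates this with a hypothetical `RenamePP` that lowercases the target path. The `should_download()` helper and its `exists` parameter are assumptions made so the check can be tested without touching the disk.

```python
import os


class RenamePP:
    """Hypothetical post-processor whose prepare() rewrites the target path."""

    def prepare(self, path):
        root, ext = os.path.splitext(path)
        return root.lower() + ext


def should_download(path, postprocessors, exists=os.path.exists):
    # Let every post-processor adjust the destination first ...
    for pp in postprocessors:
        path = pp.prepare(path)
    # ... and only then check whether the file is already present
    return path, not exists(path)
```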
Mike Fährmann
6ed629f2b6 allow specifying number of skips before abort/exit (closes #115)
In addition to 'abort' and 'exit', it is now possible to specify
'abort:N' and 'exit:N' (where N is any integer) as value for 'skip'
to abort/exit after consecutively skipping N downloads.
2018-10-13 17:21:55 +02:00
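A sketch of parsing 'abort:N' / 'exit:N' and counting consecutive skips. It assumes that a plain 'abort' or 'exit' means N = 1, matching the earlier abort-on-first-skip behavior. Both `parse_skip()` and `SkipCounter` are hypothetical names.

```python
def parse_skip(value):
    """Parse a 'skip' option value into (action, limit).

    Assumption: "abort"/"exit" without ":N" means a limit of 1.
    """
    if isinstance(value, str):
        action, _, num = value.partition(":")
        if action in ("abort", "exit"):
            return action, int(num) if num else 1
    return ("skip" if value else "download"), 0


class SkipCounter:
    """Track consecutive skips and report when the limit is reached."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def skipped(self):
        self.count += 1
        return self.count >= self.limit  # True -> time to abort/exit

    def downloaded(self):
        self.count = 0  # only *consecutive* skips count
```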
Mike Fährmann
48a8717a7c add 'output.num-to-str' option
... to convert any numeric values to string when outputting them as JSON
(during '--dump-json' or otherwise)
2018-10-08 20:28:54 +02:00
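A minimal sketch of converting numeric values to strings before JSON output. It recurses through dicts and lists and leaves dictionary keys alone. Excluding `bool` is a design choice in this sketch (in Python, `True` is an `int`), not something the commit specifies.

```python
def num_to_str(obj):
    """Recursively convert int/float values (but not bools) to strings."""
    if isinstance(obj, bool):
        return obj  # keep true/false as JSON booleans
    if isinstance(obj, (int, float)):
        return str(obj)
    if isinstance(obj, dict):
        return {k: num_to_str(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [num_to_str(v) for v in obj]
    return obj
```

The result can be passed straight to `json.dumps()`.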
Mike Fährmann
0514d6a0ae make --filter and --range config-file options
The functionality of --(chapter-)filter and --(chapter-)range are now
also exposed as the following config-file options:

- extractor.*.image-filter
- extractor.*.image-range
- extractor.*.chapter-filter
- extractor.*.chapter-range

TODO: update configuration.rst
2018-10-07 21:39:56 +02:00
Mike Fährmann
4a348990f4 adjust value resolution for retries/timeout/verify options
This change introduces 'extractor.*.retries/timeout/verify' options
as a general way to set these values for all HTTP requests.

'downloader.http.retries/timeout/verify' is a way to override these
options for file downloads only and will fall back to 'extractor.*.…*
values if they haven't been explicitly set.

Also: downloader classes now take an extractor object as first argument
instead of a requests.Session object.
2018-10-07 21:13:39 +02:00
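A sketch of the fallback rule described above: use 'downloader.http.<key>' if it is explicitly set, otherwise use the extractor-wide value. The nested-dict config layout (with 'extractor.*' flattened to a single 'extractor' dict) and the `resolve()` helper are simplifying assumptions. A sentinel is used so that an explicitly set `False` or `None` (e.g. `verify: false`) still counts as set.

```python
_UNSET = object()


def resolve(config, key, default=None):
    """Look up 'downloader.http.<key>', falling back to 'extractor.<key>'."""
    value = config.get("downloader", {}).get("http", {}).get(key, _UNSET)
    if value is _UNSET:
        # Not set for downloads specifically: use the general HTTP setting
        value = config.get("extractor", {}).get(key, default)
    return value
```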
Mike Fährmann
ca6ac4db6a fix 'content' tests 2018-10-05 21:10:33 +02:00
Mike Fährmann
188876d814 implement youtube-dl downloader module
URLs starting with 'ytdl:' will now be handled by youtube-dl.
There is probably a lot to fix and improve, but the basic use case
works.

TODO:
- format selection and ytdl options in general
- better filename/path handling
- ytdl support for "unsupported URLs"
- ...
2018-10-05 18:05:11 +02:00
Mike Fährmann
8c8da11bb8 do not create directory structures when using '-s' 2018-09-21 17:55:04 +02:00
Mike Fährmann
41249f3ead improve extractor.get_downloader() 2018-09-05 18:17:16 +02:00
Mike Fährmann
712b58a93b [postprocessor] add black-/whitelist options
Each post-processor config dict now supports a list of extractor
categories for which it should or should not be active.

For example:
"postprocessors": [
    {"name": "classify",
     "whitelist": ["tumblr", "deviantart"],
     ...
    }
]
2018-09-03 14:53:43 +02:00
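A sketch of how such a whitelist/blacklist might be applied when selecting post-processors for an extractor category. The `active_postprocessors()` helper is hypothetical, and so is the assumption that a config with neither list is always active.

```python
def active_postprocessors(configs, category):
    """Return the names of post-processors enabled for an extractor category."""
    active = []
    for cfg in configs:
        if "whitelist" in cfg and category not in cfg["whitelist"]:
            continue  # only active for the listed categories
        if "blacklist" in cfg and category in cfg["blacklist"]:
            continue  # explicitly disabled for this category
        active.append(cfg["name"])
    return active
```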
Mike Fährmann
4313c95bc9 improve error message for OAuth2 authentication 2018-08-11 23:54:25 +02:00
Mike Fährmann
973cf98e88 fix download skip for files without extension 2018-06-27 17:16:07 +02:00
Mike Fährmann
2403c405e3 Merge branch 'postprocessor' 2018-06-08 17:43:11 +02:00
Mike Fährmann
baccf8a958 improve postprocessor handling
- add pathfmt argument for __init__()
- add finalization step
- add option to keep or delete zipped files
2018-06-08 17:39:02 +02:00
Mike Fährmann
7646bdbcfd improve postprocessor initialization code 2018-06-07 22:29:54 +02:00
Mike Fährmann
821535b458 adjust PathFormat class 2018-06-06 20:17:17 +02:00
Mike Fährmann
2df1a15fb8 add '-s/--simulate' to run data extraction without download
Useful for quick testing (even though -g and -j kind of do the same)
and to fill a download archive without actually downloading the files.

-s does the same as the default behaviour, except that it does not download any files.
Maybe it should get a more fitting name, as it does actually write to
disk (cache, archive)?
2018-05-25 16:07:18 +02:00
Mike Fährmann
76c32d58e5 [postprocessor] initial code 2018-05-22 14:59:22 +02:00
Mike Fährmann
8bf3cdd82b implement logging options
Standard logging to stderr, logfiles, and unsupported URL files (which
are now handled through the logging module) can now be configured by
setting their respective option keys (log, logfile, unsupportedfile)
to a dict and specifying the following options:

- format:
    format string for logging messages
    available keys: see [1]
    default: "[{name}][{levelname}] {message}"
- format-date:
    format string for {asctime} fields in logging messages
    available keys: see [2]
    default: "%Y-%m-%d %H:%M:%S"
- level:
    the lowercase name of the minimum level at which the logger becomes active;
    available levels are debug, info, warning, error, exception
    default: "info"
- path:
    path of the file to be written to
- mode:
    'mode' argument when opening the specified file
    can be either "w" to truncate the file or "a" to append to it (see [3])

If 'output.log', '.logfile', or '.unsupportedfile' is a string, it is
interpreted as before: as the file path (or, for '.log', as the
format string).

[1] https://docs.python.org/3/library/logging.html#logrecord-attributes
[2] https://docs.python.org/3/library/time.html#time.strftime
[3] https://docs.python.org/3/library/functions.html#open
2018-05-01 17:54:52 +02:00
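The options above map naturally onto the standard library: 'format' and 'format-date' become the `fmt` and `datefmt` arguments of `logging.Formatter` with `style="{"`, and 'path'/'mode' go to `logging.FileHandler`. Below is a minimal sketch under those assumptions. `make_handler()` is a hypothetical helper, and the default mode "w" is an assumption. It only accepts Python's standard level names, so "exception" (not a standard level name) is not handled.

```python
import logging


def make_handler(opts, stream=None):
    """Build a handler from a dict of logging options (hypothetical helper)."""
    if "path" in opts:
        # Assumption: default to truncating the file ("w")
        handler = logging.FileHandler(
            opts["path"], mode=opts.get("mode", "w"), encoding="utf-8")
    else:
        handler = logging.StreamHandler(stream)  # None means sys.stderr
    handler.setFormatter(logging.Formatter(
        opts.get("format", "[{name}][{levelname}] {message}"),
        opts.get("format-date", "%Y-%m-%d %H:%M:%S"),
        style="{",
    ))
    # setLevel() accepts level names such as "INFO" or "WARNING"
    handler.setLevel(opts.get("level", "info").upper())
    return handler
```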
Mike Fährmann
9fb82e6b43 apply expand_path() to archive paths 2018-03-08 18:06:39 +01:00
Mike Fährmann
f970a8f13c fix adding keys to download archive when using skip=false 2018-02-13 23:45:30 +01:00