178 Commits

Author SHA1 Message Date
Mike Fährmann
cad85640de move 'util.PathFormat' into its own 'path' module
to prevent circular imports between 'formatter' and 'util'
2021-09-27 21:29:37 +02:00
Mike Fährmann
74145467dd move 'util.Formatter' into its own 'formatter' module 2021-09-27 02:37:04 +02:00
Mike Fährmann
c9e6693530 allow specifying a minimum/maximum for 'sleep-*' options (#1835)
for example '"sleep-request": [5.0, 10.0]' to wait between 5 and 10
seconds between each HTTP request
2021-09-14 17:40:05 +02:00
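A minimal sketch of how a minimum/maximum value such as '"sleep-request": [5.0, 10.0]' could be turned into a delay. The helper name 'resolve_sleep' and the uniform distribution are assumptions for illustration, not the project's actual implementation.

```python
import random


def resolve_sleep(value):
    """Return a delay in seconds for a 'sleep-*' style option value.

    Hypothetical helper: 'value' is either a single number or a
    [minimum, maximum] pair, as in the commit message's example.
    """
    if isinstance(value, (list, tuple)):
        low, high = value
        # Assumption: pick a value uniformly between minimum and maximum.
        return random.uniform(low, high)
    return float(value)


delay = resolve_sleep([5.0, 10.0])   # some value between 5 and 10 seconds
fixed = resolve_sleep(3)             # a plain number is used as-is
```

A caller would then pass the result to something like time.sleep() before each HTTP request.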
Mike Fährmann
d79bcb6236 allow extractors to register a 'finalize()' method 2021-09-07 21:15:30 +02:00
Mike Fährmann
72c0cd30c7 do not return with a nonzero exit status when no results found
also change loglevel from 'warning' to 'info'
(#1789)
2021-08-24 18:49:13 +02:00
Mike Fährmann
bd08ee2859 remove most 'yield Message.Version' statements
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
bdfdabf498 show warning if extractor doesn't yield any results (#1759) 2021-08-16 02:49:36 +02:00
Mike Fährmann
d320ee6251 implement a 'fallback' option (closes #1770) 2021-08-16 01:47:59 +02:00
Mike Fährmann
1b2f9050fb rename all instances of 'kwds' to 'kwdict' 2021-07-20 20:21:19 +02:00
Mike Fährmann
b9783403d9 add 'url-metadata' option (#1659, #1073) 2021-07-14 03:08:49 +02:00
Mike Fährmann
e95f99882f extend 'parent-metadata' functionality (#1687, #1651, #1364) 2021-07-14 02:53:41 +02:00
Mike Fährmann
64986f9435 fix depth counter in UrlJob
regression from adf4d661

It would either stop at the first level (-g) or go infinitely deep (-G).
Going down to, for example, level 3 with -ggg didn't work.
2021-06-26 00:30:03 +02:00
Mike Fährmann
83fc4c1098 update post processor config capabilities
This change makes it possible to specify just the name of a post processor
in the "postprocessors" list instead of a dict with all of its options.
The options for it will then be taken from inside the "postprocessor"
block similar to "extractor", "downloader", or "output" blocks.

This makes it possible, for example, to override the default settings for
--write-metadata by specifying a custom "metadata" block, or to define a
custom post processor block ("cbz") and then use it by referencing just
its name in "postprocessors" lists.

{
    "postprocessor":
    {
        "metadata": {
            "name": "metadata",
            "event": "post",
            "filename": "{tweet_id|post_id|id}.json"
        },
        "cbz": {
            "name"       : "zip",
            "compression": "store",
            "extension"  : "cbz"
        }
    }
}
2021-06-05 14:11:16 +02:00
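The name-based lookup described in this commit message can be sketched roughly as follows. 'resolve_postprocessors' is a hypothetical function written for illustration; how the project actually resolves names may differ.

```python
def resolve_postprocessors(entries, blocks):
    """Expand a "postprocessors" list into full option dicts.

    Hypothetical sketch: 'entries' may contain option dicts or plain
    names; 'blocks' stands for the contents of the "postprocessor" block.
    """
    resolved = []
    for entry in entries:
        if isinstance(entry, str):
            # A bare name: take its options from the block of that name.
            options = dict(blocks.get(entry, {}))
            # Assumption: fall back to the name itself when the block
            # does not specify which post processor to use.
            options.setdefault("name", entry)
            resolved.append(options)
        else:
            # Already a dict with all of its options.
            resolved.append(entry)
    return resolved


blocks = {
    "cbz": {"name": "zip", "compression": "store", "extension": "cbz"},
}
result = resolve_postprocessors(["cbz", {"name": "metadata"}], blocks)
```

Here the "cbz" entry expands to the "zip" post processor options from the block, while the dict entry is passed through unchanged.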
Mike Fährmann
3cbbefd4ed support 'filter' option for post processors (#1460) 2021-06-04 18:23:32 +02:00
Mike Fährmann
adf4d661b3 use '_extractor' info in UrlJobs 2021-05-19 15:52:30 +02:00
Mike Fährmann
b50b8e6cf4 refactor applying 'parent-…' options 2021-05-13 21:56:34 +02:00
Mike Fährmann
7ab8374385 add 'parent-skip' option (#1399) 2021-05-13 16:40:04 +02:00
Mike Fährmann
c693db5b1a add '"skip": "terminate"' option
Stops not only the current extractor/job,
but all parent extractors/jobs as well.
2021-05-12 02:22:28 +02:00
Mike Fährmann
c5ca7905ce add 'noop()' and 'identity()' functions 2021-05-04 19:27:17 +02:00
Mike Fährmann
5b4da4b4bf reorder config access in Job constructor
(#1111)
2021-04-27 15:12:59 +02:00
Mike Fährmann
b4ed7cb961 fix 'category-transfer' (#1111)
broken since commit 055c32e0
2021-04-19 00:55:44 +02:00
Mike Fährmann
a86ffb04bb add 'output.fallback' option
to enable/disable fallback URLs for -g/--get-urls
2021-04-12 02:00:41 +02:00
Mike Fährmann
a75e485461 add archive format to InfoJob output (#875) 2021-04-07 21:50:16 +02:00
Mike Fährmann
bf241811dd allow '_extractor' fields to be None or empty 2021-03-20 01:19:31 +01:00
Mike Fährmann
23641742a3 improve 'parent-directory' (#1364)
Allow forwarding metadata from the top-level extractor to all children
if 'parent-directory' is enabled for all extractors along the way.

For example 'reddit' -> 'gfycat' -> 'redgifs'
2021-03-14 17:19:57 +01:00
Mike Fährmann
df94182e11 implement 'parent-metadata' option (#1364)
experimental, might not work as expected, etc.
2021-03-11 01:10:34 +01:00
Mike Fährmann
b6719becf1 ensure '-s/--simulate' always prints filenames (#1360)
by assuming a potentially wrong filename extension in cases where the
correct one would only get known after a download started
2021-03-07 22:38:20 +01:00
Mike Fährmann
c963741860 add '-E/--extractor-info' command-line option (#875) 2021-03-02 23:59:56 +01:00
Mike Fährmann
65ca923b4e fix 'whitelist' option for BaseExtractor instances 2021-02-15 21:58:33 +01:00
Mike Fährmann
56a8968435 remove 'Message.Metadata' (#866) 2021-01-31 02:12:37 +01:00
Mike Fährmann
46323ae6ff initialize 'hooks' as empty tuple
follow-up to 9c29fc4e

Prevent a "race" between initializing 'pathfmt' and 'hooks',
and receiving a signal in between (e.g. ctrl+c),
which would then crash in 'handle_finalize()'.
2020-11-28 18:18:49 +01:00
Mike Fährmann
9c29fc4e55 always initialize DownloadJob.hooks (fixes #1135)
and not just when any (potential) post processors are defined
2020-11-28 00:09:19 +01:00
Mike Fährmann
9fffa9c343 rework post processor callbacks 2020-11-19 02:29:06 +01:00
Mike Fährmann
f99c6031e0 apply post processor blacklists/whitelists to basecategories
(#1103)
2020-11-17 02:02:31 +01:00
Mike Fährmann
a3ca2f6080 update fallback URL handling
remove Message.Urllist and use a '_fallback' field inside a kwdict
2020-10-16 01:09:55 +02:00
Mike Fährmann
fd20093c96 allow blacklist/whitelist to be empty lists/strings (#1051) 2020-10-08 14:55:21 +02:00
Mike Fährmann
d5fa716d89 fix crash when using 'skip=false' and archive (fixes #1023)
Separating the archive check from pathfmt.exists() in b5243297
had some unintended side effects.

It is also not possible to monkey-patch a dunder method like
__contains__ because of the special method lookup that gets
performed for them.
2020-09-23 19:07:40 +02:00
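The special method lookup mentioned in this commit message can be demonstrated with a small standalone example. The 'Archive' class here is a hypothetical stand-in, not the project's archive class.

```python
class Archive:
    """Hypothetical stand-in for an object supporting 'key in archive'."""

    def __contains__(self, key):
        return False


archive = Archive()

# Assigning to the instance is allowed, but the 'in' operator looks up
# __contains__ on the type, so this patch is ignored.
archive.__contains__ = lambda key: True
instance_patched = "x" in archive

# Patching the class changes the behavior, but for every instance.
Archive.__contains__ = lambda self, key: True
class_patched = "x" in archive
```

After running this, 'instance_patched' is False and 'class_patched' is True, which is why a per-object override of such a method needs a different approach, for example a regular method or a subclass.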
Mike Fährmann
231dd4c800 accumulate postprocessor objects (#994)
Instead of one 'postprocessors' setting overwriting all others lower
in the hierarchy, all postprocessors along the config path will now
get collected into one big list.

For example '--mtime-from-date' will therefore no longer cause
other postprocessor settings in a config file to get ignored.
2020-09-14 21:51:55 +02:00
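A rough sketch of collecting a setting along a config path instead of letting one level overwrite the others. The function name, the nested-dict layout, and the resulting order (general to specific) are assumptions made for illustration.

```python
def accumulate(config, path, key="postprocessors"):
    """Collect the list values of 'key' from every level along 'path'.

    Hypothetical sketch: 'config' is a nested dict and 'path' a sequence
    of keys leading from the top level down to the most specific block.
    """
    result = []
    node = config
    if key in node:
        result.extend(node[key])
    for name in path:
        node = node.get(name)
        if not isinstance(node, dict):
            break
        if key in node:
            # Assumption: more specific entries are appended at the end.
            result.extend(node[key])
    return result


config = {
    "postprocessors": [{"name": "mtime"}],
    "extractor": {
        "postprocessors": [{"name": "zip"}],
        "twitter": {"postprocessors": [{"name": "metadata"}]},
    },
}
names = [pp["name"] for pp in accumulate(config, ["extractor", "twitter"])]
```

With this layout all three entries end up in one list, so a top-level entry no longer hides the ones defined further down.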
Mike Fährmann
3afd362e2e add 'sleep-extractor' option (closes #964)
(would have been nice if this were possible without code duplication)
2020-09-12 21:04:47 +02:00
Mike Fährmann
c78aa17506 add general 'blacklist' and 'whitelist' options (#492, #844) 2020-09-11 13:17:12 +02:00
Mike Fährmann
5912727b88 support format string replacement fields in archive paths
(closes #985)
2020-09-10 22:09:30 +02:00
Mike Fährmann
b5243297ff write skipped files to archive (closes #550) 2020-09-03 18:37:38 +02:00
Mike Fährmann
3f73cc6855 allow 'parent-directory' to work recursively (fixes #905) 2020-07-29 00:31:23 +02:00
Mike Fährmann
d5bfb0b38c set pseudo extension for Metadata messages (#865)
This prevents pathfmt.filename from potentially being empty.
2020-07-04 22:14:39 +02:00
Mike Fährmann
1b3870a4be flush after writing JSON in DataJob() (#727)
… and remove the dead handle_finalize() method,
which is never called since DataJob() overrides run().
2020-06-19 23:05:44 +02:00
Mike Fährmann
7e8a747c56 improve output of '-K' for parent extractors 2 (#825)
This is what bb882b8 was supposed to be, but I managed to
not include those changes in the first commit …
2020-06-18 15:04:15 +02:00
Mike Fährmann
ece73b5b2a make 'path' and 'keywords' available in logging messages
Wrap all loggers used by job, extractor, downloader, and postprocessor
objects into a (custom) LoggerAdapter that provides access to the
underlying job, extractor, pathfmt, and kwdict objects and their
properties.

__init__() signatures for all downloader and postprocessor classes have
been changed to take the current Job object as their first argument,
instead of the current extractor or pathfmt.

(#574, #575)
2020-05-18 19:04:51 +02:00
Mike Fährmann
a1e739b96c reuse connection adapters from parent extractors 2020-05-12 23:52:01 +02:00
Mike Fährmann
42f29c3e11 improve and simplify attribute access in DownloadJob.initialize() 2020-05-09 00:57:59 +02:00
Mike Fährmann
56f1c96168 implement 'parent-directory' option (#551) 2020-01-29 18:32:37 +01:00