204 Commits

Author SHA1 Message Date
Mike Fährmann
4235d412c4 implement 'actions'
continuation of d37e7f48
but more versatile and extendable

Example:

"actions": [
    # change debug messages to info
    ["debug", "level ~info"],

    # change exit status to a non-zero value
    ["info:^No results for", "status |= 1"],

    # exit with status 2 on 429
    ["warning:429", "exit 2"],

    # restart extractor when no cookies found
    ["warning:^[Nn]o .*cookies", "restart"]
]
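
A minimal sketch of how rules like the ones above could be matched against log records. This is not the project's actual implementation: `parse_rule` and `find_action` are hypothetical names, and the assumption is that each rule has the form "<level>" or "<level>:<regex>" and that the first matching rule wins.

```python
import re


def parse_rule(rule):
    """Split "<level>" or "<level>:<regex>" into (level, compiled pattern or None)."""
    level, sep, pattern = rule.partition(":")
    return level, (re.compile(pattern) if sep else None)


def find_action(actions, level, message):
    """Return the action string of the first rule matching a log record, else None."""
    for rule, action in actions:
        rule_level, pattern = parse_rule(rule)
        if rule_level != level:
            continue
        # A rule without a pattern matches every message of its level.
        if pattern is None or pattern.search(message):
            return action
    return None


# The rules from the commit message, written as a plain Python list.
actions = [
    ["debug", "level ~info"],
    ["info:^No results for", "status |= 1"],
    ["warning:429", "exit 2"],
    ["warning:^[Nn]o .*cookies", "restart"],
]

print(find_action(actions, "warning", "HTTP 429 Too Many Requests"))
```

How the returned action string ("level ~info", "exit 2", and so on) is interpreted is left out here, since the commit message does not describe it.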
2023-03-10 22:08:10 +01:00
Mike Fährmann
26d06e0bb2 move executable check into util.py 2023-02-28 23:10:23 +01:00
Mike Fährmann
d37e7f4898 add 'hooks' option
Very much a work in progress.

At the moment, it allows you to
- wait and restart an extractor (#3338)
- change the exit code (#3630)
- change the log level of a logging message
based on the contents of a logging message
2023-02-13 13:33:42 +01:00
Mike Fährmann
d4232f3a8b implement restarting an extractor (#3338) 2023-02-11 21:06:14 +01:00
Mike Fährmann
5503ac4d5e replace json.dumps with direct calls to JSONEncoder.encode 2023-02-09 15:51:40 +01:00
Mike Fährmann
762a68996b implement 'archive-pragma' option 2023-02-05 17:00:31 +01:00
Mike Fährmann
f58215705a add '-O/--postprocessor-option' command-line option (#3565) 2023-01-26 14:59:24 +01:00
ClosedPort22
b14b33f19e Implement version-metadata option (#3201) 2022-11-27 16:09:42 +01:00
Mike Fährmann
226d778294 do not try to fetch 'http-metadata' for ytdl URLs (#3257) 2022-11-19 11:41:06 +01:00
Mike Fährmann
133412bd62 remove previous 'http-metadata' entries from kwdict 2022-11-19 11:37:57 +01:00
Mike Fährmann
8124c16a50 split 'build_path' from 'set_filename' and 'set_extension'
Do not automatically build a new path
when setting file metadata or updating its extension.
2022-11-08 17:03:24 +01:00
Mike Fährmann
39d9c362e4 include 'http-metadata' in '-K' output 2022-11-07 16:33:26 +01:00
Mike Fährmann
c12a97bcde [postprocessor] add 'post-after' event (#3117) 2022-10-31 14:35:48 +01:00
Mike Fährmann
f037429fa4 attempt to improve '-K' output for lists
- use [N] instead of [] to indicate a Number needs to be placed there
- enumerate list items
2022-10-28 12:04:58 +02:00
pink-red
88f8975ab9 Fix duplicated metadata bug (#3033) 2022-10-13 19:17:23 +02:00
Mike Fährmann
8b1fe0bcf1 emit debug logging messages before calling time.sleep() (#2982) 2022-10-08 15:41:39 +02:00
Mike Fährmann
7d1a95ada6 implement 'path-metadata' option (#2734) 2022-07-30 12:31:45 +02:00
Mike Fährmann
5806a1851e add --no-postprocessors command-line option (#2725) 2022-07-03 12:09:09 +02:00
Mike Fährmann
44ffc017ea remove useless 'tries' argument from out.success 2022-05-24 10:45:09 +02:00
Mike Fährmann
64d3ad2e7a detect circular references with -K (fixes #2609) 2022-05-20 20:47:25 +02:00
Mike Fährmann
688d6553b4 replace calls to print() with stdout_write() (#2529) 2022-05-19 17:09:24 +02:00
Mike Fährmann
71bba774da respect 'output.private' in '-K/--list-keywords' output 2022-03-25 22:19:37 +01:00
Mike Fährmann
9bd27b1b8d [postprocessor:metadata] implement archive options (#2421)
'archive', 'archive-format', and 'archive-prefix'
2022-03-20 21:16:46 +01:00
Mike Fährmann
bb3e182562 overhaul session initialization
- share adapter & connection pool across sessions with the same
  ssl options, ssl ciphers, and source address
- simplify browser emulation to just a list of headers and ciphers
2022-01-31 23:12:08 +01:00
Mike Fährmann
6e0a6c484f apply SPECIAL_EXTRACTORS only for blacklist settings
as was the case before 010d65dc
2022-01-06 21:09:30 +01:00
Mike Fährmann
010d65dcec extend blacklist/whitelist syntax (#2025)
Each entry in such a list can now also include a subcategory
'<category>:<subcategory>'
and it is possible to use '*' or an empty string as placeholder
'*:<subcategory>', ':<subcategory>', '<category>:*'

For example
  "blacklist": "imgur,*:tag,gfycat:user" or
  "blacklist": ["imgur", "*:tag", "gfycat:user"]
will filter all 'imgur' extractors, all extractors with a 'tag'
subcategory (e.g. https://danbooru.donmai.us/posts?tags=bonocho),
and all 'gfycat' user extractors.
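
The matching rules described above could be sketched roughly as follows. The function names `parse_entries` and `is_listed` are hypothetical and not taken from the project; the sketch assumes that an entry without a subcategory applies to every subcategory of that category.

```python
def parse_entries(value):
    """Normalize a comma-separated string or a list into (category, subcategory) pairs."""
    if isinstance(value, str):
        value = value.split(",")
    pairs = []
    for entry in value:
        category, _, subcategory = entry.partition(":")
        # '*' and the empty string are both treated as "match anything".
        pairs.append((category or "*", subcategory or "*"))
    return pairs


def is_listed(pairs, category, subcategory):
    """Return True if an extractor's category/subcategory matches any entry."""
    return any(
        c in ("*", category) and s in ("*", subcategory)
        for c, s in pairs
    )


pairs = parse_entries("imgur,*:tag,gfycat:user")
print(is_listed(pairs, "danbooru", "tag"))
```

Both the string form and the list form from the example normalize to the same pairs, so one matching function covers both.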
2021-11-23 20:31:43 +01:00
Mike Fährmann
cad85640de move 'util.PathFormat' into its own 'path' module
to prevent circular imports between 'formatter' and 'util'
2021-09-27 21:29:37 +02:00
Mike Fährmann
74145467dd move 'util.Formatter' into its own 'formatter' module 2021-09-27 02:37:04 +02:00
Mike Fährmann
c9e6693530 allow specifying a minimum/maximum for 'sleep-*' options (#1835)
for example '"sleep-request": [5.0, 10.0]' to wait 5 to 10
seconds between HTTP requests
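
One way to interpret such a value, as a sketch only: `sleep_duration` is a hypothetical helper, and drawing the delay uniformly from the range is an assumption, since the commit message does not state the distribution.

```python
import random


def sleep_duration(value):
    """Return a delay in seconds for a plain number or a [minimum, maximum] pair."""
    if isinstance(value, (list, tuple)):
        lower, upper = value
        # Assumption: pick uniformly at random within the given bounds.
        return random.uniform(lower, upper)
    return float(value)


delay = sleep_duration([5.0, 10.0])
print(5.0 <= delay <= 10.0)
```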
2021-09-14 17:40:05 +02:00
Mike Fährmann
d79bcb6236 allow extractors to register a 'finalize()' method 2021-09-07 21:15:30 +02:00
Mike Fährmann
72c0cd30c7 do not return with a nonzero exit status when no results found
also change loglevel from 'warning' to 'info'
(#1789)
2021-08-24 18:49:13 +02:00
Mike Fährmann
bd08ee2859 remove most 'yield Message.Version' statements
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
bdfdabf498 show warning if extractor doesn't yield any results (#1759) 2021-08-16 02:49:36 +02:00
Mike Fährmann
d320ee6251 implement a 'fallback' option (closes #1770) 2021-08-16 01:47:59 +02:00
Mike Fährmann
1b2f9050fb rename all instances of 'kwds' to 'kwdict' 2021-07-20 20:21:19 +02:00
Mike Fährmann
b9783403d9 add 'url-metadata' option (#1659, #1073) 2021-07-14 03:08:49 +02:00
Mike Fährmann
e95f99882f extend 'parent-metadata' functionality (#1687, #1651, #1364) 2021-07-14 02:53:41 +02:00
Mike Fährmann
64986f9435 fix depth counter in UrlJob
regression from adf4d661

It would either stop at the first level (-g) or go infinitely deep (-G).
Going down to, for example, level 3 with -ggg didn't work.
2021-06-26 00:30:03 +02:00
Mike Fährmann
83fc4c1098 update post processor config capabilities
This change makes it possible to specify just the name of a post processor
in the "postprocessors" list instead of a dict with all of its options.
The options for it will then be taken from inside the "postprocessor"
block similar to "extractor", "downloader", or "output" blocks.

This makes it possible, for example, to override the default settings for
--write-metadata by specifying a custom "metadata" block, or to set a
custom post processor block ("cbz") and then use it by referencing just
its name in "postprocessors" lists.

{
    "postprocessor":
    {
        "metadata": {
            "name": "metadata",
            "event": "post",
            "filename": "{tweet_id|post_id|id}.json"
        },
        "cbz": {
            "name"       : "zip",
            "compression": "store",
            "extension"  : "cbz"
        }
    }
}
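
The name lookup described above could work roughly like this sketch. `resolve_postprocessors` is a hypothetical function, not the project's code; it assumes that a name without a matching block falls back to a post processor of that name with no extra options.

```python
def resolve_postprocessors(entries, blocks):
    """Expand names in a "postprocessors" list using the "postprocessor" blocks."""
    resolved = []
    for entry in entries:
        if isinstance(entry, str):
            # Copy the named block so the shared config is not modified.
            options = dict(blocks.get(entry, {}))
            # Assumption: without an explicit "name", the entry itself is the name.
            options.setdefault("name", entry)
            resolved.append(options)
        else:
            # Full option dicts are passed through unchanged.
            resolved.append(entry)
    return resolved


blocks = {
    "metadata": {
        "name": "metadata",
        "event": "post",
        "filename": "{tweet_id|post_id|id}.json",
    },
    "cbz": {"name": "zip", "compression": "store", "extension": "cbz"},
}

print(resolve_postprocessors(["cbz", {"name": "exec"}], blocks))
```

Here "cbz" resolves to the "zip" post processor with the options from its block, while the dict entry is used as given.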
2021-06-05 14:11:16 +02:00
Mike Fährmann
3cbbefd4ed support 'filter' option for post processors (#1460) 2021-06-04 18:23:32 +02:00
Mike Fährmann
adf4d661b3 use '_extractor' info in UrlJobs 2021-05-19 15:52:30 +02:00
Mike Fährmann
b50b8e6cf4 refactor applying 'parent-…' options 2021-05-13 21:56:34 +02:00
Mike Fährmann
7ab8374385 add 'parent-skip' option (#1399) 2021-05-13 16:40:04 +02:00
Mike Fährmann
c693db5b1a add '"skip": "terminate"' option
Stops not only the current extractor/job,
but all parent extractors/jobs as well.
2021-05-12 02:22:28 +02:00
Mike Fährmann
c5ca7905ce add 'noop()' and 'identity()' functions 2021-05-04 19:27:17 +02:00
Mike Fährmann
5b4da4b4bf reorder config access in Job constructor
(#1111)
2021-04-27 15:12:59 +02:00
Mike Fährmann
b4ed7cb961 fix 'category-transfer' (#1111)
broken since commit 055c32e0
2021-04-19 00:55:44 +02:00
Mike Fährmann
a86ffb04bb add 'output.fallback' option
to enable/disable fallback URLs for -g/--get-urls
2021-04-12 02:00:41 +02:00
Mike Fährmann
a75e485461 add archive format to InfoJob output (#875) 2021-04-07 21:50:16 +02:00
Mike Fährmann
bf241811dd allow '_extractor' fields to be None or empty 2021-03-20 01:19:31 +01:00