
239 Commits

Author SHA1 Message Date
Mike Fährmann
5ab2ae17bc support wildcards for parent>child categories (#6673)
For example, "reddit>*" applies to all reddit child extractors.
2024-12-16 08:50:18 +01:00
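A minimal sketch of how such a wildcard lookup could resolve options. It is not gallery-dl's code, and the precedence is an assumption: an exact 'parent>child' section overrides 'parent>*', which in turn overrides the plain child section. `child_options` and the flat `extractor_config` dict are hypothetical.

```python
def child_options(extractor_config, parent, child):
    """Merge the option sections that apply to `child` when it was spawned
    by `parent`; later (more specific) sections override earlier ones.
    The precedence order here is an assumption made for illustration."""
    merged = {}
    for key in (child, parent + ">*", parent + ">" + child):
        merged.update(extractor_config.get(key, {}))
    return merged


extractor_config = {
    "imgur": {"filename": "plain"},
    "reddit>*": {"filename": "from-reddit"},
    "reddit>redgifs": {"filename": "redgifs-in-reddit"},
}

print(child_options(extractor_config, "reddit", "imgur")["filename"])    # from-reddit
print(child_options(extractor_config, "reddit", "redgifs")["filename"])  # redgifs-in-reddit
print(child_options(extractor_config, "twitter", "imgur")["filename"])   # plain
```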
Mike Fährmann
d8cf381904 [archive] use defaults when 'prefix'/'format' are 'null' 2024-11-29 16:36:35 +01:00
Mike Fährmann
55afd712d6 [pp] allow inheriting settings from global 'postprocessor' entries
No idea how to properly explain/document this, so here's an example:

The extractor.postprocessors object
inherits its options from postprocessor.jl
(selected via "type": "jl")
and adds 'filename' itself.

{
    "extractor": {
        "postprocessors": {
            "type": "jl",
            "filename": "meta.jsonl"
        }
    },

    "postprocessor": {
        "jl": {
            "name": "metadata",
            "mode": "jsonl",
            "open": "a"
        }
    }
}
2024-11-16 21:16:13 +01:00
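The example above can be read as a simple dictionary merge. The sketch below assumes that 'type' just names a key in the global 'postprocessor' object and that the entry's own keys win over the inherited ones; `resolve_postprocessor` is a hypothetical helper, not gallery-dl's implementation.

```python
def resolve_postprocessor(entry, global_postprocessors):
    """Return the effective options of a postprocessor `entry`: start from
    the global entry named by its 'type' (assumed lookup), then let the
    entry's own keys override the inherited ones."""
    options = dict(global_postprocessors.get(entry.get("type"), {}))
    options.update(entry)
    return options


global_postprocessors = {
    "jl": {"name": "metadata", "mode": "jsonl", "open": "a"},
}
entry = {"type": "jl", "filename": "meta.jsonl"}

print(resolve_postprocessor(entry, global_postprocessors))
# {'name': 'metadata', 'mode': 'jsonl', 'open': 'a', 'type': 'jl', 'filename': 'meta.jsonl'}
```

Copying the global entry with `dict(...)` keeps the shared 'postprocessor' settings untouched when several extractors inherit from them.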
Mike Fährmann
80454460ce [config] support accumulating non-list values
fixes 1264fc518b
2024-11-16 21:13:57 +01:00
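A minimal sketch of accumulating a value across config levels when some levels hold a single dict or str instead of a list. The `levels` list (outermost level first) and `accumulate_values` are hypothetical; gallery-dl's own `config.accumulate` takes different arguments.

```python
def accumulate_values(levels, key):
    """Collect `key` from several config levels into one flat list.

    A list value contributes all of its items; any other value
    (such as a single dict or str) contributes itself as one item.
    """
    result = []
    for level in levels:
        value = level.get(key)
        if value is None:
            continue  # option not set at this level
        if isinstance(value, list):
            result.extend(value)
        else:
            result.append(value)
    return result


levels = [
    {"postprocessors": "metadata"},         # e.g. a global level
    {"postprocessors": [{"name": "zip"}]},  # e.g. a category level
    {},                                     # level without the option
]
print(accumulate_values(levels, "postprocessors"))
# ['metadata', {'name': 'zip'}]
```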
Mike Fährmann
1264fc518b allow 'postprocessors' to be a single dict/str
do not require it to be a list with just one element

"postprocessors": "metadata"
"postprocessors": {"name": "metadata"}
2024-11-15 21:15:00 +01:00
Mike Fährmann
5bc3657c59 [util] implement 'compile_filter()' (#5262)
https://github.com/mikf/gallery-dl/issues/5262#issuecomment-2477029728

allow (theoretically*) all filter expression statements
to be a list of individual filters

(*) except for 'filename' and 'directory' conditionals,
as dict keys cannot be lists
2024-11-14 22:47:36 +01:00
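A minimal sketch of compiling a list of filter expressions into one predicate, assuming every item in the list has to match ('and' semantics). `make_filter` is a hypothetical stand-in, not `util.compile_filter` itself.

```python
def make_filter(expr, name="<filter>"):
    """Compile a filter expression, or a list of expressions, into a
    predicate over a metadata dict. All list items must be true."""
    if not isinstance(expr, str):
        # join individual filters with 'and', parenthesizing each one
        expr = "(" + ") and (".join(expr) + ")"
    code = compile(expr, name, "eval")
    # metadata fields are looked up as local names during evaluation;
    # eval is only reasonable because the expressions come from the user's config
    return lambda kwdict: bool(eval(code, {}, kwdict))


big_images = make_filter(["width >= 1000", "extension in ('jpg', 'png')"])
print(big_images({"width": 1200, "extension": "jpg"}))  # True
print(big_images({"width": 800, "extension": "jpg"}))   # False
```

Parenthesizing each item keeps an expression such as `a or b` from changing meaning once it is joined with its neighbors.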
Mike Fährmann
2e1dab3036 [pp] add 'error' event 2024-10-19 20:30:34 +02:00
Mike Fährmann
d3dcc44bd1 use child fallbacks only when a non-user error occurs (#6329) 2024-10-17 08:04:41 +02:00
Mike Fährmann
a051e1c955 directly pass exception instances as 'exc_info' logger argument 2024-09-19 14:50:08 +02:00
Mike Fährmann
dd56bb2187 include debug exception info for GalleryDLException errors 2024-09-19 13:51:27 +02:00
Mike Fährmann
8072dcf717 [pp:rename] recheck if file exists only when necessary 2024-09-05 17:42:29 +02:00
Mike Fährmann
359572162b [pp:rename] improve renaming files 'to' a format (#5846, #6044) 2024-09-03 21:17:31 +02:00
Mike Fährmann
8ecd408f53 add '-J/--resolve-json' command-line option (#5864) 2024-07-26 20:41:35 +02:00
Mike Fährmann
84a634fc14 [job] add 'resolve' argument to DataJob (#5864) 2024-07-19 14:32:42 +02:00
Mike Fährmann
f7a6401031 [actions] move LoggerAdapter from 'output' to 'actions' 2024-06-30 20:41:51 +02:00
Mike Fährmann
ea81fa985f [archive] implement 'archive-event' option (#5784)
With this, IDs of skipped files will no longer be written to an archive
by default. Use "archive-event": "file,skip" to restore the previous
behavior.
2024-06-27 22:00:59 +02:00
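A minimal sketch of how an 'archive-event' value could decide which IDs get recorded. The default of "file" is inferred from the note above (skipped IDs are no longer written by default); `parse_events`, `archived_ids`, and the `(id, event)` pairs are hypothetical.

```python
def parse_events(option):
    """Turn an 'archive-event'-style value ("file,skip" or a list) into a set."""
    if isinstance(option, str):
        option = option.split(",")
    return {event.strip() for event in option if event.strip()}


def archived_ids(results, option="file"):
    """Return the IDs that would be written to the archive, given
    (id, event) pairs and the configured events."""
    events = parse_events(option)
    return [file_id for file_id, event in results if event in events]


results = [("a", "file"), ("b", "skip"), ("c", "file")]
print(archived_ids(results))               # ['a', 'c']
print(archived_ids(results, "file,skip"))  # ['a', 'b', 'c']
```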
Mike Fährmann
895e633c44 implement 'keywords-eval' option (#5621)
to allow evaluating 'keywords' values as format strings
2024-05-22 22:53:34 +02:00
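A minimal sketch of the idea: with evaluation enabled, string values in 'keywords' are filled in from the existing metadata. It uses Python's `str.format_map` as a stand-in for gallery-dl's own format-string engine, which supports more syntax; `apply_keywords` is hypothetical.

```python
def apply_keywords(kwdict, keywords, evaluate=False):
    """Add `keywords` to `kwdict`; with `evaluate`, treat str values as
    format strings filled in from the metadata collected so far."""
    for key, value in keywords.items():
        if evaluate and isinstance(value, str):
            value = value.format_map(kwdict)
        kwdict[key] = value
    return kwdict


print(apply_keywords({"id": 42}, {"label": "post-{id}"}))
# {'id': 42, 'label': 'post-{id}'}
print(apply_keywords({"id": 42}, {"label": "post-{id}"}, evaluate=True))
# {'id': 42, 'label': 'post-42'}
```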
Mike Fährmann
d2f50ecf09 add 'skip-filter' option (#5255) 2024-05-10 22:59:52 +02:00
Mike Fährmann
fd734b9222 [archive] add 'archive-mode' option (#5255) 2024-05-10 22:59:51 +02:00
Mike Fährmann
88f94190f4 [archive] move DownloadArchive into its own module 2024-05-10 01:05:28 +02:00
Mike Fährmann
92fbf09643 remove single quotes in some logging messages (#4908)
('FileNotFoundError: [Errno 2] No such file or directory: ''')
->
(FileNotFoundError: [Errno 2] No such file or directory: '')
2023-12-11 19:13:45 +01:00
Mike Fährmann
aea15f6d17 add 'metadata-extractor' option (#4549) 2023-11-20 22:16:15 +01:00
Mike Fährmann
34a387b6e2 support 'metadata-*' names for '*-metadata' options
For example, instead of 'url-metadata' it is now also possible to use
'metadata-url' as option name.

- metadata-url
- metadata-path
- metadata-http
- metadata-version
- metadata-parent
2023-11-18 23:52:10 +01:00
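A minimal sketch of reading an option under either name. Which spelling wins when both are set is an assumption here (the original '*-metadata' form is checked first); `metadata_option` is a hypothetical helper.

```python
def metadata_option(options, name):
    """Read '<name>-metadata', falling back to its 'metadata-<name>' alias.
    Checking the '*-metadata' spelling first is an assumed precedence."""
    value = options.get(name + "-metadata")
    if value is None:
        value = options.get("metadata-" + name)
    return value


print(metadata_option({"url-metadata": "gdl_file_url"}, "url"))  # gdl_file_url
print(metadata_option({"metadata-url": "gdl_file_url"}, "url"))  # gdl_file_url
print(metadata_option({}, "path"))                               # None
```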
Mike Fährmann
2cd801232b fix --range causing crashes (#4557)
regression caused by a383eca7
2023-09-22 16:28:20 +02:00
Mike Fährmann
7defb24e1e [reddit] provide video previews if available (#4322) 2023-08-28 22:22:10 +02:00
Mike Fährmann
14af15bd18 [reddit] download preview for 404ed imgur links (#4322)
This is a pretty ugly hack as the internal infrastructure doesn't
really support switching from external URL to regular download in
case the former fails, but it kind of works ...

Can be disabled by setting 'reddit.fallback' to 'false'.
2023-08-24 15:41:05 +02:00
Mike Fährmann
92f98e6f5e 'sys.exit' -> 'SystemExit' 2023-08-21 23:46:39 +02:00
Mike Fährmann
f9fb276e81 [postprocessor] add 'prepare-after' event (#4083) 2023-08-10 21:28:48 +02:00
Mike Fährmann
0ef1fcab20 [postprocessor] update 'finalize' events
Add 'finalize-error' and 'finalize-success' events that trigger
depending on whether or not errors occurred.

'finalize' itself now always triggers regardless of error status.
(It was previously meant to behave like the new 'finalize-success'.)
2023-08-10 19:46:37 +02:00
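A minimal sketch of the event selection described above: 'finalize' always fires, followed by exactly one of 'finalize-success' or 'finalize-error'. The firing order and the `hooks` mapping of event names to callbacks are assumptions; `run_finalize` is hypothetical.

```python
def run_finalize(hooks, error_count):
    """Fire 'finalize' plus exactly one of 'finalize-success' /
    'finalize-error', calling every callback registered for them."""
    fired = ["finalize",
             "finalize-error" if error_count else "finalize-success"]
    for event in fired:
        for callback in hooks.get(event, ()):
            callback()
    return fired


log = []
hooks = {
    "finalize": [lambda: log.append("always")],
    "finalize-success": [lambda: log.append("ok")],
    "finalize-error": [lambda: log.append("failed")],
}
run_finalize(hooks, error_count=0)
run_finalize(hooks, error_count=3)
print(log)  # ['always', 'ok', 'always', 'failed']
```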
Mike Fährmann
3963dbe5e4 extend 'parent>child' categories
continuation of ed21908f

allow for children to have an arbitrary distance from their parent,
e.g. reddit -> danbooru -> imgur:gallery -> imgur:album
would still be covered by 'reddit>imgur' or even 'danbooru>imgur'
2023-08-07 23:22:12 +02:00
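A minimal sketch of matching 'parent>child' sections against every ancestor, not just the direct parent. The `ancestors` list (categories of all extractors above the current one, outermost first) and `applicable_sections` are hypothetical.

```python
def applicable_sections(ancestors, category, extractor_config):
    """Return every 'parent>child' section that applies to `category`,
    where `ancestors` lists the categories of all extractors above it,
    outermost first."""
    sections = []
    for parent in ancestors:
        key = parent + ">" + category
        if key in extractor_config and key not in sections:
            sections.append(key)
    return sections


extractor_config = {"reddit>imgur": {}, "danbooru>imgur": {}, "reddit>gfycat": {}}
# reddit -> danbooru -> imgur:gallery -> imgur:album
ancestors = ["reddit", "danbooru", "imgur"]
print(applicable_sections(ancestors, "imgur", extractor_config))
# ['reddit>imgur', 'danbooru>imgur']
```

Returning the sections outermost first would let a caller apply them in order, so that the nearest ancestor's options override those of more distant ones.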
Mike Fährmann
48ef062867 fix issues with 'Extractor.finalize()'
- prevent crash in InstagramUserExtractor (#4359)
- call it at the end of every DownloadJob
- add it to tests
2023-07-29 13:43:27 +02:00
Mike Fährmann
ed21908fda initial support for child extractor options
Using "parent-category>child-category" as extractor category in a config
file makes it possible to set options for a child extractor that apply
only when it was spawned by that parent.

For example, use "reddit>gfycat" to set gfycat options that apply
when a gfycat link was found in a reddit post.

{
    "extractor": {
        "gfycat": {
            "filename": "regular filename"
        },
        "reddit>gfycat": {
            "filename": "reddit-specific filename"
        }
    }
}

Note: This currently does not work for most imgur links due to how
imgur's extractor hierarchy is structured.
2023-07-28 17:07:25 +02:00
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
082d55de16 fix circular reference detection for -K 2023-03-21 23:46:36 +01:00
Mike Fährmann
2ab66ad899 update -K output to include quotes around keys 2023-03-21 22:28:04 +01:00
Mike Fährmann
4235d412c4 implement 'actions'
continuation of d37e7f48
but more versatile and extendable

Example:

"actions": [
    # change debug messages to info
    ["debug", "level ~info"],

    # change exit status to a non-zero value
    ["info:^No results for", "status |= 1"],

    # exit with status 2 on 429
    ["warning:429", "exit 2"],

    # restart extractor when no cookies found
    ["warning:^[Nn]o .*cookies", "restart"]
]
2023-03-10 22:08:10 +01:00
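A minimal sketch of the matching half of an action, i.e. deciding whether a log record triggers a "level[:regex]" pattern like the ones above. It assumes a pattern without ':' matches every message of that level and that levels must match exactly; the action strings ("level ~info", "exit 2", ...) are not parsed here. `compile_pattern` and `pattern_matches` are hypothetical.

```python
import logging
import re

LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def compile_pattern(spec):
    """Split "level[:regex]" into a level number and an optional search."""
    level_name, _, regex = spec.partition(":")
    search = re.compile(regex).search if regex else None
    return LEVELS[level_name.lower()], search


def pattern_matches(spec, level, message):
    """Check whether a log record (level, message) triggers `spec`."""
    wanted, search = compile_pattern(spec)
    if level != wanted:
        return False
    return search is None or search(message) is not None


print(pattern_matches("debug", logging.DEBUG, "anything"))                          # True
print(pattern_matches("info:^No results for", logging.INFO, "No results for 'x'"))  # True
print(pattern_matches("warning:429", logging.WARNING, "HTTP 429 error"))            # True
print(pattern_matches("warning:^[Nn]o .*cookies", logging.INFO, "No cookies"))      # False
```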
Mike Fährmann
26d06e0bb2 move executable check into util.py 2023-02-28 23:10:23 +01:00
Mike Fährmann
d37e7f4898 add 'hooks' option
Very much a work in progress.

At the moment, it allows one to
- wait and restart an extractor (#3338)
- change the exit code (#3630)
- change the log level of a logging message
based on the contents of a logging message
2023-02-13 13:33:42 +01:00
Mike Fährmann
d4232f3a8b implement restarting an extractor (#3338) 2023-02-11 21:06:14 +01:00
Mike Fährmann
5503ac4d5e replace json.dumps with direct calls to JSONEncoder.encode 2023-02-09 15:51:40 +01:00
Mike Fährmann
762a68996b implement 'archive-pragma' option 2023-02-05 17:00:31 +01:00
Mike Fährmann
f58215705a add '-O/--postprocessor-option' command-line option (#3565) 2023-01-26 14:59:24 +01:00
ClosedPort22
b14b33f19e Implement version-metadata option (#3201) 2022-11-27 16:09:42 +01:00
Mike Fährmann
226d778294 do not try to fetch 'http-metadata' for ytdl URLs (#3257) 2022-11-19 11:41:06 +01:00
Mike Fährmann
133412bd62 remove previous 'http-metadata' entries from kwdict 2022-11-19 11:37:57 +01:00
Mike Fährmann
8124c16a50 split 'build_path' from 'set_filename' and 'set_extension'
Do not automatically build a new path
when setting file metadata or updating its extension.
2022-11-08 17:03:24 +01:00
Mike Fährmann
39d9c362e4 include 'http-metadata' in '-K' output 2022-11-07 16:33:26 +01:00
Mike Fährmann
c12a97bcde [postprocessor] add 'post-after' event (#3117) 2022-10-31 14:35:48 +01:00
Mike Fährmann
f037429fa4 attempt to improve '-K' output for lists
- use [N] instead of [] to indicate that a Number needs to be placed there
- enumerate list items
2022-10-28 12:04:58 +02:00
pink-red
88f8975ab9 Fix duplicated metadata bug (#3033) 2022-10-13 19:17:23 +02:00