21 Commits

Author SHA1 Message Date
Mike Fährmann
2d64e76223 [job] implement 'follow' option (#8752)
Follow and process URLs found in the given format string result.
2026-02-07 21:47:17 +01:00
Mike Fährmann
22b12a1798 [tests:job] test 'parent-metadata' / '_extractor' handling 2026-02-05 22:37:30 +01:00
Mike Fährmann
f046529f28 [tests:job] add tests for DataJob 'resolve' 2026-02-05 22:37:30 +01:00
Mike Fährmann
f0f9575406 [job] fix 'AttributeError' when enabling 'init' for non-DownloadJob
fixes bug in 56dcd00391
2026-02-03 19:00:45 +01:00
Mike Fährmann
3445c51ca4 [job] add 'output.jsonl' option (#8953) 2026-01-30 09:36:28 +01:00
Mike Fährmann
968597a302 yield 3-tuples for Message.Directory
adapt tuples to the same length and semantics as other messages
2025-12-05 21:39:52 +01:00
Mike Fährmann
98d3354575 [wikimedia] implement config lookup for fandom/wikigg sites (#7283)
{
    "extractor": {
        "fandom": {
            "filename": "..."
        }
    }
}
2025-10-23 20:14:56 +02:00
Mike Fährmann
b9429de774 [tests] use f-strings (#7671) 2025-08-14 10:22:42 +02:00
Mike Fährmann
790e097edd [tests:job] update TestDataJob.test_exception result 2025-06-24 18:59:50 +02:00
Mike Fährmann
41191bb60a 'match.group(N)' -> 'match[N]' (#7671)
2.5x faster
2025-06-18 13:05:58 +02:00
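The substitution named in the commit above can be illustrated with a short sketch. The pattern and input string here are made up for illustration and are not taken from the repository. Subscripting a match object has been supported since Python 3.6 and returns the same value as `group(N)`. The `compare()` helper is a hypothetical way to measure the difference locally; actual ratios depend on the interpreter and machine.

```python
import re
import timeit

# Hypothetical pattern and input, chosen only for illustration
pattern = re.compile(r"(\w+):(\w+)")
match = pattern.match("gfycat:user")

# Old spelling: method call
category_old = match.group(1)
# New spelling: subscript access (Python 3.6+), same result
category_new = match[1]


def compare(number=100_000):
    """Time both spellings; returns (seconds_group, seconds_subscript)."""
    old = timeit.timeit(lambda: match.group(1), number=number)
    new = timeit.timeit(lambda: match[1], number=number)
    return old, new
```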
Mike Fährmann
40bd145637 remove 'contextlib' imports 2024-04-06 16:59:09 +02:00
Mike Fährmann
ba062712ad [tests] '__main__' -> "__main__" 2024-02-27 02:10:05 +01:00
Mike Fährmann
082d55de16 fix circular reference detection for -K 2023-03-21 23:46:36 +01:00
Mike Fährmann
2ab66ad899 update -K output to include quotes around keys 2023-03-21 22:28:04 +01:00
Mike Fährmann
f037429fa4 attempt to improve '-K' output for lists
- use [N] instead of [] to indicate that a number needs to be placed there
- enumerate list items
2022-10-28 12:04:58 +02:00
Mike Fährmann
688d6553b4 replace calls to print() with stdout_write() (#2529) 2022-05-19 17:09:24 +02:00
Mike Fährmann
010d65dcec extend blacklist/whitelist syntax (#2025)
Each entry in such a list can now also include a subcategory
'<category>:<subcategory>'
and it is possible to use '*' or an empty string as placeholder
'*:<subcategory>', ':<subcategory>', '<category>:*'

For example
  "blacklist": "imgur,*:tag,gfycat:user" or
  "blacklist": ["imgur", "*:tag", "gfycat:user"]
will filter all 'imgur' extractors, all extractors with a 'tag'
subcategory (e.g. https://danbooru.donmai.us/posts?tags=bonocho),
and all 'gfycat' user extractors.
2021-11-23 20:31:43 +01:00
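The matching rules described in the commit message above can be sketched roughly as follows. This is a minimal illustration of the '<category>:<subcategory>' syntax, not the project's actual implementation. The function names `parse_entries` and `is_listed` are hypothetical.

```python
def parse_entries(value):
    """Turn a comma-separated string or a list into (category, subcategory) pairs.

    None in either position means "match anything".
    """
    if isinstance(value, str):
        value = value.split(",")
    pairs = []
    for entry in value:
        category, _, subcategory = entry.strip().partition(":")
        # '*' and the empty string are both treated as placeholders
        pairs.append((
            None if category in ("", "*") else category,
            None if subcategory in ("", "*") else subcategory,
        ))
    return pairs


def is_listed(pairs, category, subcategory):
    """Return True if any entry matches the given extractor identifiers."""
    return any(
        (c is None or c == category) and (s is None or s == subcategory)
        for c, s in pairs
    )


# The example from the commit message, in both accepted forms
from_string = parse_entries("imgur,*:tag,gfycat:user")
from_list = parse_entries(["imgur", "*:tag", "gfycat:user"])
```

Under these assumptions, `is_listed(from_string, "danbooru", "tag")` is true because of the '*:tag' entry, while a 'gfycat' extractor with a subcategory other than 'user' is not matched.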
Mike Fährmann
da6806a161 fix job tests for Python 3.4 and 3.5
assert_called() and assert_not_called() got added in Python 3.6
2021-05-22 21:40:52 +02:00
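One portable way to write such checks without `assert_called()` and `assert_not_called()` is to inspect the `called` and `call_count` attributes of the mock. The sketch below assumes that approach; the commit itself may have used a different one. The mock names and the URL are placeholders.

```python
from unittest.mock import Mock

# Placeholder mocks standing in for patched callbacks in a test
callback = Mock()
untouched = Mock()

callback("https://example.org/file.jpg")

# Equivalent of callback.assert_called()
assert callback.called
# Equivalent of untouched.assert_not_called()
assert not untouched.called
# call_count allows stricter checks where needed
assert callback.call_count == 1
```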
Mike Fährmann
af9dba4684 add DataJob tests 2021-05-21 02:59:54 +02:00
Mike Fährmann
adf4d661b3 use '_extractor' info in UrlJobs 2021-05-19 15:52:30 +02:00
Mike Fährmann
559462789d add some tests for job.py 2021-05-14 19:44:16 +02:00