[tiktok] extract subtitles and all cover types (#8805)

* Make sure that `img_id`, `audio_id` and `cover_id` fields are always available.
    The values are set to '' where they are not applicable.
    Having `img_id` is necessary for the default `archive_fmt`, the other fields are handled for consistency.
* Allow downloading more than one cover.
    The previous behavior is kept as-is, but setting the "covers" option to "all" now grabs all available covers.
* Add support for downloading subtitles
    Allows filtering subtitles by source type (ASR, MT) and language.
* Ensure archive uniqueness for covers and subtitles.
* Update the URL test pattern to include the `image` extension.
    Although TikTok may serve the covers with JPEG content, the file extension can be `.image`.
    The test before 0c14b164 failed because the asserted URL did not match all cover types, but the pattern used now needs the file extension mentioned above.
* Add support for "creator_caption" subtitles in "LC" format.
    These subtitles have the key "Format" set to "creator_caption" and the key "Source" set to "LC".
* Add "LC" (Local Captions) as a subtitle source type in the documentation
* Code deduplication and renaming subtitle metadata
    Changed the item type from singular `subtitle` to `subtitles`.
    Removed the wrong descriptor `cover` from the subtitles fallback title.
* Refactor subtitle filtering
    The filter is now prepared in `_init` to prevent parsing the same config parameter for every item.
    The `_extract_subtitles` function will still extract if either filter (source or language) matches.
* Generate a `file_id` for subtitles
    Subtitles have multiple fields that determine the unique file, so these are simply concatenated.
    This is similar to the cover types, only with more variations.
* Added tests for subtitles
* fix docs entries
* fix '"covers": "all"'
* simplify some code
* Fix fallback title for subtitles
    Added the missing "f" to the f-string and added "subtitle" to the title.
    The resulting title will look like "TikTok video subtitle #1234567"
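
The `file_id` and fallback-title items above can be illustrated with a short Python sketch. It is not the extractor's code: the function names, the `_` separator, and the `Language` key are assumptions made for the example. Only the `Format` and `Source` keys and the resulting title text are taken from the commit message.

```python
def subtitle_file_id(subtitle):
    """Concatenate the fields that distinguish one subtitle file from another.

    "Source" and "Format" are named in the commit message; "Language"
    and the "_" separator are hypothetical choices for this sketch.
    """
    fields = (
        subtitle.get("Source", ""),
        subtitle.get("Language", ""),
        subtitle.get("Format", ""),
    )
    # Skip empty fields so the ID contains no doubled separators.
    return "_".join(field for field in fields if field)


def fallback_title(post_id):
    """Build the fallback title; the f-string prefix is what the fix added."""
    return f"TikTok video subtitle #{post_id}"
```

Because every distinguishing field ends up in the ID, two subtitles of the same video that differ only in source or language produce different archive entries.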
bassberry
2026-01-30 21:01:06 +01:00
committed by GitHub
parent 2d01fef300
commit fd5f5611f6
4 changed files with 225 additions and 33 deletions

@@ -5914,12 +5914,25 @@ Description
extractor.tiktok.covers
-----------------------
Type
-``bool``
+* ``bool``
+* ``string``
Default
``false``
Description
Download video covers.
``true``
Download the first cover found in the following order:
* ``thumbnail``
* ``cover``
* ``originCover``
* ``dynamicCover``
``false``
Do not download covers
``"all"``
Download all available covers
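
The selection rule described above can be sketched in Python. This is a minimal illustration, not the extractor's implementation: the function name `select_covers` and the `available` mapping (cover type to URL) are assumptions, and the sketch assumes that ``"all"`` covers exactly the four types listed.

```python
# Order in which a single cover is chosen, per the option description.
COVER_ORDER = ("thumbnail", "cover", "originCover", "dynamicCover")


def select_covers(available, option):
    """Return (cover_type, url) pairs to download for one post.

    available: hypothetical mapping of cover type -> URL
    option:    value of the "covers" option (True, False, or "all")
    """
    if not option:
        return []
    # Keep only cover types that are present, in the documented order.
    found = [(name, available[name])
             for name in COVER_ORDER if available.get(name)]
    if option == "all":
        return found
    # True: only the first cover found.
    return found[:1]
```

With a post that offers only `cover` and `dynamicCover`, the value `true` would select `cover`, while `"all"` would select both.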
extractor.tiktok.photos
-----------------------
@@ -5931,6 +5944,47 @@ Description
Download photos.
extractor.tiktok.subtitles
--------------------------
Type
* ``bool``
* ``string``
Default
``false``
Example
* ``"all"``
* ``"ASR,MT,LC"``
* ``"ASR,eng-US"``
Description
Download video subtitles.
The subtitles can be filtered by source or language.
The following source types can be filtered:
* ``ASR`` - Automatic Speech Recognition
* ``MT`` - Machine Translation
* ``LC`` - Local Captions / Creator Captions
If both source types and language codes are provided,
only subtitles matching both are downloaded.
``true``
Download all subtitles tagged ``ASR``
``false``
Do not download subtitles
``"all"``
Download all available subtitles.
``"ASR,MT,eng-US,cmn-Hans-CN"``
Download English and Simplified Chinese subtitles
that are either automatically recognized or machine translated.
The source types and languages can be listed in any order.
Note
It is not possible to filter all subtitles of a specific source type,
while also filtering for additional languages of another source type.
(e.g. any ASR subtitle + fra-FR of any source type)
For this, refer to `extractor.*.image-filter`_.
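
The filter semantics described above can be sketched in Python. This is a hedged illustration that follows the option description, not the extractor's code: the names `parse_subtitle_filter` and `subtitle_matches` are hypothetical, tokens are assumed to be case-sensitive, and any token that is not a known source type is treated as a language code.

```python
# Source types named in the option description.
SOURCE_TYPES = {"ASR", "MT", "LC"}


def parse_subtitle_filter(option):
    """Turn the "subtitles" option value into (sources, languages).

    Returns None when nothing should be downloaded. An empty set
    means "no restriction" for that dimension.
    """
    if not option:
        return None
    if option is True:
        return {"ASR"}, set()
    if option == "all":
        return set(), set()
    sources, languages = set(), set()
    for token in option.split(","):
        token = token.strip()
        if not token:
            continue
        if token in SOURCE_TYPES:
            sources.add(token)
        else:
            languages.add(token)
    return sources, languages


def subtitle_matches(subtitle_filter, source, language):
    """Apply the documented rule: if both kinds are given, both must match."""
    if subtitle_filter is None:
        return False
    sources, languages = subtitle_filter
    return ((not sources or source in sources) and
            (not languages or language in languages))
```

Under these assumptions, `"ASR,fra-FR"` selects only ASR subtitles in `fra-FR`. It cannot express "any ASR subtitle plus `fra-FR` of any source type", which is the limitation the note describes.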
extractor.tiktok.videos
-----------------------
Type

@@ -825,10 +825,11 @@
},
"tiktok":
{
-"audio" : true,
-"covers": false,
-"photos": true,
-"videos": true,
+"audio" : true,
+"covers" : false,
+"photos" : true,
+"subtitles": false,
+"videos" : true,
"tiktok-range": "",
"posts": {