[tiktok] extract subtitles and all cover types (#8805)
* Make sure that `img_id`, `audio_id` and `cover_id` fields are always available.
The values are set to '' where they are not applicable.
Having `img_id` is necessary for the default `archive_fmt`; the other fields are handled for consistency.
* Allow downloading more than one cover.
The previous behavior is kept as-is, but setting the "covers" option to "all" now grabs all available covers.
* Add support for downloading subtitles
Allows filtering subtitles by source type (ASR, MT) and language.
* Ensure archive uniqueness for covers and subtitles.
* Update the URL test pattern to include the `image` extension.
Although TikTok may serve the covers with JPEG content, the file ending can be `.image`.
The test before 0c14b164 failed because the asserted URL did not match all cover types; the pattern now in use has to include the `.image` file ending.
* Add support for "creator_caption" subtitles in "LC" format.
These subtitles have the keys "Format" set to "creator_caption" and "Source" to "LC".
* Add "LC" (Local Captions) as a subtitle source type in the documentation
* Code deduplication and renaming subtitle metadata
Changed the item type from singular `subtitle` to `subtitles`.
Removed the wrong descriptor `cover` from the subtitles fallback title.
* Refactor subtitle filtering
The filter is now prepared in `_init` to prevent parsing the same config parameter for every item.
The `_extract_subtitles` function will still extract if either filter (source or language) matches.
* Generate a `file_id` for subtitles
Subtitles have multiple fields that determine the unique file, so these are simply concatenated.
This is similar to the cover types, only with more variations.
* Added tests for subtitles
* fix docs entries
* fix '"covers": "all"'
* simplify some code
* Fix fallback title for subtitles
Added the missing "f" to the f-string and added "subtitle" to the title.
The resulting title will look like "TikTok video subtitle #1234567"
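
The `file_id` concatenation and the fallback title fix described above could look roughly like the following sketch. It is not the extractor's actual code: the function names and the `LanguageCodeName` key are assumptions made for illustration, and only the `Format` and `Source` keys are taken from the message above.

```python
def subtitle_file_id(video_id, subtitle):
    """Join the fields that distinguish one subtitle file from another."""
    # "Format" and "Source" are the keys named in the commit message;
    # "LanguageCodeName" is a hypothetical key used for illustration.
    parts = (
        subtitle.get("LanguageCodeName", ""),
        subtitle.get("Source", ""),
        subtitle.get("Format", ""),
    )
    return "_".join([str(video_id), *parts])


def fallback_title(video_id):
    """Build the fallback title for a subtitle item."""
    # Without the leading "f" the braces would be emitted literally.
    return f"TikTok video subtitle #{video_id}"
```

Concatenating every distinguishing field keeps archive entries unique when one video carries several subtitles that differ only in source or language.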
@@ -5914,12 +5914,25 @@ Description
 extractor.tiktok.covers
 -----------------------
 Type
-    ``bool``
+    * ``bool``
+    * ``string``
 Default
     ``false``
 Description
     Download video covers.
+
+    ``true``
+        Download the first cover found in the following order:
+
+        * ``thumbnail``
+        * ``cover``
+        * ``originCover``
+        * ``dynamicCover``
+    ``false``
+        Do not download covers
+    ``"all"``
+        Download all available covers
 
 
 extractor.tiktok.photos
 -----------------------
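
The `covers` behavior documented in the hunk above can be sketched in Python as follows. This is a minimal illustration, not the extractor's implementation: `select_covers` is a hypothetical helper, and it assumes the video metadata is a dict that maps each cover type to its URL.

```python
# Priority order taken from the documentation text above.
COVER_ORDER = ("thumbnail", "cover", "originCover", "dynamicCover")


def select_covers(video, option):
    """Return (cover_type, url) pairs according to the ``covers`` option."""
    if not option:
        # false: do not download covers
        return []
    # Keep only cover types that are present and non-empty, in priority order.
    available = [(name, video[name]) for name in COVER_ORDER if video.get(name)]
    if option == "all":
        return available
    # true: only the first cover found
    return available[:1]
```

Returning the cover type together with the URL lets the caller put the type into a `cover_id` style field, which is one way to keep archive entries distinct when several covers of one video are downloaded.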
@@ -5931,6 +5944,47 @@ Description
     Download photos.
 
 
+extractor.tiktok.subtitles
+--------------------------
+Type
+    * ``bool``
+    * ``string``
+Default
+    ``false``
+Example
+    * ``"all"``
+    * ``"ASR,MT,LC"``
+    * ``"ASR,eng-US"``
+Description
+    Download video subtitles.
+    The subtitles can be filtered by source or language.
+    The following source types can be filtered:
+
+    * ``ASR`` - Automatic Speech Recognition
+    * ``MT`` - Machine Translation
+    * ``LC`` - Local Captions / Creator Captions
+
+    If both source types and language codes are provided,
+    only subtitles matching both are downloaded.
+
+    ``true``
+        Download all subtitles tagged ``ASR``
+    ``false``
+        Do not download subtitles
+    ``"all"``
+        Download all available subtitles.
+    ``"ASR,MT,eng-US,cmn-Hans-CN"``
+        Download english and simplified chinese subtitles
+        that are either automatically recognized or machine translated.
+
+    The source types and languages can be listed in any order.
+Note
+    It is not possible to filter all subtitles of a specific source type,
+    while also filtering for additional languages of another source type.
+    (e.g. any ASR subtitle + fra-FR of any source type)
+    For this, refer to `extractor.*.image-filter`_.
+
+
 extractor.tiktok.videos
 -----------------------
 Type
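
The filter semantics documented in the hunk above can be sketched as follows. This is an illustration under stated assumptions rather than the extractor's code: `prepare_filter` and `matches` are hypothetical names, and the sketch follows the documented rule that a subtitle must match both a listed source type and a listed language when both kinds are given.

```python
# Source types named in the documentation text above.
SOURCE_TYPES = {"ASR", "MT", "LC"}


def prepare_filter(option):
    """Parse the option once into (sources, languages); None means 'any'."""
    if not option:
        return None                      # false: subtitles disabled
    if option is True:
        return {"ASR"}, None             # true: all subtitles tagged ASR
    if option == "all":
        return None, None                # "all": no restriction
    tokens = {t.strip() for t in option.split(",") if t.strip()}
    sources = tokens & SOURCE_TYPES      # order of the tokens does not matter
    languages = tokens - SOURCE_TYPES    # everything else is a language code
    return sources or None, languages or None


def matches(subtitle_filter, source, language):
    """Check a single subtitle against a prepared filter."""
    if subtitle_filter is None:
        return False
    sources, languages = subtitle_filter
    return ((sources is None or source in sources) and
            (languages is None or language in languages))
```

Parsing the option once and reusing the result for every item avoids splitting the same string repeatedly, which is the motivation given for preparing the filter in `_init`.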
@@ -825,10 +825,11 @@
     },
     "tiktok":
     {
-        "audio" : true,
-        "covers": false,
-        "photos": true,
-        "videos": true,
+        "audio"    : true,
+        "covers"   : false,
+        "photos"   : true,
+        "subtitles": false,
+        "videos"   : true,
         "tiktok-range": "",
 
         "posts": {