[tiktok] extract subtitles and all cover types (#8805)

* Make sure that `img_id`, `audio_id` and `cover_id` fields are always available.
    The values are set to '' where they are not applicable.
    Having `img_id` is necessary for the default `archive_fmt`; the other fields are handled for consistency.
* Allow downloading more than one cover.
    The previous behavior is kept as-is, but setting the "covers" option to "all" now grabs all available covers.
* Add support for downloading subtitles
    Allows filtering subtitles by source type (ASR, MT) and language.
* Ensure archive uniqueness for covers and subtitles.
* Update the URL test pattern to include the `image` extension.
    Although TikTok may serve the covers with JPEG content, the file ending can be `.image`.
    Before 0c14b164, the test failed because the asserted URL did not match all cover types; the pattern used now must also accept the `.image` ending.
* Add support for "creator_caption" subtitles in "LC" format.
    These subtitles have the keys "Format" set to "creator_caption" and "Source" to "LC".
* Add "LC" (Local Captions) as a subtitle source type in the documentation
* Code deduplication and renaming subtitle metadata
    Changed the item type from singular `subtitle` to `subtitles`.
    Removed the wrong descriptor `cover` from the subtitles fallback title.
* Refactor subtitle filtering
    The filter is now prepared in `_init` to prevent parsing the same config parameter for every item.
    The `_extract_subtitles` function still extracts a subtitle if it matches either filter (source or language).
* Generate a `file_id` for subtitles
    Subtitles have multiple fields that determine the unique file, so these are simply concatenated.
    This is similar to the cover types, only with more variations.
* Added tests for subtitles
* fix docs entries
* fix '"covers": "all"'
* simplify some code
* Fix fallback title for subtitles
    Added the missing "f" prefix to the f-string and added "subtitle" to the title.
    The resulting title will look like "TikTok video subtitle #1234567"
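The first two bullets (always-present ID fields, and the "covers" option accepting "all") can be illustrated with a minimal Python sketch. It assumes item metadata is a plain dict, uses a hypothetical `ARCHIVE_FMT` of `"{id}_{img_id}"` (not necessarily the extractor's real `archive_fmt`), and assumes that `"covers": True` keeps a single cover, taken here to be the first one. All function names are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Field names from the commit message; defaulting them to '' means a
# format string that references any of them never raises KeyError.
ID_FIELDS = ("img_id", "audio_id", "cover_id")

# Hypothetical archive format, for illustration only.
ARCHIVE_FMT = "{id}_{img_id}"


def with_id_defaults(item: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `item` in which every ID field exists."""
    result = dict(item)
    for field in ID_FIELDS:
        result.setdefault(field, "")  # '' where the field does not apply
    return result


def archive_key(item: dict[str, Any]) -> str:
    """Format the hypothetical archive key for any item type."""
    return ARCHIVE_FMT.format_map(with_id_defaults(item))


def select_covers(covers: list[str], option: Any) -> list[str]:
    """Choose cover URLs according to a "covers" option value.

    Assumed semantics: falsy -> none, "all" -> every cover,
    any other truthy value -> one cover (the previous behavior).
    """
    if not option:
        return []
    if option == "all":
        return list(covers)
    return covers[:1]
```

For a video item such as `{"id": "7449708266168274208"}`, `archive_key` yields `"7449708266168274208_"` instead of failing on the missing `img_id`.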
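The refactored subtitle filter is parsed once in `_init` and then applied to each subtitle entry, accepting an entry when either its source or its language matches. It might look roughly like the sketch below. The class name, the parsing rule (tokens that are known source types count as sources, everything else as a language code), and the `LanguageCodeName` key are assumptions; only the `Source` and `Format` keys, the source types ASR, MT and LC, and option values such as `"ASR,deu-DE"` and `"all"` come from the commit. The default `True` setting is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass

# Source types named in the commit message (LC = "Local Captions").
KNOWN_SOURCES = frozenset({"ASR", "MT", "LC"})


@dataclass(frozen=True)
class SubtitleFilter:
    """Filter built once from the "subtitles" option string."""

    match_all: bool
    sources: frozenset[str]
    languages: frozenset[str]

    @classmethod
    def parse(cls, option: str) -> SubtitleFilter:
        if option == "all":
            return cls(True, frozenset(), frozenset())
        tokens = [tok.strip() for tok in option.split(",") if tok.strip()]
        return cls(
            False,
            frozenset(tok for tok in tokens if tok in KNOWN_SOURCES),
            frozenset(tok for tok in tokens if tok not in KNOWN_SOURCES),
        )

    def matches(self, subtitle: dict) -> bool:
        """Accept a subtitle if its source OR its language is selected."""
        if self.match_all:
            return True
        # "LanguageCodeName" is an assumed key name for the language code.
        return (subtitle.get("Source") in self.sources
                or subtitle.get("LanguageCodeName") in self.languages)
```

Because the option string is parsed only once, the per-item work reduces to two set lookups, which is the motivation the commit gives for moving the parsing into `_init`.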
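The commit builds a subtitle `file_id` by concatenating several fields and fixes the fallback title's missing `f` prefix. A sketch of both follows. Which fields are joined, the `_` separator, and every key name other than `Source` and `Format` are assumptions, since the commit does not list them.

```python
# Assumed identifying fields and separator; the commit only states that
# several fields are concatenated.
SUBTITLE_ID_FIELDS = ("LanguageCodeName", "Source", "Format", "Version")


def subtitle_file_id(subtitle: dict) -> str:
    """Join the identifying fields so each subtitle variant gets its own ID."""
    return "_".join(str(subtitle.get(key, "")) for key in SUBTITLE_ID_FIELDS)


def fallback_title(post_id: str) -> str:
    # Without the f prefix, "{post_id}" would appear literally in the title.
    return f"TikTok video subtitle #{post_id}"
```

This follows the same idea as the cover types, only with more fields: two subtitles that differ in any one of the joined fields get distinct IDs, which keeps their archive entries unique.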
bassberry
2026-01-30 21:01:06 +01:00
committed by GitHub
parent 2d01fef300
commit fd5f5611f6
4 changed files with 225 additions and 33 deletions
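The effect of the `PATTERN` change can be checked with Python's `re.match`. The image URL below is the one from the cover test with the signed query values replaced by placeholders; the `.jpeg` URL is a made-up example. Whether the test harness uses `re.match` or a different matching function is an assumption here.

```python
import re

# Old and new cover patterns, copied from the test module diff.
OLD_PATTERN = r"https://p1[69]-[^/?#.]+\.tiktokcdn[^/?#.]*\.com/[^/?#]+/\w+~.*\.jpe?g"
NEW_PATTERN = r"https://p1[69]-[^/?#.]+\.tiktokcdn[^/?#.]*\.com/[^/?#]+/\w+~.*\.(jpe?g|image)"

# Cover URL from the test; query values are placeholders.
IMAGE_URL = (
    "https://p19-common-sign-useastred.tiktokcdn-eu.com/tos-useast2a-p-0037-euttp/"
    "o4rVzhI1bABhooAaEqtCAYGi6nijIsDib8NGfC~tplv-tiktokx-origin.image"
    "?dr=10395&x-expires=0&x-signature=placeholder"
)
# Made-up URL with a conventional JPEG ending.
JPEG_URL = "https://p16-sign-va.tiktokcdn.com/obj/abc123~tplv-example.jpeg?x=1"

for name, url in (("image", IMAGE_URL), ("jpeg", JPEG_URL)):
    old_ok = re.match(OLD_PATTERN, url) is not None
    new_ok = re.match(NEW_PATTERN, url) is not None
    print(f"{name}: old={old_ok} new={new_ok}")
```

Run as-is, it prints `image: old=False new=True` followed by `jpeg: old=True new=True`: the old pattern rejects the `.image` cover, while the new one accepts both endings.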


@@ -6,12 +6,13 @@
from gallery_dl.extractor import tiktok
PATTERN = r"https://p1[69]-[^/?#.]+\.tiktokcdn[^/?#.]*\.com/[^/?#]+/\w+~.*\.jpe?g"
PATTERN = r"https://p1[69]-[^/?#.]+\.tiktokcdn[^/?#.]*\.com/[^/?#]+/\w+~.*\.(jpe?g|image)"
PATTERN_WITH_AUDIO = r"(?:" + PATTERN + r"|https://v\d+m?\.tiktokcdn[^/?#.]*\.com/[^?#]+\?[^/?#]+)"
VIDEO_PATTERN = r"https://v1[69]-webapp-prime.tiktok.com/video/tos/[^?#]+\?[^/?#]+"
OLD_VIDEO_PATTERN = r"https://www.tiktok.com/aweme/v1/play/\?[^/?#]+"
COMBINED_VIDEO_PATTERN = r"(?:" + VIDEO_PATTERN + r")|(?:" + OLD_VIDEO_PATTERN + r")"
USER_PATTERN = r"(https://www.tiktok.com/@([\w_.-]+)/video/(\d+)|" + PATTERN + r")"
SUBTITLE_PATTERN = r"https://v1[69]-[^/?#.]+\.tiktokcdn[^/?#.]*\.com/[^/?#]+/.*"
__tests__ = (
@@ -127,10 +128,22 @@ __tests__ = (
"#url" : "https://www.tiktok.com/@memezar/video/7449708266168274208",
"#comment" : "video post cover image",
"#class" : tiktok.TiktokPostExtractor,
"#pattern" : r"https://p19-common-sign-useastred.tiktokcdn-eu.com/tos-useast2a-p-0037-euttp/o4rVzhI1bABhooAaEqtCAYGi6nijIsDib8NGfC~tplv-tiktokx-origin.image\?dr=10395&x-expires=\d+&x-signature=.+",
"#pattern" : PATTERN,
"#count" : 1,
"#options" : {"videos": False, "covers": True},
},
{
"#url" : "https://www.tiktok.com/@memezar/video/7449708266168274208",
"#comment" : "all video post cover images",
"#class" : tiktok.TiktokPostExtractor,
"#pattern" : PATTERN,
"#count" : 3,
"#options" : {"videos": False, "covers": "all"},
},
{
@@ -211,6 +224,44 @@ __tests__ = (
"#options" : {"videos": "ytdl"},
},
{
"#url" : "https://www.tiktok.com/@memezar/video/7588916452304997635",
"#comment" : "default subtitles",
"#class" : tiktok.TiktokPostExtractor,
"#pattern" : SUBTITLE_PATTERN,
"#count" : 1,
"#options" : {"videos": False, "covers": False, "subtitles": True}
},
{
"#url" : "https://www.tiktok.com/@memezar/video/7588916452304997635",
"#comment" : "english subtitles",
"#class" : tiktok.TiktokPostExtractor,
"#pattern" : SUBTITLE_PATTERN,
"#count" : 1,
"#options" : {"videos": False, "covers": False, "subtitles": "eng-US"}
},
# This test is prone to break when more translation agents are added!
{
"#url" : "https://www.tiktok.com/@memezar/video/7588916452304997635",
"#comment" : "combined subtitle filter",
"#class" : tiktok.TiktokPostExtractor,
"#pattern" : SUBTITLE_PATTERN,
"#count" : 6,
"#options" : {"videos": False, "covers": False, "subtitles": "ASR,deu-DE"}
},
# This test is prone to break when new languages or more translation agents are added!
{
"#url" : "https://www.tiktok.com/@memezar/video/7588916452304997635",
"#comment" : "all subtitles",
"#class" : tiktok.TiktokPostExtractor,
"#pattern" : SUBTITLE_PATTERN,
"#count" : 64,
"#options" : {"videos": False, "covers": False, "subtitles": "all"}
},
{
"#url" : "https://vm.tiktok.com/ZGdh4WUhr/",
"#comment" : "vm.tiktok.com link: many photos",