[tiktok] extract subtitles and all cover types (#8805)
* Make sure that `img_id`, `audio_id` and `cover_id` fields are always available.
The values are set to '' where they are not applicable.
Having `img_id` is necessary for the default `archive_fmt`; the other fields are handled for consistency.
* Allow downloading more than one cover.
The previous behavior is kept as-is, but setting the "covers" option to "all" now grabs all available covers.
* Add support for downloading subtitles
Allows filtering subtitles by source type (ASR, MT) and language.
* Ensure archive uniqueness for covers and subtitles.
* Update the URL test pattern to include the `image` extension.
Although TikTok may serve the covers with JPEG content, the file ending can be `.image`.
The test before 0c14b164 failed because the asserted URL did not match all cover types, so the pattern now used also has to accept the `.image` file ending.
* Add support for "creator_caption" subtitles in "LC" format.
These subtitles have the key "Format" set to "creator_caption" and the key "Source" set to "LC".
* Add "LC" (Local Captions) as a subtitle source type in the documentation
* Code deduplication and renaming subtitle metadata
Changed the item type from singular `subtitle` to `subtitles`.
Removed the wrong descriptor `cover` from the subtitles fallback title.
* Refactor subtitle filtering
The filter is now prepared in `_init` to prevent parsing the same config parameter for every item.
The `_extract_subtitles` function will still extract if either filter (source or language) matches.
* Generate a `file_id` for subtitles
Subtitles have multiple fields that determine the unique file, so these are simply concatenated.
This is similar to the cover types, only with more variations.
* Added tests for subtitles
* fix docs entries
* fix '"covers": "all"'
* simplify some code
* Fix fallback title for subtitles
Added the missing "f" to the f-string and added "subtitle" to the title.
The resulting title will look like "TikTok video subtitle #1234567"
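
The `.image` file ending described in the message can be checked directly against the old and new test patterns. This is a minimal sketch, not part of the commit: it assumes the test pattern is applied as a prefix match with `re.match`, and the query-string values in `COVER_URL` are placeholders rather than a real signature.

```python
import re

# Both patterns are copied from the test module touched by this commit.
OLD_PATTERN = r"https://p1[69]-[^/?#.]+\.tiktokcdn[^/?#.]*\.com/[^/?#]+/\w+~.*\.jpe?g"
NEW_PATTERN = r"https://p1[69]-[^/?#.]+\.tiktokcdn[^/?#.]*\.com/[^/?#]+/\w+~.*\.(jpe?g|image)"

# Cover URL shaped like the one the test previously asserted.
# Assumption: the query values below are placeholders for illustration.
COVER_URL = (
    "https://p19-common-sign-useastred.tiktokcdn-eu.com/"
    "tos-useast2a-p-0037-euttp/"
    "o4rVzhI1bABhooAaEqtCAYGi6nijIsDib8NGfC~tplv-tiktokx-origin.image"
    "?dr=10395&x-expires=0&x-signature=placeholder"
)


def matches(pattern, url):
    """Return True if the pattern matches at the start of the URL."""
    return re.match(pattern, url) is not None


old_ok = matches(OLD_PATTERN, COVER_URL)  # no ".jpg"/".jpeg" anywhere in the URL
new_ok = matches(NEW_PATTERN, COVER_URL)  # ".image" is accepted by the new pattern
print(old_ok, new_ok)
```

Only the new pattern accepts this URL, which is why the generic `PATTERN` can replace the hard-coded URL in the cover test.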
@@ -6,12 +6,13 @@

from gallery_dl.extractor import tiktok

PATTERN = r"https://p1[69]-[^/?#.]+\.tiktokcdn[^/?#.]*\.com/[^/?#]+/\w+~.*\.jpe?g"
PATTERN = r"https://p1[69]-[^/?#.]+\.tiktokcdn[^/?#.]*\.com/[^/?#]+/\w+~.*\.(jpe?g|image)"
PATTERN_WITH_AUDIO = r"(?:" + PATTERN + r"|https://v\d+m?\.tiktokcdn[^/?#.]*\.com/[^?#]+\?[^/?#]+)"
VIDEO_PATTERN = r"https://v1[69]-webapp-prime.tiktok.com/video/tos/[^?#]+\?[^/?#]+"
OLD_VIDEO_PATTERN = r"https://www.tiktok.com/aweme/v1/play/\?[^/?#]+"
COMBINED_VIDEO_PATTERN = r"(?:" + VIDEO_PATTERN + r")|(?:" + OLD_VIDEO_PATTERN + r")"
USER_PATTERN = r"(https://www.tiktok.com/@([\w_.-]+)/video/(\d+)|" + PATTERN + r")"
SUBTITLE_PATTERN = r"https://v1[69]-[^/?#.]+\.tiktokcdn[^/?#.]*\.com/[^/?#]+/.*"


__tests__ = (
@@ -127,10 +128,22 @@ __tests__ = (
"#url" : "https://www.tiktok.com/@memezar/video/7449708266168274208",
"#comment" : "video post cover image",
"#class" : tiktok.TiktokPostExtractor,
"#pattern" : r"https://p19-common-sign-useastred.tiktokcdn-eu.com/tos-useast2a-p-0037-euttp/o4rVzhI1bABhooAaEqtCAYGi6nijIsDib8NGfC~tplv-tiktokx-origin.image\?dr=10395&x-expires=\d+&x-signature=.+",
"#pattern" : PATTERN,
"#count" : 1,
"#options" : {"videos": False, "covers": True},


},

{
"#url" : "https://www.tiktok.com/@memezar/video/7449708266168274208",
"#comment" : "all video post cover images",
"#class" : tiktok.TiktokPostExtractor,
"#pattern" : PATTERN,
"#count" : 3,
"#options" : {"videos": False, "covers": "all"},


},

{
@@ -211,6 +224,44 @@ __tests__ = (
"#options" : {"videos": "ytdl"},
},

{
"#url" : "https://www.tiktok.com/@memezar/video/7588916452304997635",
"#comment" : "default subtitles",
"#class" : tiktok.TiktokPostExtractor,
"#pattern" : SUBTITLE_PATTERN,
"#count" : 1,
"#options" : {"videos": False, "covers": False, "subtitles": True}
},

{
"#url" : "https://www.tiktok.com/@memezar/video/7588916452304997635",
"#comment" : "english subtitles",
"#class" : tiktok.TiktokPostExtractor,
"#pattern" : SUBTITLE_PATTERN,
"#count" : 1,
"#options" : {"videos": False, "covers": False, "subtitles": "eng-US"}
},

# This test is prone to break when more translation agents are added!
{
"#url" : "https://www.tiktok.com/@memezar/video/7588916452304997635",
"#comment" : "combined subtitle filter",
"#class" : tiktok.TiktokPostExtractor,
"#pattern" : SUBTITLE_PATTERN,
"#count" : 6,
"#options" : {"videos": False, "covers": False, "subtitles": "ASR,deu-DE"}
},

# This test is prone to break when new languages or more translation agents are added!
{
"#url" : "https://www.tiktok.com/@memezar/video/7588916452304997635",
"#comment" : "all subtitles",
"#class" : tiktok.TiktokPostExtractor,
"#pattern" : SUBTITLE_PATTERN,
"#count" : 64,
"#options" : {"videos": False, "covers": False, "subtitles": "all"}
},

{
"#url" : "https://vm.tiktok.com/ZGdh4WUhr/",
"#comment" : "vm.tiktok.com link: many photos",
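
The subtitle tests pass filter strings such as "eng-US", "ASR,deu-DE" and "all" through the "subtitles" option, and the commit message says a subtitle is extracted if either the source or the language matches. The sketch below illustrates that rule only. It is not the extractor's code: `prepare_filter`, `is_wanted`, the `source`/`language` keys and the sample entries (including "fra-FR") are hypothetical, and the handling of `True` (the default) is left out because the source does not describe it.

```python
def prepare_filter(value):
    """Parse a filter string once, up front (hypothetical helper).

    Returns None for "all" (accept everything), otherwise a set of the
    comma-separated source types and language codes.
    """
    if value == "all":
        return None
    return {part.strip() for part in value.split(",") if part.strip()}


def is_wanted(subtitle, allowed):
    """Keep a subtitle if its source OR its language is in the filter."""
    if allowed is None:
        return True
    return subtitle["source"] in allowed or subtitle["language"] in allowed


# Made-up sample entries; the key names are assumptions for illustration.
subtitles = [
    {"source": "ASR", "language": "eng-US"},
    {"source": "MT", "language": "deu-DE"},
    {"source": "MT", "language": "fra-FR"},
    {"source": "LC", "language": "eng-US"},
]

allowed = prepare_filter("ASR,deu-DE")
selected = [s for s in subtitles if is_wanted(s, allowed)]
print(selected)
```

Parsing the option once and reusing the resulting set for every item mirrors the stated reason for preparing the filter in `_init`. Because the two conditions are combined with "or", a filter like "ASR,deu-DE" can select more entries than either part alone.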