[tiktok] remove yt-dlp dependency & add support for more post types (#8715)

#7246 #8035 #8466 #8730

* [tiktok] support extracting videos directly without yt-dlp
* [tiktok] support extracting users directly without yt-dlp
* [tiktok] fix logic, tests, and linting errors
* [tiktok] implement tiktok-range support for non-yt-dlp user extractor
* [tiktok] Skip range filter if no ranges are given
* [tiktok] Remove debug code
* [tiktok] only check for faulty device IDs during the first couple of passes
    I think the original yt-dlp solution assumes that a device ID that works once will always work.
    Also, my approach would cause needless retries in cases where hasMorePrevious turns out to be wrong, which the original algorithm accounts for. So let's copy the original algorithm here, too.
* [tiktok] support stories
* [tiktok] allow extracting audio without extracting photos
* [tiktok] add TiktokFollowingExtractor
* [tiktok] update supportedsites to include stories
* [tiktok] Keep tiktok-range option for no content user account test
    It acts as a nice guard against that account suddenly having lots of posts to extract
* [tiktok] TiktokUserExtractor and TiktokFollowingExtractor rewrite
* [tiktok] Fix avatar naming convention to match that of posts
* [tiktok] remove type hints for compatibility with older Python versions
* [tiktok] Improve performance of TiktokFollowingExtractor
    This was largely achieved using the story/batch/item_list endpoint
* [tiktok] Forgot to run flake8
* [tiktok] remove old constant
* [tiktok] Support order-posts config item
* [tiktok] flake8
* [tiktok] Avoid 'match' statements, which Python versions before 3.10 don't support
* [tiktok] always ask for posts in chronological order when in "desc" mode
    We should aim to avoid having pinned posts returned before non-pinned ones
* [tiktok] Add liked posts extraction
* [tiktok] Add reposts extraction
* [tiktok] Add saved posts extraction
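
The 'match' compatibility item in the list above can be illustrated with a minimal sketch. `match` statements need Python 3.10 or later, so an if/elif chain is the portable equivalent. The function name `resolve_order` and the labels it returns are hypothetical and are not taken from the extractor; only the "asc", "desc", and "popular" values appear in this commit.

```python
def resolve_order(order):
    """Map an 'order-posts' value to a label (hypothetical helper).

    Written as an if/elif chain rather than a `match` statement so that
    it also runs on Python versions older than 3.10.
    """
    if order == "asc":
        return "ascending"
    elif order == "desc":
        return "descending"
    elif order == "popular":
        return "popular"
    # Anything else is rejected explicitly instead of silently ignored.
    raise ValueError("unsupported order-posts value: %r" % (order,))


if __name__ == "__main__":
    for value in ("asc", "desc", "popular"):
        print(value, "->", resolve_order(value))
```

How the real extractor maps these values to API requests is not shown here; the sketch only demonstrates the control-flow substitution.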

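The 'tiktok-range' items in the list above (apply a range, skip the filter when no range is given) could look roughly like the following. This is a sketch under assumptions: `parse_range` and `apply_range` are hypothetical names, and only the two forms that appear in this commit's tests ("1" and "1-10") are handled. The actual option may accept more.

```python
def parse_range(spec):
    """Parse "N" or "N-M" into an inclusive (start, stop) tuple.

    Returns None for an empty or missing spec (hypothetical helper).
    """
    spec = (spec or "").strip()
    if not spec:
        return None
    first, sep, last = spec.partition("-")
    start = int(first)
    stop = int(last) if sep else start
    return (start, stop)


def apply_range(items, spec):
    """Keep only items whose 1-based position falls inside the range."""
    bounds = parse_range(spec)
    if bounds is None:
        # No range given: skip the filter entirely.
        return list(items)
    start, stop = bounds
    return [item for index, item in enumerate(items, 1)
            if start <= index <= stop]


if __name__ == "__main__":
    posts = ["a", "b", "c", "d", "e"]
    print(apply_range(posts, "1-3"))
    print(apply_range(posts, None))
```
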
* cleanup imports
* remove '# MARK:' comments
* remove & simplify 'except' statements
    KeyboardInterrupt & SystemExit inherit from BaseException (not Exception)
    and therefore don't need special handling
* split 'user' extractor
* move PATTERNs into their respective functions
* use dict comprehensions
* add only-matching test URLs for split user extractors
* update config docs
    rename 'tiktok-user-extractor' to 'ytdl'
* document '"popular"' 'order-posts' value
* inline and remove 'util.chunk()'
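
The note above about 'except' statements rests on Python's exception hierarchy: KeyboardInterrupt and SystemExit derive from BaseException, not Exception, so a plain `except Exception` never swallows them. A small sketch, where `run` and `propagates` are illustrative names that are not part of gallery-dl:

```python
def run(task):
    """Call task() and report ordinary errors instead of raising them."""
    try:
        task()
    except Exception as exc:
        # Only Exception subclasses are caught here; KeyboardInterrupt
        # and SystemExit are not, so no explicit re-raise is needed.
        return "handled " + type(exc).__name__
    return "ok"


def propagates(exc_type):
    """Return True if an exception of exc_type escapes run()."""
    def task():
        raise exc_type()
    try:
        run(task)
    except exc_type:
        return True
    return False


if __name__ == "__main__":
    print(issubclass(KeyboardInterrupt, Exception))  # False
    print(issubclass(SystemExit, Exception))         # False
    print(propagates(ValueError))                    # False
    print(propagates(KeyboardInterrupt))             # True
```
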
This commit is contained in:
CasualYouTuber31
2025-12-30 16:17:57 +00:00
committed by GitHub
parent c8c4575c7f
commit a6c845bdc8
6 changed files with 1299 additions and 112 deletions

@@ -8,6 +8,9 @@ from gallery_dl.extractor import tiktok
PATTERN = r"https://p1[69]-[^/?#.]+\.tiktokcdn[^/?#.]*\.com/[^/?#]+/\w+~.*\.jpe?g"
PATTERN_WITH_AUDIO = r"(?:" + PATTERN + r"|https://v\d+m?\.tiktokcdn[^/?#.]*\.com/[^?#]+\?[^/?#]+)"
VIDEO_PATTERN = r"https://v1[69]-webapp-prime.tiktok.com/video/tos/[^?#]+\?[^/?#]+"
OLD_VIDEO_PATTERN = r"https://www.tiktok.com/aweme/v1/play/\?[^/?#]+"
COMBINED_VIDEO_PATTERN = r"(?:" + VIDEO_PATTERN + r")|(?:" + OLD_VIDEO_PATTERN + r")"
USER_PATTERN = r"(https://www.tiktok.com/@([\w_.-]+)/video/(\d+)|" + PATTERN + r")"
@@ -40,7 +43,7 @@ __tests__ = (
},
{
"#url" : "https://www.tiktok.com/@d4vinefem/photo/7449575367024626974",
"#url" : "https://www.tiktok.com/@hullcity/photo/7557376330036153622",
"#comment" : "/photo/ link: single photo",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
@@ -49,7 +52,7 @@ __tests__ = (
},
{
"#url" : "https://www.tiktok.com/@d4vinefem/video/7449575367024626974",
"#url" : "https://www.tiktok.com/@hullcity/video/7557376330036153622",
"#comment" : "/video/ link: single photo",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
@@ -58,7 +61,7 @@ __tests__ = (
},
{
"#url" : "https://www.tiktokv.com/share/video/7449575367024626974",
"#url" : "https://www.tiktokv.com/share/video/7557376330036153622",
"#comment" : "www.tiktokv.com link: single photo",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
@@ -67,7 +70,7 @@ __tests__ = (
},
{
"#url" : "https://www.tiktok.com/@.mcfc.central/photo/7449701420934122785",
"#url" : "https://www.tiktok.com/@hullcity/photo/7553302113757990166",
"#comment" : "/photo/ link: few photos",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
@@ -76,7 +79,7 @@ __tests__ = (
},
{
"#url" : "https://www.tiktok.com/@.mcfc.central/video/7449701420934122785",
"#url" : "https://www.tiktok.com/@hullcity/video/7553302113757990166",
"#comment" : "/video/ link: few photos",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
@@ -85,7 +88,7 @@ __tests__ = (
},
{
"#url" : "https://www.tiktokv.com/share/video/7449701420934122785",
"#url" : "https://www.tiktokv.com/share/video/7553302113757990166",
"#comment" : "www.tiktokv.com link: few photos",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
@@ -94,12 +97,12 @@ __tests__ = (
},
{
"#url" : "https://www.tiktok.com/@ughuwhguweghw/video/1",
"#comment" : "deleted post",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
"#options" : {"videos": False, "audio": False},
"count" : 0,
"#url" : "https://www.tiktok.com/@ughuwhguweghw/video/1",
"#comment" : "deleted post",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
"#options" : {"videos": False, "audio": False},
"#count" : 0,
},
{
@@ -107,10 +110,19 @@ __tests__ = (
"#comment" : "Video post",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
"#results" : "ytdl:https://www.tiktok.com/@memezar/video/7449708266168274208",
"#pattern" : COMBINED_VIDEO_PATTERN,
"#options" : {"videos": True, "audio": True},
},
{
"#url" : "https://www.tiktok.com/@memezar/video/7449708266168274208",
"#comment" : "Video post (via yt-dlp)",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
"#results" : "ytdl:https://www.tiktok.com/@memezar/video/7449708266168274208",
"#options" : {"videos": "ytdl", "audio": True},
},
{
"#url" : "https://www.tiktok.com/@memezar/video/7449708266168274208",
"#comment" : "video post cover image",
@@ -126,7 +138,7 @@ __tests__ = (
"#comment" : "Video post as a /photo/ link",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
"#results" : "ytdl:https://www.tiktok.com/@memezar/video/7449708266168274208",
"#pattern" : COMBINED_VIDEO_PATTERN,
"#options" : {"videos": True, "audio": True},
},
@@ -155,7 +167,7 @@ __tests__ = (
"#comment" : "Video post as a share link",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
"#results" : "ytdl:https://www.tiktok.com/@/video/7449708266168274208",
"#pattern" : COMBINED_VIDEO_PATTERN,
"#options" : {"videos": True},
},
@@ -196,6 +208,7 @@ __tests__ = (
"#comment" : "no 'author' (#8189)",
"#class" : tiktok.TiktokPostExtractor,
"#results" : "ytdl:https://www.tiktok.com/@veronicaperasso_1/video/7212008840433274118",
"#options" : {"videos": "ytdl"},
},
{
@@ -260,9 +273,50 @@ __tests__ = (
"#category" : ("", "tiktok", "user"),
"#class" : tiktok.TiktokUserExtractor,
"#pattern" : USER_PATTERN,
"#count" : 11, # 10 posts + 1 avatar
"#options" : {"videos": True, "audio": True, "tiktok-range": "1-10"},
},
# order-posts currently has no effect if logged-in cookies aren't used.
# {
# "#url" : "https://www.tiktok.com/@chillezy",
# "#comment" : "User profile ascending order",
# "#category" : ("", "tiktok", "user"),
# "#class" : tiktok.TiktokUserExtractor,
# "#results" : "https://www.tiktok.com/@chillezy/video/7112145009356344622",
# "#options" : {"videos": True, "audio": True, "avatar": False, "tiktok-range": "1", "order-posts": "asc"},
# },
# {
# "#url" : "https://www.tiktok.com/@chillezy",
# "#comment" : "User profile popular order",
# "#category" : ("", "tiktok", "user"),
# "#class" : tiktok.TiktokUserExtractor,
# "#results" : "https://www.tiktok.com/@chillezy/video/7240568259186019630",
# "#options" : {"videos": True, "audio": True, "avatar": False, "tiktok-range": "1", "order-posts": "popular"},
# },
{
"#url" : "https://www.tiktok.com/@chillezy",
"#comment" : "User profile via yt-dlp",
"#category" : ("", "tiktok", "user"),
"#class" : tiktok.TiktokUserExtractor,
"#pattern" : USER_PATTERN,
"#count" : 11, # 10 posts + 1 avatar
"#options" : {"videos": True, "audio": True, "tiktok-range": "1-10", "tiktok-user-extractor": "ytdl"},
},
{
"#url" : "https://www.tiktok.com/@chillezy",
"#comment" : "User profile without avatar",
"#category" : ("", "tiktok", "user"),
"#class" : tiktok.TiktokUserExtractor,
"#pattern" : USER_PATTERN,
"#count" : 10, # 10 posts
"#options" : {"videos": True, "audio": True, "avatar": False, "tiktok-range": "1-10"},
},
{
"#url" : "https://www.tiktok.com/@joeysc14/",
"#comment" : "Public user profile with no content",
@@ -270,7 +324,37 @@ __tests__ = (
"#class" : tiktok.TiktokUserExtractor,
"#pattern" : PATTERN,
"#options" : {"videos": False, "tiktok-range": "1"},
"#count" : 1,
"#count" : 1, # 1 avatar
},
{
"#url" : "https://www.tiktok.com/@chillezy/avatar",
"#class" : tiktok.TiktokAvatarExtractor,
},
{
"#url" : "https://www.tiktok.com/@chillezy/posts",
"#class" : tiktok.TiktokPostsExtractor,
},
{
"#url" : "https://www.tiktok.com/@chillezy/reposts",
"#class" : tiktok.TiktokRepostsExtractor,
},
{
"#url" : "https://www.tiktok.com/@chillezy/stories",
"#class" : tiktok.TiktokStoriesExtractor,
},
{
"#url" : "https://www.tiktok.com/@chillezy/likes",
"#class" : tiktok.TiktokLikesExtractor,
},
{
"#url" : "https://www.tiktok.com/@chillezy/saved",
"#class" : tiktok.TiktokSavedExtractor,
},
)
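
The URL patterns defined at the top of this test file can be sanity-checked in isolation. The sketch below copies the pattern definitions from the diff. The sample URLs are placeholders invented for illustration, apart from the @chillezy video link, which appears in a commented-out test above. `re.fullmatch` is used here for strictness, and the test runner may apply the patterns differently.

```python
import re

# Pattern definitions copied from the diff above.
PATTERN = r"https://p1[69]-[^/?#.]+\.tiktokcdn[^/?#.]*\.com/[^/?#]+/\w+~.*\.jpe?g"
VIDEO_PATTERN = r"https://v1[69]-webapp-prime.tiktok.com/video/tos/[^?#]+\?[^/?#]+"
OLD_VIDEO_PATTERN = r"https://www.tiktok.com/aweme/v1/play/\?[^/?#]+"
COMBINED_VIDEO_PATTERN = r"(?:" + VIDEO_PATTERN + r")|(?:" + OLD_VIDEO_PATTERN + r")"
USER_PATTERN = r"(https://www.tiktok.com/@([\w_.-]+)/video/(\d+)|" + PATTERN + r")"

# Placeholder URLs (assumptions, not real CDN links).
photo_url = "https://p16-sign.tiktokcdn-us.com/obj/abc123~tplv-photo.jpeg"
new_video_url = "https://v16-webapp-prime.tiktok.com/video/tos/useast/abc/?a=1&b=2"
old_video_url = "https://www.tiktok.com/aweme/v1/play/?video_id=123"
# Post URL taken from a commented-out test in this file.
post_url = "https://www.tiktok.com/@chillezy/video/7112145009356344622"


def matches(pattern, url):
    """Return True if the whole URL matches the pattern."""
    return re.fullmatch(pattern, url) is not None


if __name__ == "__main__":
    print(matches(PATTERN, photo_url))
    print(matches(COMBINED_VIDEO_PATTERN, new_video_url))
    print(matches(COMBINED_VIDEO_PATTERN, old_video_url))
    print(re.fullmatch(USER_PATTERN, post_url).group(2, 3))
```

USER_PATTERN accepts either a post URL or a photo CDN URL, which is why the user-profile tests can combine it with a "#count" that includes the avatar.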