[tiktok] remove yt-dlp dependency & add support for more post types (#8715)

#7246 #8035 #8466 #8730

* [tiktok] support extracting videos directly without yt-dlp
* [tiktok] support extracting users directly without yt-dlp
* [tiktok] fixing logic, tests, linting errors
* [tiktok] implement tiktok-range support for non-yt-dlp user extractor
* [tiktok] Skip range filter if no ranges are given
* [tiktok] Remove debug code
* [tiktok] only check for faulty device IDs during the first couple of passes
    I think the original yt-dlp solution assumes that if a device ID works once, it will always work.
    Also, my approach would cause needless retries whenever hasMorePrevious turns out to be wrong, a case the original algorithm already accounts for. So let's copy the original algorithm here, too.
* [tiktok] support stories
* [tiktok] you can now extract audio without extracting photos
* [tiktok] add TiktokFollowingExtractor
* [tiktok] update supportedsites to include stories
* [tiktok] Keep tiktok-range option for no content user account test
    It acts as a nice guard against that account suddenly having lots of posts to extract
* [tiktok] TiktokUserExtractor and TiktokFollowingExtractor rewrite
* [tiktok] Fix avatar naming convention to match that of posts
* [tiktok] remove type hints for compatibility with older Python versions
* [tiktok] Improve performance of TiktokFollowingExtractor
    This was largely achieved using the story/batch/item_list endpoint
* [tiktok] Forgot to run flake8
* [tiktok] remove old constant
* [tiktok] Support order-posts config item
* [tiktok] flake8
* [tiktok] Older Python versions don't support match
* [tiktok] always ask for posts in chronological order when in "desc" mode
    We should aim to avoid having pinned posts returned before non-pinned ones
* [tiktok] Add liked posts extraction
* [tiktok] Add reposts extraction
* [tiktok] Add saved posts extraction
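
The `tiktok-range` items above (range support for the non-yt-dlp user extractor, and skipping the filter when no ranges are given) can be illustrated roughly as follows. This is only a sketch under assumptions: `parse_range` and `in_range` are hypothetical names, not functions from the extractor, and only closed ranges such as "1-20" and comma-separated items are handled, not the full yt-dlp `playlist_items` syntax.

```python
def parse_range(spec):
    """Parse a spec such as "1-20" or "1,3,5-7" into (start, stop) pairs.

    An empty spec yields an empty list, which is treated as "no filter".
    """
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, sep, last = part.partition("-")
        start = int(first)
        # A single index like "3" becomes the one-element range (3, 3).
        stop = int(last) if sep else start
        ranges.append((start, stop))
    return ranges


def in_range(index, ranges):
    """Return True if the 1-based 'index' passes the filter."""
    if not ranges:
        # No ranges given: skip filtering entirely.
        return True
    return any(start <= index <= stop for start, stop in ranges)
```

With this sketch, an empty option value lets every post through, while `"1-20"` keeps only the first twenty.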

* cleanup imports
* remove '# MARK:' comments
* remove & simplify 'except' statements
    KeyboardInterrupt & SystemExit inherit from BaseException (not Exception)
    and therefore don't need special handling
* split 'user' extractor
* move PATTERNs into their respective functions
* use dict comprehensions
* add only-matching test URLs for split user extractors
* update config docs
    rename 'tiktok-user-extractor' to 'ytdl'
* document '"popular"' 'order-posts' value
* inline and remove 'util.chunk()'
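
The note about simplified 'except' statements relies on Python's exception hierarchy. The following sketch shows it with made-up helper names (`run`, `fail`, `interrupt`), none of which come from the extractor: a plain `except Exception` handler reports ordinary errors but lets KeyboardInterrupt propagate.

```python
def run(task):
    """Call 'task' and report ordinary errors without swallowing interrupts."""
    try:
        return task()
    except Exception as exc:
        # KeyboardInterrupt and SystemExit never reach this handler,
        # because they derive from BaseException, not Exception.
        return "error: " + type(exc).__name__


def fail():
    raise ValueError("bad value")


def interrupt():
    raise KeyboardInterrupt()
```

Calling `run(fail)` returns an error string, whereas `run(interrupt)` raises KeyboardInterrupt to the caller, so no extra `except (KeyboardInterrupt, SystemExit): raise` clause is needed.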

Author: CasualYouTuber31
Date: 2025-12-30 16:17:57 +00:00
Committed by GitHub
Parent: c8c4575c7f
Commit: a6c845bdc8
6 changed files with 1299 additions and 112 deletions

@@ -5845,6 +5845,16 @@ Description
Download video covers.
extractor.tiktok.photos
-----------------------
Type
``bool``
Default
``true``
Description
Download photos.
extractor.tiktok.videos
-----------------------
Type
@@ -5855,18 +5865,52 @@ Description
Download videos using |ytdl|.
extractor.tiktok.user.avatar
----------------------------
extractor.tiktok.tiktok-range
-----------------------------
Type
``string``
Default
``""``
Example
``"1-20"``
Description
Range or playlist indices of ``tiktok`` posts to extract.
When using `ytdl`, see
`ytdl/playlist_items <https://github.com/yt-dlp/yt-dlp/blob/3042afb5fe342d3a00de76704cd7de611acc350e/yt_dlp/YoutubeDL.py#L289>`__
for details.
extractor.tiktok.posts.order-posts
----------------------------------
Type
``string``
Default
``"desc"``
Description
Controls the order in which
posts are processed.
``"asc"`` | ``"reverse"``
Ascending order (oldest first)
``"desc"``
Descending order (newest first)
``"popular"``
*Popular* order
extractor.tiktok.posts.ytdl
---------------------------
Type
``bool``
Default
``true``
``false``
Description
Download user avatars.
Extract user posts with |ytdl|.
extractor.tiktok.user.module
----------------------------
extractor.tiktok.posts.module
-----------------------------
Type
|Module|_
Default
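
The `order-posts` values documented in this hunk, together with the commit message's note that older Python versions lack `match`, suggest a plain dictionary lookup. The sketch below is an assumption about how such a value could be normalized: `normalize_order` is a hypothetical name, and falling back to the default for unknown values is my own choice, not documented behavior.

```python
# Hypothetical helper: normalize an 'order-posts' value without using
# the 'match' statement (unavailable before Python 3.10).
_ORDER_ALIASES = {
    "asc": "asc",
    "reverse": "asc",   # documented as an alias for ascending order
    "desc": "desc",
    "popular": "popular",
}


def normalize_order(value, default="desc"):
    """Return the canonical order name, falling back to 'default'."""
    if not isinstance(value, str):
        return default
    return _ORDER_ALIASES.get(value.lower(), default)
```

A lookup table keeps the alias handling in one place and works on any Python 3 version.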
@@ -5878,20 +5922,25 @@ Description
See `extractor.ytdl.module`_.
extractor.tiktok.user.tiktok-range
----------------------------------
extractor.tiktok.user.include
-----------------------------
Type
``string``
* ``string``
* ``list`` of ``strings``
Default
``""``
Example
``"1-20"``
``["avatar", "posts"]``
Description
Range or playlist indices of ``tiktok`` user posts to extract.
See
`ytdl/playlist_items <https://github.com/yt-dlp/yt-dlp/blob/3042afb5fe342d3a00de76704cd7de611acc350e/yt_dlp/YoutubeDL.py#L289>`__
for details.
A (comma-separated) list of subcategories to include
when processing a user profile.
Supported Values
* ``avatar``
* ``posts``
* ``reposts``
* ``stories``
* ``likes``
* ``saved``
Note
It is possible to use ``"all"`` instead of listing all values separately.
extractor.tumblr.avatar
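
The `include` option described in this hunk accepts either a comma-separated string or a list, plus the shortcut `"all"`. A minimal sketch of resolving such a value is shown below. `resolve_include` is a hypothetical name, and silently dropping unknown entries is an assumption rather than documented behavior.

```python
# Subcategories listed under "Supported Values" in the docs hunk.
SUPPORTED = ("avatar", "posts", "reposts", "stories", "likes", "saved")


def resolve_include(value, default=("avatar", "posts")):
    """Turn an 'include' setting into a list of subcategory names."""
    if not value:
        return list(default)
    if isinstance(value, str):
        # Accept the comma-separated string form.
        value = [item.strip() for item in value.split(",")]
    if "all" in value:
        return list(SUPPORTED)
    # Assumption: unknown names are ignored instead of raising an error.
    return [item for item in value if item in SUPPORTED]
```

For example, `resolve_include("avatar, stories")` and `resolve_include(["avatar", "stories"])` give the same result in this sketch.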

@@ -819,13 +819,18 @@
"tiktok":
{
"audio" : true,
"videos": true,
"covers": false,
"photos": true,
"videos": true,
"tiktok-range": "",
"posts": {
"order-posts": "desc",
"ytdl" : false,
"module": null
},
"user": {
"avatar": true,
"module": null,
"tiktok-range": ""
"include": ["avatar", "posts"]
}
},
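
As an illustration of how the new default values in this hunk nest, the sketch below rebuilds the "tiktok" section in Python and overlays a few overrides. It rests on assumptions: the diff markers are lost in this view, so the split between old and new keys is my reading of the hunk, and `merge` is a hypothetical helper, not part of gallery-dl.

```python
import json

# New default values for the "tiktok" section, as read from the hunk above.
TIKTOK_DEFAULTS = {
    "audio": True,
    "covers": False,
    "photos": True,
    "videos": True,
    "tiktok-range": "",
    "posts": {"order-posts": "desc", "ytdl": False, "module": None},
    "user": {"include": ["avatar", "posts"]},
}


def merge(defaults, overrides):
    """Recursively overlay 'overrides' on a copy of 'defaults'."""
    result = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


# Example overrides: skip photos and list posts oldest first.
overrides = {"photos": False, "posts": {"order-posts": "asc"}}
config = {"extractor": {"tiktok": merge(TIKTOK_DEFAULTS, overrides)}}
print(json.dumps(config, indent=4))
```

The printed JSON has the same shape as the config file fragment, with `posts` and `user` as nested objects under `extractor.tiktok`.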
"tsumino":

@@ -1090,7 +1090,7 @@ Consider all listed sites to potentially be NSFW.
<tr id="tiktok" title="tiktok">
<td>TikTok</td>
<td>https://www.tiktok.com/</td>
<td>Posts, User Profiles, VM Posts</td>
<td>Avatars, Followed Users (Stories Only), Likes, Posts, User Posts, Reposts, Saved Posts, Stories, User Profiles, VM Posts</td>
<td><a href="https://github.com/mikf/gallery-dl#cookies">Cookies</a></td>
</tr>
<tr id="tmohentai" title="tmohentai">

File diff suppressed because it is too large

@@ -234,6 +234,7 @@ SUBCATEGORY_MAP = {
"media" : "Media Files",
"popular": "Popular Images",
"recent" : "Recent Images",
"saved" : "Saved Posts",
"search" : "Search Results",
"status" : "Images from Statuses",
"tag" : "Tag Searches",
@@ -333,7 +334,6 @@ SUBCATEGORY_MAP = {
},
"instagram": {
"posts": "",
"saved": "Saved Posts",
"tagged": "Tagged Posts",
"stories-tray": "Stories Home Tray",
},
@@ -422,7 +422,9 @@ SUBCATEGORY_MAP = {
"asset": "Individual Assets",
},
"tiktok": {
"posts": "User Posts",
"vmpost": "VM Posts",
"following": "Followed Users (Stories Only)",
},
"tumblr": {
"day": "Days",

@@ -8,6 +8,9 @@ from gallery_dl.extractor import tiktok
PATTERN = r"https://p1[69]-[^/?#.]+\.tiktokcdn[^/?#.]*\.com/[^/?#]+/\w+~.*\.jpe?g"
PATTERN_WITH_AUDIO = r"(?:" + PATTERN + r"|https://v\d+m?\.tiktokcdn[^/?#.]*\.com/[^?#]+\?[^/?#]+)"
VIDEO_PATTERN = r"https://v1[69]-webapp-prime.tiktok.com/video/tos/[^?#]+\?[^/?#]+"
OLD_VIDEO_PATTERN = r"https://www.tiktok.com/aweme/v1/play/\?[^/?#]+"
COMBINED_VIDEO_PATTERN = r"(?:" + VIDEO_PATTERN + r")|(?:" + OLD_VIDEO_PATTERN + r")"
USER_PATTERN = r"(https://www.tiktok.com/@([\w_.-]+)/video/(\d+)|" + PATTERN + r")"
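
To see what the new video patterns accept, the following sketch copies them from the hunk above and tries them against sample URLs. The URLs are made up to resemble the two styles and are not real media links. Note that the dots in the host names are unescaped, so the patterns are slightly looser than the literal host names.

```python
import re

# Copied verbatim from the test module hunk above.
VIDEO_PATTERN = r"https://v1[69]-webapp-prime.tiktok.com/video/tos/[^?#]+\?[^/?#]+"
OLD_VIDEO_PATTERN = r"https://www.tiktok.com/aweme/v1/play/\?[^/?#]+"
COMBINED_VIDEO_PATTERN = r"(?:" + VIDEO_PATTERN + r")|(?:" + OLD_VIDEO_PATTERN + r")"

# Made-up URLs shaped like the two styles; assumptions for illustration only.
new_style = "https://v16-webapp-prime.tiktok.com/video/tos/useast2a/abc123/?a=1&b=2"
old_style = "https://www.tiktok.com/aweme/v1/play/?video_id=abc123&line=0"
unrelated = "https://example.com/video.mp4"

combined = re.compile(COMBINED_VIDEO_PATTERN)
for url in (new_style, old_style, unrelated):
    print(url, "->", bool(combined.fullmatch(url)))
```

Both sample styles match the combined pattern, while the unrelated URL does not.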
@@ -40,7 +43,7 @@ __tests__ = (
},
{
"#url" : "https://www.tiktok.com/@d4vinefem/photo/7449575367024626974",
"#url" : "https://www.tiktok.com/@hullcity/photo/7557376330036153622",
"#comment" : "/photo/ link: single photo",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
@@ -49,7 +52,7 @@ __tests__ = (
},
{
"#url" : "https://www.tiktok.com/@d4vinefem/video/7449575367024626974",
"#url" : "https://www.tiktok.com/@hullcity/video/7557376330036153622",
"#comment" : "/video/ link: single photo",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
@@ -58,7 +61,7 @@ __tests__ = (
},
{
"#url" : "https://www.tiktokv.com/share/video/7449575367024626974",
"#url" : "https://www.tiktokv.com/share/video/7557376330036153622",
"#comment" : "www.tiktokv.com link: single photo",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
@@ -67,7 +70,7 @@ __tests__ = (
},
{
"#url" : "https://www.tiktok.com/@.mcfc.central/photo/7449701420934122785",
"#url" : "https://www.tiktok.com/@hullcity/photo/7553302113757990166",
"#comment" : "/photo/ link: few photos",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
@@ -76,7 +79,7 @@ __tests__ = (
},
{
"#url" : "https://www.tiktok.com/@.mcfc.central/video/7449701420934122785",
"#url" : "https://www.tiktok.com/@hullcity/video/7553302113757990166",
"#comment" : "/video/ link: few photos",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
@@ -85,7 +88,7 @@ __tests__ = (
},
{
"#url" : "https://www.tiktokv.com/share/video/7449701420934122785",
"#url" : "https://www.tiktokv.com/share/video/7553302113757990166",
"#comment" : "www.tiktokv.com link: few photos",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
@@ -94,12 +97,12 @@ __tests__ = (
},
{
"#url" : "https://www.tiktok.com/@ughuwhguweghw/video/1",
"#comment" : "deleted post",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
"#options" : {"videos": False, "audio": False},
"count" : 0,
"#url" : "https://www.tiktok.com/@ughuwhguweghw/video/1",
"#comment" : "deleted post",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
"#options" : {"videos": False, "audio": False},
"#count" : 0,
},
{
@@ -107,10 +110,19 @@ __tests__ = (
"#comment" : "Video post",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
"#results" : "ytdl:https://www.tiktok.com/@memezar/video/7449708266168274208",
"#pattern" : COMBINED_VIDEO_PATTERN,
"#options" : {"videos": True, "audio": True},
},
{
"#url" : "https://www.tiktok.com/@memezar/video/7449708266168274208",
"#comment" : "Video post (via yt-dlp)",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
"#results" : "ytdl:https://www.tiktok.com/@memezar/video/7449708266168274208",
"#options" : {"videos": "ytdl", "audio": True},
},
{
"#url" : "https://www.tiktok.com/@memezar/video/7449708266168274208",
"#comment" : "video post cover image",
@@ -126,7 +138,7 @@ __tests__ = (
"#comment" : "Video post as a /photo/ link",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
"#results" : "ytdl:https://www.tiktok.com/@memezar/video/7449708266168274208",
"#pattern" : COMBINED_VIDEO_PATTERN,
"#options" : {"videos": True, "audio": True},
},
@@ -155,7 +167,7 @@ __tests__ = (
"#comment" : "Video post as a share link",
"#category" : ("", "tiktok", "post"),
"#class" : tiktok.TiktokPostExtractor,
"#results" : "ytdl:https://www.tiktok.com/@/video/7449708266168274208",
"#pattern" : COMBINED_VIDEO_PATTERN,
"#options" : {"videos": True},
},
@@ -196,6 +208,7 @@ __tests__ = (
"#comment" : "no 'author' (#8189)",
"#class" : tiktok.TiktokPostExtractor,
"#results" : "ytdl:https://www.tiktok.com/@veronicaperasso_1/video/7212008840433274118",
"#options" : {"videos": "ytdl"},
},
{
@@ -260,9 +273,50 @@ __tests__ = (
"#category" : ("", "tiktok", "user"),
"#class" : tiktok.TiktokUserExtractor,
"#pattern" : USER_PATTERN,
"#count" : 11, # 10 posts + 1 avatar
"#options" : {"videos": True, "audio": True, "tiktok-range": "1-10"},
},
# order-posts currently has no effect if logged-in cookies aren't used.
# {
# "#url" : "https://www.tiktok.com/@chillezy",
# "#comment" : "User profile ascending order",
# "#category" : ("", "tiktok", "user"),
# "#class" : tiktok.TiktokUserExtractor,
# "#results" : "https://www.tiktok.com/@chillezy/video/7112145009356344622",
# "#options" : {"videos": True, "audio": True, "avatar": False, "tiktok-range": "1", "order-posts": "asc"},
# },
# {
# "#url" : "https://www.tiktok.com/@chillezy",
# "#comment" : "User profile popular order",
# "#category" : ("", "tiktok", "user"),
# "#class" : tiktok.TiktokUserExtractor,
# "#results" : "https://www.tiktok.com/@chillezy/video/7240568259186019630",
# "#options" : {"videos": True, "audio": True, "avatar": False, "tiktok-range": "1", "order-posts": "popular"},
# },
{
"#url" : "https://www.tiktok.com/@chillezy",
"#comment" : "User profile via yt-dlp",
"#category" : ("", "tiktok", "user"),
"#class" : tiktok.TiktokUserExtractor,
"#pattern" : USER_PATTERN,
"#count" : 11, # 10 posts + 1 avatar
"#options" : {"videos": True, "audio": True, "tiktok-range": "1-10", "ytdl": True},
},
{
"#url" : "https://www.tiktok.com/@chillezy",
"#comment" : "User profile without avatar",
"#category" : ("", "tiktok", "user"),
"#class" : tiktok.TiktokUserExtractor,
"#pattern" : USER_PATTERN,
"#count" : 10, # 10 posts
"#options" : {"videos": True, "audio": True, "avatar": False, "tiktok-range": "1-10"},
},
{
"#url" : "https://www.tiktok.com/@joeysc14/",
"#comment" : "Public user profile with no content",
@@ -270,7 +324,37 @@ __tests__ = (
"#class" : tiktok.TiktokUserExtractor,
"#pattern" : PATTERN,
"#options" : {"videos": False, "tiktok-range": "1"},
"#count" : 1,
"#count" : 1, # 1 avatar
},
{
"#url" : "https://www.tiktok.com/@chillezy/avatar",
"#class" : tiktok.TiktokAvatarExtractor,
},
{
"#url" : "https://www.tiktok.com/@chillezy/posts",
"#class" : tiktok.TiktokPostsExtractor,
},
{
"#url" : "https://www.tiktok.com/@chillezy/reposts",
"#class" : tiktok.TiktokRepostsExtractor,
},
{
"#url" : "https://www.tiktok.com/@chillezy/stories",
"#class" : tiktok.TiktokStoriesExtractor,
},
{
"#url" : "https://www.tiktok.com/@chillezy/likes",
"#class" : tiktok.TiktokLikesExtractor,
},
{
"#url" : "https://www.tiktok.com/@chillezy/saved",
"#class" : tiktok.TiktokSavedExtractor,
},
)
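
The test entries above pair a "#pattern" with a "#count" (for example, 10 posts + 1 avatar). As a rough illustration of what such a pair asserts, here is a hypothetical checker. `check_results` is not gallery-dl's actual test runner, only a sketch of the idea that every extracted URL must match the pattern and the total must equal the count.

```python
import re


def check_results(test, urls):
    """Check extracted 'urls' against a test entry's '#pattern' and '#count'."""
    pattern = test.get("#pattern")
    if pattern is not None:
        regex = re.compile(pattern)
        for url in urls:
            # Every URL has to match the pattern from its start.
            if regex.match(url) is None:
                return False
    count = test.get("#count")
    if count is not None and len(urls) != count:
        return False
    return True
```

An entry without "#pattern" or "#count", such as the only-matching URL tests for the split user extractors, passes this check trivially.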