7780 Commits

Author SHA1 Message Date
Mike Fährmann
eed46f8dcf [build] update PyInstaller hiddenimports and py2exe modules 2026-02-01 19:29:30 +01:00
Mike Fährmann
73bf99612a [scrolller] move GraphQL queries 2026-02-01 19:18:52 +01:00
Mike Fährmann
cc645984a4 [luscious] export GraphQL queries 2026-02-01 19:18:14 +01:00
Mike Fährmann
0c24955507 [mangapark] export GraphQL queries 2026-02-01 19:18:10 +01:00
Mike Fährmann
40a4ff935a [500px] export GraphQL queries 2026-02-01 19:16:14 +01:00
Mike Fährmann
51d9fd2f4d [behance] export GraphQL queries 2026-02-01 19:13:38 +01:00
Mike Fährmann
1c2e2d5d08 [deviantart] export journal templates 2026-02-01 18:59:31 +01:00
Mike Fährmann
3d114dbc67 [deviantart] export 'tiptap' functions 2026-02-01 18:53:22 +01:00
Mike Fährmann
20ef39be45 [tsumino] export 'jsurl' code 2026-02-01 18:50:09 +01:00
Mike Fährmann
7692d31a57 [twitter] move transaction_id.py 2026-02-01 18:48:30 +01:00
Mike Fährmann
343981ac1c [common] add 'utils()' method 2026-02-01 18:48:17 +01:00
Mike Fährmann
fd6bc3961c release version 1.31.5 2026-01-31 10:49:00 +01:00
Mike Fährmann
1286839037 [socialmediagirlsforum] add tests 2026-01-31 09:55:45 +01:00
Mike Fährmann
5b8ad403dd [xenforo] decode '/goto/link-confirmation' links (#8964) 2026-01-31 09:55:45 +01:00
Mike Fährmann
4e6e2c27d5 [xenforo] support 'forums.socialmediagirls.com' (#8964) 2026-01-31 09:55:41 +01:00
CasualYouTuber31
01657caa15 [tiktok] do not fail entire extraction if one post fails (#8962) 2026-01-30 23:03:59 +01:00
bassberry
fd5f5611f6 [tiktok] extract subtitles and all cover types (#8805)
* Make sure that `img_id`, `audio_id` and `cover_id` fields are always available.
    The values are set to '' where they are not applicable.
    Having `img_id` is necessary for the default `archive_fmt`, the other fields are handled for consistency.
* Allow downloading more than one cover.
    The previous behavior is kept as-is, but setting the "covers" option to "all" now grabs all available covers.
* Add support for downloading subtitles
    Allows filtering subtitles by source type (ASR, MT) and language.
* Ensure archive uniqueness for covers and subtitles.
* Update the URL test pattern to include the `image` extension.
    Although TikTok may serve the covers with JPEG content, the file ending can be `.image`.
    Before 0c14b164, the test failed because the asserted URL did not match all cover types; the pattern now in use must include this file ending.
* Add support for "creator_caption" subtitles in "LC" format.
    These subtitles have the keys "Format" set to "creator_caption" and "Source" to "LC".
* Add "LC" (Local Captions) as a subtitle source type in the documentation
* Code deduplication and renaming subtitle metadata
    Changed the item type from singular `subtitle` to `subtitles`.
    Removed the wrong descriptor `cover` from the subtitles fallback title.
* Refactor subtitle filtering
    The filter is now prepared in `_init` to prevent parsing the same config parameter for every item.
    The `_extract_subtitles` function will still extract if either filter (source or language) matches.
* Generate a `file_id` for subtitles
    Subtitles have multiple fields that determine the unique file, so these are simply concatenated.
    This is similar to the cover types, only with more variations.
* Added tests for subtitles
* fix docs entries
* fix '"covers": "all"'
* simplify some code
* Fix fallback title for subtitles
    Added the missing "f" to the f-string and added "subtitle" to the title.
    The resulting title will look like "TikTok video subtitle #1234567"
2026-01-30 21:01:06 +01:00
CasualYouTuber31
2d01fef300 [tiktok] Restructure to allow user extractors to provide their own rehydration data (#8848) 2026-01-30 15:18:56 +01:00
Mike Fährmann
3445c51ca4 [job] add 'output.jsonl' option (#8953) 2026-01-30 09:36:28 +01:00
Mike Fährmann
532ab7112e [discord] add 'server-search' extractor
requested on Discord

https://discord.com/channels/SERVER_ID/search?from=USER_ID
2026-01-30 07:58:14 +01:00
Mike Fährmann
690b3ba200 [civitai:user-posts] fix pagination (#8955)
fix '400 Bad Request' errors when retrieving
more than the first batch of posts.
2026-01-29 18:53:08 +01:00
Mike Fährmann
56168fbc87 [weebdex] add 'lang' option, support query params (#8957)
for example '?order=asc&group=j0fsj3oem3&tlang=en'
2026-01-29 17:01:02 +01:00
Mike Fährmann
a3f164aa50 [weebdex] make metadata extraction non-fatal no2 (#8954)
9a102039fc
2026-01-28 19:48:38 +01:00
Mike Fährmann
feef91bf09 [exhentai] implement Multi-Page Viewer support (#2616 #5268) 2026-01-28 19:37:40 +01:00
Mike Fährmann
d9917ec630 [xenforo] improve 'attachment' extraction (#8947) 2026-01-28 11:57:17 +01:00
Mike Fährmann
aa8610c11c [imhentai] prevent exceptions for galleries without image data (#8951) 2026-01-28 10:40:22 +01:00
Mike Fährmann
6c9dff1e29 [docs/options] add Table of Contents 2026-01-27 19:27:40 +01:00
SubmarineScurvy
ef8f2869e7 [listal] add 'image' & 'people' extractors (#1589 #8921)
* listal extractor
* add listal to init
* fix flake8 & formatting & extractor names/subcategories

* remove 're' import
* remove 'datetime' import
* update & simplify extractors
* update supportedsites
* add tests

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2026-01-27 18:26:41 +01:00
Mike Fährmann
eaaa25b6e4 [job] enable all 'parent-…' options for parent extractors by default
- parent-directory
- parent-metadata
- parent-session
- parent-skip

- add general 'parent' option
2026-01-27 12:05:19 +01:00
Mike Fährmann
250fbd3294 [erome] mark as 'parent' extractor 2026-01-27 11:13:14 +01:00
Mike Fährmann
b67e3c15ff [xenforo] support 'titsintops.com' (#8945) 2026-01-27 10:31:26 +01:00
Mike Fährmann
105e2379d4 [pornhub] fix '400 Bad Request' when logged in (#8942)
extract 'token' from a different location
2026-01-27 10:04:24 +01:00
Mike Fährmann
f6ce8c8579 [mangataro] fix 'manga' extractor (#8930) 2026-01-27 10:03:33 +01:00
CasualYouTuber31
4fab8e0dd8 [tiktok] do not fail story extraction if user has no stories (#8938) 2026-01-26 16:50:50 +01:00
Mike Fährmann
9a102039fc [weebdex] make metadata extraction non-fatal (#8939) 2026-01-26 16:44:29 +01:00
Mike Fährmann
7784aed74e [kemono] prevent 'revisions' API requests when possible
posts from '/v1/{service}/user/{creator_id}/post/{post_id}' already
include their revisions and don't need an additional API request
2026-01-26 10:00:32 +01:00
Mike Fährmann
7ac9ad1cbf [kemono] fix possible 'AttributeError' for revisions (#8929)
some revisions have string values for 'file' and 'attachments'
instead of the regular dicts
2026-01-26 10:00:32 +01:00
CasualYouTuber31
702814654a [tiktok] solve JS challenges (#8850)
* [tiktok] First draft of a challenge resolver
* use stdlib sha256 implementation
* simplify 'resolve_challenge()' code
* set cookie domain and expires timestamp
* base64 -> binascii
* Avoid incorrect padding exceptions

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2026-01-26 09:55:53 +01:00
CasualYouTuber31
d19d5c8b6e [tiktok] extract more story item list pages (#8932)
* [tiktok] extract more story item list pages
* [tiktok] invert hasMore logic
2026-01-26 08:57:23 +01:00
CasualYouTuber31
f80e294132 [tiktok] Fix account extraction (#8931)
Was inadvertently caused by recent changes to range predicates
Fixes regression introduced in c23beee57c
2026-01-26 08:54:42 +01:00
Mike Fährmann
93bf4ccc18 merge #8928: [mangafreak] add support 2026-01-25 19:52:34 +01:00
Mike Fährmann
4e71e2f7e7 [mangafreak] update & fix
- fix manga and title extraction
- fix 'chapter_minor'
- extend test results
2026-01-25 19:49:56 +01:00
Mike Fährmann
7026611f31 merge #8925: [mangatown] add support 2026-01-25 18:35:39 +01:00
Mike Fährmann
bf3ee5e9f7 [mangatown] fix & update
- use BASE_PATTERN
- fix manga, manga_id, chapter_id extraction
- fix & extend 'manga' metadata results
- extend test results
2026-01-25 18:32:17 +01:00
Duy Nguyen
58662f900a fix(mangafreak): fix image extraction and simplify code
- Fix image URL extraction pattern to handle img tags with id attribute
- Use self.groups pattern instead of custom __init__ methods
- Fix chapter list extraction to use correct table structure
2026-01-25 17:24:05 +01:00
Duy Nguyen
8b0e8c656d feat(mangafreak): add support for MangaFreak
Add chapter and manga extractors for ww2.mangafreak.me with support
for bonus chapters (e.g., 167e suffix).
2026-01-25 15:56:52 +01:00
Duy Nguyen
befa9b8a3e [mangatown] fix base url and simplify image extraction 2026-01-25 11:40:15 +01:00
Mike Fährmann
adca123646 [weibo:user] add 'subalbums' include (#8792) 2026-01-25 11:16:41 +01:00
Mike Fährmann
cd83be41c5 [common] allow Dispatch 'alt' extractors to use custom URLs 2026-01-25 11:15:30 +01:00
Mike Fährmann
37176da511 [hentaifoundry:user] use f-strings 2026-01-25 10:10:37 +01:00