Commit Graph

415 Commits

Author SHA1 Message Date
Mike Fährmann
dcaf7293b3 [bluesky] add 'video' extractor (#4438) 2025-04-16 12:00:57 +02:00
Mike Fährmann
f7cd4367c6 [chevereto] support 'imagepond.net' (#7278) 2025-04-01 10:41:54 +02:00
Mike Fährmann
015ba76c9c [webtoons] add 'artist' extractor (#7274) 2025-04-01 10:06:56 +02:00
Mike Fährmann
24bbcbcfa3 [danbooru] add 'favgroup' extractor 2025-03-26 20:58:49 +01:00
Mike Fährmann
7a6899c647 [imhentai] support 'hentaienvy.com' and 'hentaizap.com' (#7192 #7218)
and move 'hentaifox' support to this module as well
2025-03-24 15:33:19 +01:00
Mike Fährmann
31e57bafab [arcalive] add 'user' extractor (#5657) 2025-03-14 18:58:10 +01:00
hdk5
d900e868e4 [arcalive] add support (#5657 #7100)
* [arca.live] Add extractor skeleton

* [arcalive] update names and formatting

* [arcalive] implement initial file extraction code

* [arcalive] improve '_extract_media()' performance

compile and cache regex on demand

* [arcalive] improve image extraction

- extract 'data-originalurl' URLs if available
- replace URL query strings with 'type=orig'
- ignore emoticons by default

* [arcalive] update defaults

- include 'title' in filenames
- use 0.5-1.5s delay between requests

* [arcalive] use ext from 'data-orig' if available

* [arcalive] update docs/supportedsites

* [arcalive] add tests

* [arcalive] update 'board' extractor pattern

so it doesn't also match 'post' URLs

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2025-03-14 10:52:21 +01:00
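Note on "compile and cache regex on demand" in the [arcalive] commit above: the idea can be sketched roughly as follows. This is a minimal illustration only, not the code from that commit; the class name `MediaExtractor`, the attribute `_findall_media`, the helper `_media_finder()`, and the pattern itself are hypothetical.

```python
import re


class MediaExtractor:
    """Hypothetical stand-in; not gallery-dl's actual extractor class."""

    # Cache slot for the compiled pattern's 'findall' method.
    # It stays None until the first extraction actually needs it.
    _findall_media = None

    @classmethod
    def _media_finder(cls):
        # Compile on first use only, then reuse the cached callable.
        if cls._findall_media is None:
            cls._findall_media = re.compile(
                r'<(?:img|video) [^>]*?src="([^"]+)"').findall
        return cls._findall_media

    def extract_media(self, html):
        # Return every 'src' value of <img> and <video> tags in 'html'.
        return self._media_finder()(html)


html = ('<p><img class="a" src="//example.org/1.png">'
        '<video src="//example.org/2.mp4"></p>')
urls = MediaExtractor().extract_media(html)
```

The pattern is never compiled if no media extraction takes place, and repeated calls share a single compiled object.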
Deer-Spangle
859f1e7d04 [furaffinity] Adding a FuraffinityFolderExtractor, which extracts a single folder
- Ensure FuraffinityGalleryExtractor doesn't detect folder links
- Fix example URL for folder extractor
- Reordering classes a bit
- Another tweak of the regex
- One more go at the regex..
- cleanup
2025-03-12 14:00:50 +01:00
Mike Fährmann
f5073605f6 [tenor] add 'user' extractor (#6075) 2025-03-04 21:47:16 +01:00
Mike Fährmann
2f3265a8ae [tenor] add initial support (#6075) 2025-03-03 19:04:50 +01:00
Mike Fährmann
fa7114ee20 [docs] update supportedsites 2025-02-28 10:48:28 +01:00
CasualYouTuber31
daac2c6e04 [tiktok] add support (#3061 #4177 #5646 #6878 #6708)
* Add TikTok photo support

#3061
#4177

* Address linting errors

* Fix more test failures

* Forgot to update category names in tests

* Looking into re issue

* Follow default yt-dlp output template

* Fix format string error on 3.5

* Support downloading videos and audio

Respond to comments
Improve archiving and file naming

* Forgot to update supportedsites.md

* Support user profiles

* Fix indentation

* Prevent matching with more than one TikTok extractor

* Fix TikTok regex

* Support TikTok profile avatars

* Fix supportedsites.md

* TikTok: Ignore no formats error

In my limited experience, this doesn't mean that gallery-dl can't download the photo post (but this could mean that you can't download the audio)

* Fix error reporting message

* TikTok: Support more URL formats

vt.tiktok.com
www.tiktok.com/t/

* TikTok: Only download avatar when extracting user profile

* TikTok: Document profile avatar limitation

* TikTok: Add support for www.tiktokv.com/share links

* Address Share -> Sharepost issue

* TikTok: Export post's creation date in JSON (ISO 8601)

* [tiktok] update

* [tiktok] update 'vmpost' handling

just perform a HEAD request and handle its response

* [tiktok] build URLs from post IDs

instead of reusing unchanged input URLs

* [tiktok] combine 'post' and 'sharepost' extractors

* [tiktok] update default filenames

put 'id' and 'num' first to ensure better file order

* [tiktok] improve ytdl usage

- speed up extraction by passing '"extract_flat": True'
- pass more user options and cookies
- pre-define 'TikTokUser' extractor usage

* [tiktok] Add _COOKIES entry to AUTH_MAP

* [tiktok] Always download user avatars

* [tiktok] Add more documentation to supportedsites.md

* [tiktok] Address review comments

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2025-02-25 20:10:48 +01:00
Mike Fährmann
a9853cd273 merge #6781: [bilibili] add 'user-articles-favorite' extractor (#6725) 2025-02-23 18:19:51 +01:00
mmmpipi
e4cc3419c5 add bilibili User Articles FavList support
- fix whitespace
- fix extractor names
- Add favlist url user check
- apply changes
- add test
- update docs/supportedsites
2025-02-23 18:18:45 +01:00
Dominik Prange
ff5f6fe70f [boosty] added new direct message extractor
- formatting
- fixed linting formatting errors
- fixed E999 SyntaxError: invalid syntax
- fixed class naming
- fixed mandatory extractor.boosty.metadata as true requirement
- update
  - apply changes
  - add test
  - update docs/supportedsites
- improve 'dialog' pagination logic
2025-02-23 18:14:59 +01:00
Mike Fährmann
52d4e1a100 [imhentai] inherit from BaseExtractor
combine all imhentai-like sites into one module
2025-02-19 22:14:52 +01:00
Mike Fährmann
d4c56b08d7 [hentaiera] add support (#3046 #6952 #7020) 2025-02-19 17:42:04 +01:00
Mike Fährmann
4396029d36 [furry34] add support (#1078 #7018) 2025-02-19 16:35:48 +01:00
Mike Fährmann
82493a6672 [hentairox] add support (#7003) 2025-02-18 21:45:30 +01:00
Luca Russo
95c446fcd1 [discord] add support (#6836)
* first commit

* add --

* skip video embeds

* fix typo

* removed ambiguity

* add category support

* code tweaks

* more reliable embed extraction

* handle 403 errors (testing done)

* added "parent_id" keyword

* added "parent", "parent_type" keywords

the extractor should now be ready to merge!

* removed unnecessary dict unpacking

* added empty text messages extraction

* added "channel_topic"

* even more metadata extraction

can now extract all embed images & text, as well as server banners. Also, the code is much better.

* added user avatar and banner

* better pagination

* fix regression

* minor tweaks

* Made requested changes
2025-02-18 18:45:39 +01:00
Mike Fährmann
7ae09c6b29 [imgur] add support for (hidden) personal posts (#6990)
https://imgur.com/user/me
https://imgur.com/user/me/hidden
2025-02-14 19:28:55 +01:00
Mike Fährmann
f1f27eb2ab [vsco] support '/video/' URLs (#4295 #6973)
requires yt-dlp/youtube-dl to handle m3u8 manifests
2025-02-12 19:12:00 +01:00
Mike Fährmann
55034d9638 [imhentai] add support (#1660 #3046 #3824 #4338 #5936) 2025-02-10 21:42:07 +01:00
NecRaul
dae82f1519 [b4k] update domain to arch.b4k.dev 2025-02-09 01:28:23 +04:00
Mike Fährmann
83e50e43a8 [hiperdex] update domain to 'hiperdex.com' 2025-01-26 19:26:03 +01:00
Mike Fährmann
254ffd3fcd [shimmie2] remove 'tentaclerape.net'
"Site Not Found"
2025-01-26 17:02:07 +01:00
Mike Fährmann
d2164af63d [komikcast] update domain to 'komikcast.la' 2025-01-26 16:54:14 +01:00
Mike Fährmann
804fd048ef [szurubooru] remove 'booru.foalcon.com'
DNS record of foalcon.com no longer exists
2025-01-26 16:42:49 +01:00
Mike Fährmann
b271a874ed [fanleaks] remove module
DNS record of fanleaks.club no longer exists
2025-01-26 16:35:46 +01:00
Mike Fährmann
05fa6dd354 [nekohouse] add initial support (#5241, #6738) 2025-01-20 20:15:34 +01:00
Mike Fährmann
f867e690c1 merge #6855: [turboimagehost] add support for galleries 2025-01-19 17:51:48 +01:00
arebokert
556fbb1a44 [turboimagehost] add support for galleries
- added support
- raise error if gallery not found
- fix test
- fix lint issues
- simplify
2025-01-19 17:28:45 +01:00
Mike Fährmann
438c61601b [xfolio] add initial support (#5514, #6351, #6837) 2025-01-18 15:57:56 +01:00
Mike Fährmann
6e919a3695 [e621] support e621.cc and e621.anthro.fr frontend URLs (#6809) 2025-01-15 14:35:37 +01:00
Mike Fährmann
bde99cc6ce [cohost] remove module
cohost.org now redirects to archive.org
2025-01-13 14:38:35 +01:00
Mike Fährmann
91bd3e37f2 [pexels] add support (#2286, #4214, #6769) 2025-01-12 16:50:12 +01:00
Mike Fährmann
1d75c8308c [weebcentral] add support (#6778) 2025-01-10 23:04:51 +01:00
Mike Fährmann
167a726972 [szurubooru] support 'visuabusters.com/booru' (#6729) 2024-12-26 19:04:16 +01:00
Mike Fährmann
998f949db1 [civitai] add 'user-videos' extractor (#6644) 2024-12-26 10:18:54 +01:00
Mike Fährmann
63008f77e2 merge #6607: [lofter] add initial support
(#650, #2294, #4095, #4728, #5656)
2024-12-11 20:41:52 +01:00
Mike Fährmann
717081dabd [lofter] update
- add tests
- update docs/supportedsites
- provide 'date' metadata
- simplify/restructure some code
2024-12-11 20:39:01 +01:00
Mike Fährmann
0e942f0829 merge #6613: [itaku] add 'search' extractor 2024-12-11 11:54:33 +01:00
Mike Fährmann
b58af14bdb [itaku] update
- simplify code
- update docs/supportedsites
- update test results
2024-12-11 11:52:42 +01:00
Mike Fährmann
86334f9c4a [yiffverse] add support (#6611) 2024-12-11 10:57:21 +01:00
Mike Fährmann
47311352de [cyberdrop] add extractor for media URLs (#2496)
https://github.com/mikf/gallery-dl/issues/2496#issuecomment-2495467133
2024-12-08 20:57:12 +01:00
Mike Fährmann
ef7ff31117 [realbooru] fix extraction (#6543)
- extract data from HTML pages since API is no longer usable
- move code into its own separate 'realbooru' module
2024-12-07 17:39:25 +01:00
Mike Fährmann
624dc7f407 [bluesky] add 'info' extractor 2024-12-05 08:36:33 +01:00
Mike Fährmann
d96717e2e6 [hentaicosplays] update domains (#6578)
inherit from BaseExtractor to make differentiating between sites easier
2024-12-03 13:56:32 +01:00
Luca Russo
e9370b7b8a merge #5626: [facebook] add support (#470, #2612)
* [facebook] add initial support

* renamed extractors & subcategories

* better stability, modularity & naming

* added single photo extractor, warnings & retries

* more metadata + extract author followups

* renamed "album" mentions to "set" for consistency

* cookies are now only used when necessary

also added author followups for singular images

* removed f-strings

* added way to continue extraction from where it left off

also fixed some bugs

* fixed wrong subcategory bug

* added individual video extraction

* extract audio + added ytdl option

* updated setextract regex

* added option to disable start warning

the extractor should be ready :)

* fixed description metadata bug

* removed cookie "safeguard" + fixed for private profiles

I have removed the cookie "safeguard" (not using cookies until they are necessary) as I've come to the conclusion that it does more harm than good. There is no way to detect whether the extractor has skipped private images that could otherwise have been extracted. Also, the safeguard provides little to no advantage.

* fixed a few bugs regarding profile parsing

* a few bugfixes

Fixed some metadata attributes that were not decoded correctly for non-Latin languages, or not shown at all.
Also improved a few patterns.

* retrigger checks

* Final cleanups

- Added tests
- Fixed video extractor giving incorrect URLs
- Removed start warning
- Listed supported site correctly

* fixed regex

* trigger checks

* fixed livestream playback extraction + bugfixes

I've chosen to remove the "reactions", "comments" and "views" attributes as I've felt that they require additional maintenance even though nobody would ever actually use them to order files.
I've also removed the "title" and "caption" video attributes for their inconsistency across different videos.
Feel free to share your thoughts.

* fixed regex

* fixed filename fallback

* fixed retrying when a photo url is not found

* fixed end line

* post url fix + better naming

* fix posts

* fixed tests

* added profile.php url

* made most of the requested changes

* flake

* archive: false

* removed unnecessary url extract

* [facebook] update

- more 'Sec-Fetch-…' headers
- simplify 'text.nameext_from_url()' calls
- replace 'sorted(…)[-1]' with 'max(…)'
- fix '_interval_429' usage
- use replacement fields in logging messages

* [facebook] update URL patterns

get rid of '.*' and '.*?'

* added a few remaining tests

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2024-11-26 21:49:11 +01:00
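Note on "replace 'sorted(…)[-1]' with 'max(…)'" in the [facebook] commit above: a small sketch of why the two are interchangeable when only the largest item is needed. The `candidates` data is made up for illustration and is not taken from the extractor.

```python
# Hypothetical candidates: (width, url) pairs for one photo.
candidates = [(720, "a.jpg"), (2048, "b.jpg"), (1080, "c.jpg")]

# Sorting builds a complete sorted copy just to read its last element.
best_sorted = sorted(candidates)[-1]

# max() scans the items once and keeps only the largest one seen.
best_max = max(candidates)
```

Both expressions select the same tuple here. One caveat when a `key` function is used and several items tie for the maximum: `max()` returns the first tied item, while `sorted(...)[-1]` returns the last one.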
Mike Fährmann
b78c35fd15 [motherless] add 'media' and 'gallery' extractors
(#2074, #4413, #6221)
2024-11-22 21:06:32 +01:00