404 Commits

Author SHA1 Message Date
enduser420
f77e98b57d [chzzk] add 'comment' and 'community' extractors (#7735 #7741)
* [chzzk] add 'comment' and 'community' extractors
* [chzzk] update
* [chzzk] add tests
* [chzzk] update docs/supportedsites
* [chzzk] add 'offset' option
* [docs] add 'offset' option to gallery-dl.conf
2025-06-28 15:27:19 +02:00
Mike Fährmann
578aea51ed [comick] add initial support (#1825 #6782) 2025-06-24 18:59:50 +02:00
Mike Fährmann
68960e29a1 [dankefuerslesen] add support (#7669) 2025-06-22 12:13:12 +02:00
SpiffyChatterbox
e0f65be36b [nudostar] add support (#5735 #6556)
* Drafting initial basic extractor layout
* Better debug logging
* Update nudostar.py
    Still tinkering
* Update nudostar.py
    Basic extractor is working. Now starting on Gallery
* Update nudostar.py
    Still a work in progress.
    Got individual posts working, galleries are not.
* Update nudostar.py
* Site now appears working. Added Tests.
* PEP Updates
* PEP - Line Length Updates
* Update nudostar.py
    Resolving PEP8 issues.
* update 'gallery' extractor, rename to 'model'
* update 'image' extractor
* expand tests
* update docs/supportedsites

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2025-06-16 19:21:49 +02:00
missionfloyd
13cb031fe2 [girlsreleased] add support (#6200)
* [girlsreleased] add support
* Lint
* Change "galleries" to "sets"
    As it is on the site
* Add pagination
    Add tests
* Fix tests
* Remove leftover print()
* Don't remove first set
* Yield pages
* Add filename metadata
* [girlsreleased] Refactor
* Return models as array
* Add filename numbering
    Add date metadata
* Add URL metadata
* Spawn set extractor the right way
* Adjust model/site regex
* update
    - restructure some code
    - remove constructors
    - use f-strings
* expand tests
* update docs/supportedsites
2025-06-16 19:18:19 +02:00
SpiffyChatterbox
48ac41605d [redbust] add support (#6759 #6918 #7043)
* init - Redbust.com Support
* Added Test
    Could use a second set of eyes on this
* update 'gallery' extractor
    - extract more metadata
    - simplify image extraction
    - support legacy galleries
* add tests
* update 'image' extractor
* add 'tag' extractor
* add 'archive' extractor
* restrict 'image' extractor pattern
* update docs/supportedsites
* replace quotes inside f-string

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2025-06-16 12:10:42 +02:00
hunter-gatherer8
96f5cfb305 [girlswithmuscle] add support (#4493 #6016)
* [girlswithmuscle] init
* [girlswithmuscle]: fix metadata extraction (site layout change)
* [girlswithmuscle]: fix tags extraction (site layout change)
* update login code
* update 'post' extractor
* update 'gallery' extractor, rename to 'search' extractor
* update docs
* add test cases

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2025-06-14 23:05:49 +02:00
Mike Fährmann
e08ec7e083 update copyright notices 2025-06-13 00:03:41 +02:00
thatDudo
0b0152b347 [rawkuma] add support (#4571)
* Add rawkuma extractor

* Fix flake8 warnings

* Remove fstring

* Fix regex call

* update domain to rawkuma.net

* fix 'manga' extractor

* fix 'chapter' extractor

* add tests

* update docs/supportedsites

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2025-06-02 19:15:53 +02:00
Mike Fährmann
ec523c2c2c [mangasee] remove module 2025-05-30 18:04:55 +02:00
Mike Fährmann
922c296482 [kemono][coomer][schalenetwork] rename modules & extractors
category changes:

- kemonoparty -> kemono
- coomerparty -> coomer
- koharu      -> schalenetwork

also wanted to rename '2chen' -> 'sturdychan',
but the site's main page is still titled '2chen'
2025-05-30 17:51:49 +02:00
Mike Fährmann
7b2bcf68a5 [manganelo] support 'nelomanga.net' and mirror domains (#7423)
- natomanga.com
- nelomanga.net
- manganato.gg
- mangakakalot.gg
2025-04-29 21:12:37 +02:00
Mike Fährmann
29a4444b21 [pictoa] update
- simplify code
- update URL patterns
- update tests
- update docs/supportedsites
2025-04-24 17:59:47 +02:00
nunonda
f342108280 Adding in a first pass at a pictoa extractor
Adds support for galleries and individual Images
2025-04-23 17:37:40 -07:00
Mike Fährmann
4c8c98a14d use internal, non-caching version of re.compile for extractor patterns
speeds up total compile time of extractor patterns by ~10ms
2025-04-15 22:47:19 +02:00
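
The idea behind commit 4c8c98a14d (compiling extractor patterns without going through the cache that `re.compile` maintains) can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the helper name `compile_uncached` and the example pattern are hypothetical, and it relies on private CPython modules (`re._compiler` on Python 3.11+, `sre_compile` on older versions) whose location may change between releases.

```python
import re

try:
    # Python 3.11+: the regex compiler is the private module re._compiler
    from re import _compiler as _sre_compiler
except ImportError:
    # Older Python versions: the same compiler is the top-level sre_compile
    import sre_compile as _sre_compiler


def compile_uncached(pattern, flags=0):
    """Compile 'pattern' directly, bypassing the cache used by re.compile().

    Hypothetical helper for illustration; it calls a private CPython API.
    """
    # int() turns a re.RegexFlag value into the plain int the compiler expects
    return _sre_compiler.compile(pattern, int(flags))


# Hypothetical extractor-style pattern, compiled once at import time
POST_PATTERN = compile_uncached(r"(?:https?://)?example\.org/post/(\d+)")
```

Patterns that are compiled exactly once and stored on a class gain nothing from the cache lookup, which is presumably where the small saving comes from.
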
Mike Fährmann
7a6899c647 [imhentai] support 'hentaienvy.com' and 'hentaizap.com' (#7192 #7218)
and move 'hentaifox' support to this module as well
2025-03-24 15:33:19 +01:00
hdk5
d900e868e4 [arcalive] add support (#5657 #7100)
* [arca.live] Add extractor skeleton

* [arcalive] update names and formatting

* [arcalive] implement initial file extraction code

* [arcalive] improve '_extract_media()' performance

compile and cache regex on demand

* [arcalive] improve image extraction

- extract 'data-originalurl' URLs if available
- replace URL query strings with 'type=orig'
- ignore emoticons by default

* [arcalive] update defaults

- include 'title' in filenames
- use 0.5-1.5s delay between requests

* [arcalive] use ext from 'data-orig' if available

* [arcalive] update docs/supportedsites

* [arcalive] add tests

* [arcalive] update 'board' extractor pattern

so it doesn't also match 'post' URLs

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2025-03-14 10:52:21 +01:00
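
The "replace URL query strings with 'type=orig'" step from the arcalive commit above might look something like the following. It is only a sketch under assumptions: the function name `force_original` and the example URL are made up for illustration, and the real extractor may build the URL differently.

```python
from urllib.parse import urlsplit, urlunsplit


def force_original(url):
    """Return 'url' with its query string replaced by 'type=orig'.

    Hypothetical helper; scheme, host and path are left untouched.
    """
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query="type=orig"))
```

For example, `force_original("https://example.org/a/b.png?w=200&h=100")` yields `https://example.org/a/b.png?type=orig`.
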
Mike Fährmann
2f3265a8ae [tenor] add initial support (#6075) 2025-03-03 19:04:50 +01:00
CasualYouTuber31
daac2c6e04 [tiktok] add support (#3061 #4177 #5646 #6878 #6708)
* Add TikTok photo support

#3061
#4177

* Address linting errors

* Fix more test failures

* Forgot to update category names in tests

* Looking into re issue

* Follow default yt-dlp output template

* Fix format string error on 3.5

* Support downloading videos and audio

Respond to comments
Improve archiving and file naming

* Forgot to update supportedsites.md

* Support user profiles

* Fix indentation

* Prevent matching with more than one TikTok extractor

* Fix TikTok regex

* Support TikTok profile avatars

* Fix supportedsites.md

* TikTok: Ignore no formats error

In my limited experience, this doesn't mean that gallery-dl can't download the photo post (but this could mean that you can't download the audio)

* Fix error reporting message

* TikTok: Support more URL formats

vt.tiktok.com
www.tiktok.com/t/

* TikTok: Only download avatar when extracting user profile

* TikTok: Document profile avatar limitation

* TikTok: Add support for www.tiktokv.com/share links

* Address Share -> Sharepost issue

* TikTok: Export post's creation date in JSON (ISO 8601)

* [tiktok] update

* [tiktok] update 'vmpost' handling

just perform a HEAD request and handle its response

* [tiktok] build URLs from post IDs

instead of reusing unchanged input URLs

* [tiktok] combine 'post' and 'sharepost' extractors

* [tiktok] update default filenames

put 'id' and 'num' first to ensure better file order

* [tiktok] improve ytdl usage

- speed up extraction by passing '"extract_flat": True'
- pass more user options and cookies
- pre-define 'TikTokUser' extractor usage

* [tiktok] Add _COOKIES entry to AUTH_MAP

* [tiktok] Always download user avatars

* [tiktok] Add more documentation to supportedsites.md

* [tiktok] Address review comments

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2025-02-25 20:10:48 +01:00
Mike Fährmann
52d4e1a100 [imhentai] inherit from BaseExtractor
combine all imhentai-like sites into one module
2025-02-19 22:14:52 +01:00
Mike Fährmann
d4c56b08d7 [hentaiera] add support (#3046 #6952 #7020) 2025-02-19 17:42:04 +01:00
Mike Fährmann
4396029d36 [furry34] add support (#1078 #7018) 2025-02-19 16:35:48 +01:00
Mike Fährmann
82493a6672 [hentairox] add support (#7003) 2025-02-18 21:45:30 +01:00
Luca Russo
95c446fcd1 [discord] add support (#6836)
* first commit

* add --

* skip video embeds

* fix typo

* removed ambiguity

* add category support

* code tweaks

* more reliable embed extraction

* handle 403 errors (testing done)

* added "parent_id" keyword

* added "parent", "parent_type" keywords

the extractor should be now ready to merge!

* removed unnecessary dict unpacking

* added empty text messages extraction

* added "channel_topic"

* even more metadata extraction

can now extract all embeds images & text, as well as server banners. also code is much better.

* added user avatar and banner

* better pagination

* fix regression

* minor tweaks

* Made requested changes
2025-02-18 18:45:39 +01:00
Mike Fährmann
55034d9638 [imhentai] add support (#1660 #3046 #3824 #4338 #5936) 2025-02-10 21:42:07 +01:00
Mike Fährmann
b271a874ed [fanleaks] remove module
DNS record of fanleaks.club no longer exists
2025-01-26 16:35:46 +01:00
Mike Fährmann
05fa6dd354 [nekohouse] add initial support (#5241, #6738) 2025-01-20 20:15:34 +01:00
Mike Fährmann
438c61601b [xfolio] add initial support (#5514, #6351, #6837) 2025-01-18 15:57:56 +01:00
Mike Fährmann
bde99cc6ce [cohost] remove module
cohost.org now redirects to archive.org
2025-01-13 14:38:35 +01:00
Mike Fährmann
91bd3e37f2 [pexels] add support (#2286, #4214, #6769) 2025-01-12 16:50:12 +01:00
Mike Fährmann
1d75c8308c [weebcentral] add support (#6778) 2025-01-10 23:04:51 +01:00
Mike Fährmann
63008f77e2 merge #6607: [lofter] add initial support
(#650, #2294, #4095, #4728, #5656)
2024-12-11 20:41:52 +01:00
Mike Fährmann
86334f9c4a [yiffverse] add support (#6611) 2024-12-11 10:57:21 +01:00
hdk5
0466fcab4c [lofter]: add initial support 2024-12-08 19:37:42 +02:00
Mike Fährmann
ef7ff31117 [realbooru] fix extraction (#6543)
- extract data from HTML pages since API is no longer usable
- move code into its own separate 'realbooru' module
2024-12-07 17:39:25 +01:00
Luca Russo
e9370b7b8a merge #5626: [facebook] add support (#470, #2612)
* [facebook] add initial support

* renamed extractors & subcategories

* better stability, modularity & naming

* added single photo extractor, warnings & retries

* more metadata + extract author followups

* renamed "album" mentions to "set" for consistency

* cookies are now only used when necessary

also added author followups for singular images

* removed f-strings

* added way to continue extraction from where it left off

also fixed some bugs

* fixed bug wrong subcategory

* added individual video extraction

* extract audio + added ytdl option

* updated setextract regex

* added option to disable start warning

the extractor should be ready :)

* fixed description metadata bug

* removed cookie "safeguard" + fixed for private profiles

I have removed the cookie "safeguard" (not using cookies until they are necessary) as I've come to the conclusion that it does more harm than good. There is no way to detect whether the extractor has skipped private images that could otherwise have been extracted. Also, the safeguard provides little to no advantage.

* fixed a few bugs regarding profile parsing

* a few bugfixes

Fixed some metadata attributes that were not decoded correctly for non-Latin languages or were not shown at all.
Also improved a few patterns.

* retrigger checks

* Final cleanups

- Added tests
- Fixed video extractor giving incorrect URLs
- Removed start warning
- Listed supported site correctly

* fixed regex

* trigger checks

* fixed livestream playback extraction + bugfixes

I've chosen to remove the "reactions", "comments" and "views" attributes as I've felt that they require additional maintenance even though nobody would ever actually use them to order files.
I've also removed the "title" and "caption" video attributes for their inconsistency across different videos.
Feel free to share your thoughts.

* fixed regex

* fixed filename fallback

* fixed retrying when a photo url is not found

* fixed end line

* post url fix + better naming

* fix posts

* fixed tests

* added profile.php url

* made most of the requested changes

* flake

* archive: false

* removed unnecessary url extract

* [facebook] update

- more 'Sec-Fetch-…' headers
- simplify 'text.nameext_from_url()' calls
- replace 'sorted(…)[-1]' with 'max(…)'
- fix '_interval_429' usage
- use replacement fields in logging messages

* [facebook] update URL patterns

get rid of '.*' and '.*?'

* added few remaining tests

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2024-11-26 21:49:11 +01:00
Mike Fährmann
d1ad97ae0c [motherless] add to 'modules' list 2024-11-22 21:18:13 +01:00
hdk5
6eef3e3495 [bilibili] initial support (#2824) 2024-11-10 00:21:27 +02:00
Mike Fährmann
cb0d8cae77 merge #6227: [everia] add support (#1067, #2472, #4091) 2024-11-03 17:52:17 +01:00
missionfloyd
d31a3b5da3 [everia.club] Add support
- Unescape title and URL
- Add tags and categories metadata
    Lookup tag id with API instead of downloading tag page
- Add category extractor
- Add tests
- Rename EveriaExtractor to EveriaPostExtractor
- Fix EveriaPostExtractor example
- Lookup tags/categories by post id
- Add date extractor
- Remove leftover pages parameter
- Add error handling for invalid dates.
- Add filename numbering
    Parse date
- Rename extract() to images()
- Remove html import
- Fix search/date URLs with page number
- Fix tag/category search
- Fix post extractor
- Fix tag, category extractors
- Fix search extractor
- Only load first page once
- Fix date extractor
- Fix tests
- Clean up search extractor
2024-11-03 14:09:07 +01:00
Mike Fährmann
d787c0c4ea [rule34xyz] add support (#1078, #4960) 2024-11-03 10:12:26 +01:00
Mike Fährmann
655e42dc92 merge #6240: [rule34vault] add support (#5708) 2024-10-28 22:31:05 +01:00
ssdaniel24
3d0263b3ab [rule34vault] Added initial support for rule34vault.com
- Added playlists support for rule34vault
- Added support for posts in rule34vault
- Fixed supported sites with script
- Fixed posts pattern in rule34vault
- Added tests for rule34vault
- Clean
- Fixed lint warnings
2024-10-28 22:26:47 +01:00
Mike Fährmann
5de8576ff6 [noop] add 'noop' extractor 2024-10-28 19:45:24 +01:00
Mike Fährmann
10c076e7f2 [saint] add 'album' and 'media' extractors (#4405, #6324) 2024-10-27 22:27:30 +01:00
Mike Fährmann
66aa514c25 [scrolller] add initial support (#295, #3418, #5051) 2024-10-21 14:17:18 +02:00
Mike Fährmann
4a1cbe94a9 [pururin] remove module
"This domain name has been seized in accordance with a seizure warrant
 issued by the United States District Court for the District of Idaho"
2024-10-10 15:57:17 +02:00
Mike Fährmann
1ad58cab84 [boosty] add initial support (#2387) 2024-10-02 20:39:55 +02:00
Mike Fährmann
93eca64a73 [civitai] add initial support (#3706, #3787, #4129, #5995) 2024-09-20 17:21:17 +02:00
Mike Fährmann
638a676495 [ao3] add initial support (#6013) 2024-09-15 22:38:21 +02:00