6519 Commits

Author SHA1 Message Date
Mike Fährmann
18ed39c1cf implement 'downloader' options per extractor category
by placing them in an 'http' or 'ytdl' block within an extractor's
options or within its subcategory options

{
    "extractor": {
        "mastodon": {
            "http": {
                "rate": "10k"
            }
        },
        "mastodon.social": {
            "http": {
                "rate": "100k"
            }
        }
    },
    "downloader": {
        "rate": "100m"
    }
}

Sets the download speed to
- 100k for mastodon.social URLs
-  10k for other mastodon sites
- 100m for all other sites
2025-02-22 10:08:59 +01:00
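
The lookup order implied by the config in 18ed39c1cf can be sketched roughly as follows. This is an illustration only, not gallery-dl's actual code: the function name 'downloader_option' and its 'sections' parameter are hypothetical, and it assumes that the most specific extractor section wins and that the global 'downloader' block is the last fallback.

```python
# Illustrative sketch only; not gallery-dl's actual lookup code.
CONFIG = {
    "extractor": {
        "mastodon": {"http": {"rate": "10k"}},
        "mastodon.social": {"http": {"rate": "100k"}},
    },
    "downloader": {"rate": "100m"},
}


def downloader_option(config, sections, scheme, key, default=None):
    """Return the first value found for 'key'.

    'sections' (hypothetical parameter) lists extractor sections from
    most to least specific; 'scheme' is 'http' or 'ytdl'.
    """
    extractor_cfg = config.get("extractor", {})
    for name in sections:
        options = extractor_cfg.get(name, {}).get(scheme, {})
        if key in options:
            return options[key]
    # Nothing set per extractor: fall back to the global downloader options.
    return config.get("downloader", {}).get(key, default)


print(downloader_option(CONFIG, ["mastodon.social", "mastodon"], "http", "rate"))
print(downloader_option(CONFIG, ["mastodon"], "http", "rate"))
print(downloader_option(CONFIG, [], "http", "rate"))
```

Under these assumptions the three calls resolve to the value from the 'mastodon.social' block, the 'mastodon' block, and the global 'downloader' block respectively.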
Mike Fährmann
4906541f7d [generic] fix config lookups by subcategory
'subcategory' needs to be set before Extractor.__init__() runs
to be included in '_cfgpath'
2025-02-22 10:08:59 +01:00
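
The ordering problem described in 4906541f7d can be shown with a small stand-in. The classes below are hypothetical and much simpler than the real ones; the only assumption taken from the commit message is that Extractor.__init__() builds '_cfgpath' from the attribute values present at that moment.

```python
class Extractor:
    """Simplified stand-in for a base class that builds a config path."""
    category = ""
    subcategory = ""

    def __init__(self):
        # The path is computed once, from whatever the attributes hold now.
        self._cfgpath = ("extractor", self.category, self.subcategory)


class TooLate(Extractor):
    category = "generic"

    def __init__(self, subcategory):
        super().__init__()              # '_cfgpath' is already built here
        self.subcategory = subcategory  # too late to affect it


class InTime(Extractor):
    category = "generic"

    def __init__(self, subcategory):
        self.subcategory = subcategory  # set before the base initializer
        super().__init__()


print(TooLate("example")._cfgpath)
print(InTime("example")._cfgpath)
```

Only the second variant ends up with the subcategory in its config path, so only it would find options stored under that subcategory.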
Mike Fährmann
79dc04d87c [subscribestar] fix 'post' extractor (#6582)
https://github.com/mikf/gallery-dl/issues/6582#issuecomment-2675939669
2025-02-22 10:08:59 +01:00
Mike Fährmann
57937c3b68 [newgrounds] provide 'comment_html' metadata (#7038) 2025-02-22 10:08:58 +01:00
Mike Fährmann
196aa263c2 [imhentai] improve pagination duplicate filtering 2025-02-20 20:53:35 +01:00
Mike Fährmann
52d4e1a100 [imhentai] inherit from BaseExtractor
combine all imhentai-like sites into one module
2025-02-19 22:14:52 +01:00
Mike Fährmann
7a11d02e7a [reddit] restrict subreddit search results (#7025) 2025-02-19 20:05:48 +01:00
Mike Fährmann
d4c56b08d7 [hentaiera] add support (#3046 #6952 #7020) 2025-02-19 17:42:04 +01:00
Mike Fährmann
4396029d36 [furry34] add support (#1078 #7018) 2025-02-19 16:35:48 +01:00
Mike Fährmann
67937d33e3 [archive] fix NameError when SQLite database path doesn't exist
fixes regression introduced in 841bc9f6
2025-02-18 22:09:07 +01:00
Mike Fährmann
82493a6672 [hentairox] add support (#7003) 2025-02-18 21:45:30 +01:00
Luca Russo
95c446fcd1 [discord] add support (#6836)
* first commit

* add --

* skip video embeds

* fix typo

* removed ambiguity

* add category support

* code tweaks

* more reliable embed extraction

* handle 403 errors (testing done)

* added "parent_id" keyword

* added "parent", "parent_type" keywords

the extractor should now be ready to merge!

* removed unnecessary dict unpacking

* added empty text messages extraction

* added "channel_topic"

* even more metadata extraction

can now extract all embed images and text, as well as server banners. The code is also much cleaner.

* added user avatar and banner

* better pagination

* fix regression

* minor tweaks

* Made requested changes
2025-02-18 18:45:39 +01:00
Mike Fährmann
fd4de02e67 [archive] support PostgreSQL archives for post processors (#6152) 2025-02-17 14:58:14 +01:00
Mike Fährmann
8daf496a22 [archive] add 'archive-table' option (#6152) 2025-02-17 11:41:13 +01:00
Mike Fährmann
dac0c4ac10 [docs] add 'psycopg' to optional dependencies 2025-02-17 10:59:15 +01:00
Mike Fährmann
841bc9f66f [archive] implement support for PostgreSQL databases (#6152) 2025-02-16 17:56:52 +01:00
Mike Fährmann
b4eae65965 [imhentai] avoid unnecessary HTTP request
no need to fetch a gallery's '/view/' page when the main page already
contains all the same data
2025-02-16 15:04:24 +01:00
Mike Fährmann
800cf5beb5 replace 'print()' with 'output.stderr_write("\n")' 2025-02-15 18:01:05 +01:00
Mike Fährmann
35307608f2 [dl:http] add 'sleep-429' option (#6996) 2025-02-15 17:42:03 +01:00
Mike Fährmann
046ebb5590 [imgur] replace AuthorizationError exception with logging message 2025-02-15 15:36:39 +01:00
Mike Fährmann
182b544217 [ytdl] support specifying filesystem paths as 'module' (#6991) 2025-02-14 19:58:25 +01:00
Mike Fährmann
7ae09c6b29 [imgur] add support for (hidden) personal posts (#6990)
https://imgur.com/user/me
https://imgur.com/user/me/hidden
2025-02-14 19:28:55 +01:00
Mike Fährmann
cd9fa1ef75 [bunkr] implement fast '--range' support (#6985) 2025-02-14 18:21:32 +01:00
Mike Fährmann
195b52284a [tests] move 'e621:frontend' tests into regular results/e621.py
having both e621.py and E621.py in the same directory causes problems
on Windows, where file names are case-insensitive by default

6e919a3695 (commitcomment-152557303)
2025-02-14 17:44:14 +01:00
Mike Fährmann
51f978e027 [weibo] add 'movies' option (#6988)
disable 'movie' downloads by default
2025-02-13 18:01:20 +01:00
Mike Fährmann
b8b541fded [itaku] support gallery section URLs (#6951) 2025-02-13 14:29:36 +01:00
Mike Fährmann
1cf2870f81 [patreon] extract 'campaign' metadata (#6989) 2025-02-13 14:13:43 +01:00
Mike Fährmann
6420210b0f [vsco] improve 'm3u8' handling 2025-02-12 20:44:43 +01:00
Mike Fährmann
f1f27eb2ab [vsco] support '/video/' URLs (#4295 #6973)
requires yt-dlp/youtube-dl to handle m3u8 manifests
2025-02-12 19:12:00 +01:00
Mike Fährmann
d1a8142dcf [bunkr] provide fallback URLs for 403 download links (#6732 #6972) 2025-02-12 19:12:00 +01:00
Mike Fährmann
55034d9638 [imhentai] add support (#1660 #3046 #3824 #4338 #5936) 2025-02-10 21:42:07 +01:00
Mike Fährmann
be77465e1b [weebcentral] fix extracting wrong number of chapter pages (#6966)
When downloading multiple chapters at once, all chapters after the first
one would download only as many pages per chapter as the first one had,
due to reusing a cached/shared dict in the wrong way.
2025-02-10 16:00:13 +01:00
Mike Fährmann
587205bf68 [pixiv] prevent exceptions during 'comments' extraction (#6965)
- wrap in try-except block
- do not attempt to fetch comments for 'sanity_level' works
2025-02-10 09:56:09 +01:00
Mike Fährmann
6c2b6d50cc [patreon] support '/profile/creators' URLs 2025-02-09 15:52:54 +01:00
Mike Fährmann
3282025749 merge #6956: [b4k] update domain to 'arch.b4k.dev' (#6955) 2025-02-09 11:13:53 +01:00
Mike Fährmann
23c4bc8ac5 [b4k] keep support for previous 'arch.b4k.co' domain 2025-02-09 11:11:38 +01:00
NecRaul
dae82f1519 [b4k] update domain to arch.b4k.dev 2025-02-09 01:28:23 +04:00
Mike Fährmann
93adc86dca improve '\f' format string handling for --print
add a newline only for f-string / \fF format strings,
as it would break any of the others
2025-02-08 21:42:31 +01:00
Mike Fährmann
e2134b349d replace '\f' in --print arguments with form feed character
to make it easier to use special type format strings on command-line
(#6938)
2025-02-07 19:37:33 +01:00
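
Taken together, the two '\f' commits above (e2134b349d and 93adc86dca) suggest preprocessing along these lines. This is a rough sketch rather than the project's code: the function name is hypothetical, it assumes that only a leading backslash-f pair is replaced, and it leaves out whatever is done for ordinary format strings.

```python
def prepare_print_format(arg):
    """Normalize a --print argument (hypothetical helper)."""
    # A literal backslash followed by 'f' is easy to type in a shell;
    # turn it into a real form feed character.
    if arg.startswith("\\f"):
        arg = "\f" + arg[2:]
    # Append a newline only for '\fF' (f-string) format strings,
    # since it would break the other special types.
    if arg.startswith("\fF") and not arg.endswith("\n"):
        arg += "\n"
    return arg


print(repr(prepare_print_format("\\fF {id}")))
print(repr(prepare_print_format("\\fT some/path")))
```

The first call yields a form feed prefix plus a trailing newline; the second only gets the form feed prefix.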
Mike Fährmann
28385bec7a [bunkr] extract 'id_url' metadata (#6935)
and use it as 'id' alternative instead of 'name' in default archive IDs
2025-02-06 20:40:35 +01:00
Mike Fährmann
b9675ea764 [bunkr] update default archive ID format (#6935)
use 'name' when there is no proper 'id' value available.
2025-02-05 21:54:43 +01:00
Mike Fährmann
873cbf6b36 [docs] add more details to 'user-agent' and 'browser' docs (#6917) 2025-02-03 20:54:27 +01:00
Mike Fährmann
5807daa19a [issuu] unescape HTML entities 2025-02-02 18:33:18 +01:00
Mike Fährmann
6c9b20fe45 [philomena] download 'full' URLs (#6922)
'view_url' URLs sometimes result in 404 errors
2025-02-02 18:23:46 +01:00
Mike Fährmann
4ab9237f1d [philomena] fix 'date' values without UTC offset (#6921)
Some instances do not include a UTC offset or 'Z' in their datetime
values, e.g. 2024-03-14T13:46:46 compared to 2024-03-14T13:46:46Z
2025-02-02 16:32:28 +01:00
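
For the two timestamp shapes mentioned in 4ab9237f1d, a tolerant parser could look like the sketch below. It is not the extractor's actual code: 'parse_utc' is a hypothetical name, and it assumes that values without an offset are meant as UTC and that naive datetime objects are acceptable.

```python
from datetime import datetime


def parse_utc(value):
    """Parse 'YYYY-MM-DDTHH:MM:SS' with or without a trailing 'Z'.

    Returns a naive datetime that is assumed to be in UTC.
    """
    if value.endswith("Z"):
        value = value[:-1]
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")


print(parse_utc("2024-03-14T13:46:46"))
print(parse_utc("2024-03-14T13:46:46Z"))
```

Both inputs then produce the same datetime value, so downstream code does not need to care which form an instance sends.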
Mike Fährmann
1a9138f25e [aes] handle errors during 'Cryptodome' import (#6906) 2025-02-02 15:01:17 +01:00
Mike Fährmann
7c96c2368f [subscribestar] detect and handle redirects (#6916) 2025-02-01 21:03:24 +01:00
Mike Fährmann
52ac3a7802 [release] build 'gallery-dl.exe' on Python 3.13 (#6684)
and rename the former Python 3.8 version to 'gallery-dl_x86.exe'.

Currently building with PyInstaller, as I wasn't able to get py2exe to
work in this environment, but the startup times are noticeably longer.

Considering switching to nuitka, maybe even for all standalone builds.
2025-02-01 19:58:51 +01:00
Mike Fährmann
ddb2c4d69d [executables] fix SSLError when using HTTPAdapter (#6393)
always load certifi certificates instead of relying on
'load_default_certs()', which might load no certs at all
2025-01-31 20:36:41 +01:00
Mike Fährmann
463e123283 [twibooru] match URLs with 'www' subdomain (#6903) 2025-01-30 19:20:02 +01:00