6141 Commits

Author SHA1 Message Date
Mike Fährmann
473ee5ff85 [recursive] add 'https://' to URLs if not present 2024-12-10 17:16:52 +01:00
Mike Fährmann
e8826ed3d4 [common] simplify HTTP error messages
[warning] HTTPSConnectionPool(host='domain.tld', port=443): Max retries
exceeded with url: /a.jpg (Caused by NameResolutionError("<urllib3.
connection.HTTPSConnection object at 0x7247fe436ea0>: Failed to resolve
'domain.tld' ([Errno -2] Name or service not known)")) (1/5)

->

[warning] NameResolutionError: Failed to resolve 'domain.tld'
([Errno -2] Name or service not known) (1/5)
2024-12-10 17:13:44 +01:00
Mike Fährmann
86f3f3f763 [common] detect DDoS-Guard challenge pages 2024-12-08 21:39:04 +01:00
Mike Fährmann
47311352de [cyberdrop] add extractor for media URLs (#2496)
https://github.com/mikf/gallery-dl/issues/2496#issuecomment-2495467133
2024-12-08 20:57:12 +01:00
Mike Fährmann
d7873b9eb7 release version 1.28.1 2024-12-07 18:03:48 +01:00
Mike Fährmann
939cf51b01 [danbooru] add missing ':' to 'md5' tag prefix 2024-12-07 17:39:25 +01:00
Mike Fährmann
ef7ff31117 [realbooru] fix extraction (#6543)
- extract data from HTML pages since API is no longer usable
- move code into its own separate 'realbooru' module
2024-12-07 17:39:25 +01:00
Mike Fährmann
fbb4b222ec [inkbunny] fix re-login loop (#6618) 2024-12-07 17:39:25 +01:00
Mike Fährmann
e1613fc0f4 [nhentai] select random file servers for download URLs (#6620)
i1, i2, i3, i4 instead of just i.nhentai.net
2024-12-07 17:39:25 +01:00
Mike Fährmann
7091904b20 [common] restore using environment proxies by default (#6553, #6609)
change 'proxy-env' default to 'true'
2024-12-07 17:38:44 +01:00
Mike Fährmann
34e157e166 [zerochan] download webp and gif files, add 'extensions' option (#6576) 2024-12-05 21:25:44 +01:00
Mike Fährmann
624dc7f407 [bluesky] add 'info' extractor 2024-12-05 08:36:33 +01:00
Mike Fährmann
a526a3d00d [patreon] add 'format-images' option (#6569) 2024-12-04 21:38:01 +01:00
Mike Fährmann
45ce0a2797 [instagram] handle empty 'carousel_media' entries (#6595) 2024-12-04 18:31:23 +01:00
Mike Fährmann
f33ca82ce7 [kemonoparty] fix 'o' query parameter handling (#6597)
fixes regression introduced in 74d855c6
2024-12-04 18:26:38 +01:00
Mike Fährmann
d96717e2e6 [hentaicosplays] update domains (#6578)
inherit from BaseExtractor to make differentiating between sites easier
2024-12-03 13:56:32 +01:00
Mike Fährmann
d9bbe3b3b3 [gofile] fix website token extraction (#6596) 2024-12-03 12:04:05 +01:00
Mike Fährmann
f967247716 [instagram] fix TypeError by ignoring empty 'items' (#6595) 2024-12-03 11:03:30 +01:00
Mike Fährmann
57f8227473 [common] improve handling of 'user-agent' settings (#6594)
improves 5412b22dae

ignore 'extractor.user-agent' only for extractors using a custom
'User-Agent' header
2024-12-03 10:55:41 +01:00
Mike Fährmann
26163db69d [readcomiconline] fix chapter extractor (#6070, #6335) 2024-12-03 10:54:58 +01:00
Mike Fährmann
63e042dec7 [e621] fix 'TypeError' when 'metadata' is enabled (#6587)
fixes regression introduced in 9184a564
2024-12-02 14:09:38 +01:00
Mike Fährmann
990907572a [pixiv] include user ID in failed AJAX warnings (#6581) 2024-12-01 20:44:17 +01:00
Mike Fährmann
4a5dfc7d76 [rule34] fix 'favorite' extraction (#6573) 2024-12-01 18:17:25 +01:00
Mike Fährmann
c5685efbf7 [pixiv:ranking] implement filtering results by 'content' (#6574) 2024-12-01 18:01:43 +01:00
Mike Fährmann
d29a0b779e [bluesky] unescape search queries (#6579) 2024-12-01 17:56:49 +01:00
Mike Fährmann
a4d6ba9709 [bluesky] ignore non-quote embeds (#6577) 2024-12-01 17:54:14 +01:00
Mike Fährmann
1e013d1af6 [patreon] allow overriding default User-Agent header
continuation of 5412b22dae
2024-11-30 22:20:05 +01:00
Mike Fährmann
4cd9ce8b39 release version 1.28.0 2024-11-30 11:01:47 +01:00
Mike Fährmann
bc22c56c90 merge #6501: [docs] update gallery-dl.conf 2024-11-30 09:59:42 +01:00
Mike Fährmann
75c463bb18 [docs] update gallery-dl.conf
add simple script that compares configuration.rst and gallery-dl.conf
2024-11-30 09:58:11 +01:00
Mike Fährmann
d8cf381904 [archive] use defaults when 'prefix'/'format' are 'null' 2024-11-29 16:36:35 +01:00
Mike Fährmann
79fd3445ee [pixiv:ranking] add 'rank' metadata field (#6531) 2024-11-28 19:34:55 +01:00
Mike Fährmann
9e7d7a3bb3 merge #6548: [facebook] add more tests 2024-11-28 15:25:02 +01:00
Mike Fährmann
5cc9ca7199 [instagram] fix 'extension' of apparent '.webp' files (#6541)
Many '.webp' download URLs are actually '.jpg' files, which usually get
renamed by 'http.adjust-extensions'
2024-11-28 15:17:32 +01:00
Mike Fährmann
7c7b8a25c3 [kemonoparty] fix login / update favorites extractor (#6415) 2024-11-28 14:41:16 +01:00
Luca Russo
0e1d93dca3 update
Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2024-11-28 11:02:50 +01:00
Luca Russo
e36cfb73ff added more tests 2024-11-28 10:55:43 +01:00
Mike Fährmann
e1aa4a7162 [kemonoparty] support new discord channel URLs (#6542) 2024-11-27 15:19:27 +01:00
Mike Fährmann
2162fa7df2 [kemonoparty] fix 'comments' for posts without comments (#6415)
https://github.com/mikf/gallery-dl/issues/6415#issuecomment-2501966303
2024-11-26 23:23:39 +01:00
Mike Fährmann
74d855c693 [kemonoparty] update to new site layout / API endpoints
(#6415, #6503, #6528, #6530, #6536)

… at least for the most part. Favorites are still broken, but the rest
should be functional again.
2024-11-26 22:15:28 +01:00
Mike Fährmann
5412b22dae [common] allow overriding more default 'User-Agent' headers (#6496)
ignore 'extractor.user-agent' if it is the default user-agent value
and an extractor wants to set its own custom value
2024-11-26 21:50:28 +01:00
Mike Fährmann
94c3a4dca5 [scrolller] ignore posts without 'mediaSources' (#5051)
https://github.com/mikf/gallery-dl/issues/5051#issuecomment-2498729103
fixes "KeyError - 'mediaSources'"
2024-11-26 21:50:28 +01:00
Luca Russo
e9370b7b8a merge #5626: [facebook] add support (#470, #2612)
* [facebook] add initial support

* renamed extractors & subcategories

* better stability, modularity & naming

* added single photo extractor, warnings & retries

* more metadata + extract author followups

* renamed "album" mentions to "set" for consistency

* cookies are now only used when necessary

also added author followups for singular images

* removed f-strings

* added way to continue extraction from where it left off

also fixed some bugs

* fixed wrong subcategory bug

* added individual video extraction

* extract audio + added ytdl option

* updated setextract regex

* added option to disable start warning

the extractor should be ready :)

* fixed description metadata bug

* removed cookie "safeguard" + fixed for private profiles

I have removed the cookie "safeguard" (not using cookies until they are necessary) as I've come to the conclusion that it does more harm than good. There is no way to detect whether the extractor has skipped private images that could otherwise have been extracted. It also provides little to no advantage.

* fixed a few bugs regarding profile parsing

* a few bugfixes

Fixed some metadata attributes that were not decoded correctly for non-Latin languages or were not shown at all.
Also improved a few patterns.

* retrigger checks

* Final cleanups

- Added tests
- Fixed video extractor giving incorrect URLs
- Removed start warning
- Listed supported site correctly

* fixed regex

* trigger checks

* fixed livestream playback extraction + bugfixes

I've chosen to remove the "reactions", "comments" and "views" attributes because they require additional maintenance even though nobody would ever actually use them to order files.
I've also removed the "title" and "caption" video attributes because they are inconsistent across different videos.
Feel free to share your thoughts.

* fixed regex

* fixed filename fallback

* fixed retrying when a photo url is not found

* fixed end line

* post url fix + better naming

* fix posts

* fixed tests

* added profile.php url

* made most of the requested changes

* flake

* archive: false

* removed unnecessary url extract

* [facebook] update

- more 'Sec-Fetch-…' headers
- simplify 'text.nameext_from_url()' calls
- replace 'sorted(…)[-1]' with 'max(…)'
- fix '_interval_429' usage
- use replacement fields in logging messages

* [facebook] update URL patterns

get rid of '.*' and '.*?'

* added a few remaining tests

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2024-11-26 21:49:11 +01:00
Mike Fährmann
d1ad97ae0c [motherless] add to 'modules' list 2024-11-22 21:18:13 +01:00
Mike Fährmann
b78c35fd15 [motherless] add 'media' and 'gallery' extractors
(#2074, #4413, #6221)
2024-11-22 21:06:32 +01:00
Mike Fährmann
b8b943fc38 [pinterest] update API headers (#6513)
'BoardFeed' requests fail without 'X-Pinterest-PWS-Handler'
2024-11-22 08:41:10 +01:00
Mike Fährmann
ededba8087 [steamgriddb] disable 'adjust-extensions' for fake-png files (#5274) 2024-11-20 19:19:23 +01:00
Mike Fährmann
74a8587798 merge #6493: [docs] update pip command for installing dev version
- recommend '--force-reinstall'
- remove '-I' and '--no-cache-dir'
2024-11-20 16:22:01 +01:00
fireattack
7d609b883e Remove -I for installing from the source in readme
- Update according to the comment from @mikf
- Follow rst syntax
- remove '--no-cache-dir'
2024-11-20 16:19:38 +01:00
Mike Fährmann
f47c0982a0 [bluesky] improve 'web' did handling 2024-11-20 16:05:05 +01:00