Commit Graph

707 Commits

Author SHA1 Message Date
Mike Fährmann
7f6a53c347 [cohost] add 'avatar' and 'background' options (#6656) 2024-12-14 20:16:28 +01:00
Mike Fährmann
94d7df186f [bluesky] default to /posts if reposts/quoted is enabled (#6583) 2024-12-13 22:24:37 +01:00
Mike Fährmann
7091904b20 [common] restore using environment proxies by default (#6553, #6609)
change 'proxy-env' default to 'true'
2024-12-07 17:38:44 +01:00
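The following is a minimal sketch of what the 'proxy-env' switch decides. It assumes a hypothetical `select_proxies()` helper. An explicitly configured proxy always wins. Environment variables such as HTTPS_PROXY are consulted only when 'proxy-env' is true. The sketch uses the standard library's `urllib.request.getproxies_environment()`. The `requests` library offers a comparable switch, `Session.trust_env`, but this is not gallery-dl's actual code.

```python
import urllib.request


def select_proxies(explicit=None, proxy_env=True):
    """Hypothetical helper: decide which proxies a session would use.

    An explicitly configured proxy URL always wins. Otherwise the
    environment (HTTP_PROXY, HTTPS_PROXY, NO_PROXY, ...) is consulted
    only when proxy_env is true, mirroring the 'proxy-env' option.
    """
    if explicit:
        return {"http": explicit, "https": explicit}
    if proxy_env:
        # Reads *_proxy variables from os.environ (lowercase names win)
        return urllib.request.getproxies_environment()
    return {}
```

With 'proxy-env' set to false, a stray HTTPS_PROXY in the shell no longer affects downloads unless a proxy is configured explicitly.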
Mike Fährmann
34e157e166 [zerochan] download webp and gif files, add 'extensions' option (#6576) 2024-12-05 21:25:44 +01:00
Mike Fährmann
a526a3d00d [patreon] add 'format-images' option (#6569) 2024-12-04 21:38:01 +01:00
Luca Russo
e9370b7b8a merge #5626: [facebook] add support (#470, #2612)
* [facebook] add initial support

* renamed extractors & subcategories

* better stability, modularity & naming

* added single photo extractor, warnings & retries

* more metadata + extract author followups

* renamed "album" mentions to "set" for consistency

* cookies are now only used when necessary

also added author follow-ups for single images

* removed f-strings

* added way to continue extraction from where it left off

also fixed some bugs

* fixed wrong subcategory bug

* added individual video extraction

* extract audio + added ytdl option

* updated setextract regex

* added option to disable start warning

the extractor should be ready :)

* fixed description metadata bug

* removed cookie "safeguard" + fixed for private profiles

I have removed the cookie "safeguard" (not using cookies until they are necessary), as I've come to the conclusion that it does more harm than good. There is no way to detect whether the extractor has skipped private images that could otherwise have been extracted, and the safeguard provides little to no advantage.

* fixed a few bugs regarding profile parsing

* a few bugfixes

Fixed some metadata attributes that did not decode correctly for non-Latin languages or did not show up at all.
Also improved a few patterns.

* retrigger checks

* Final cleanups

- Added tests
- Fixed video extractor giving incorrect URLs
- Removed start warning
- Listed supported site correctly

* fixed regex

* trigger checks

* fixed livestream playback extraction + bugfixes

I've chosen to remove the "reactions", "comments" and "views" attributes, as they would require additional maintenance even though nobody would actually use them to order files.
I've also removed the "title" and "caption" video attributes because they are inconsistent across videos.
Feel free to share your thoughts.

* fixed regex

* fixed filename fallback

* fixed retrying when a photo url is not found

* fixed end line

* post url fix + better naming

* fix posts

* fixed tests

* added profile.php url

* made most of the requested changes

* flake

* archive: false

* removed unnecessary url extract

* [facebook] update

- more 'Sec-Fetch-…' headers
- simplify 'text.nameext_from_url()' calls
- replace 'sorted(…)[-1]' with 'max(…)'
- fix '_interval_429' usage
- use replacement fields in logging messages
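The following minimal sketch covers two items from the '[facebook] update' list. The first is picking the best candidate with `max(...)` instead of `sorted(...)[-1]`. The second is logging with %-style replacement fields instead of pre-formatted strings. The candidate dicts, the 'width' key and `best_photo()` are hypothetical, not the extractor's actual code.

```python
import logging

log = logging.getLogger("facebook")


def best_photo(candidates):
    """Pick the widest image from a list of hypothetical candidate dicts.

    max(..., key=...) scans the list once (O(n)) instead of sorting it
    (O(n log n)) just to take the last element.
    """
    return max(candidates, key=lambda c: c["width"])


candidates = [
    {"url": "https://example.org/a.jpg", "width": 720},
    {"url": "https://example.org/b.jpg", "width": 2048},
    {"url": "https://example.org/c.jpg", "width": 1080},
]
best = best_photo(candidates)
# Replacement fields: the message is only formatted if the record is emitted
log.debug("Selected %s (%spx wide)", best["url"], best["width"])
```

There is one subtle difference when items tie. `sorted(...)[-1]` returns the last of the equally large items, while `max()` returns the first.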

* [facebook] update URL patterns

get rid of '.*' and '.*?'

* added a few remaining tests

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2024-11-26 21:49:11 +01:00
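The following minimal sketch shows why the '[facebook] update URL patterns' change drops '.*' and '.*?'. A lazy wildcard still crosses '/', '?' and '#', so a pattern can match URLs it was never meant for. Both patterns and the `/<user>/photos/<set>` path shape are hypothetical, not the extractor's real patterns.

```python
import re

# Hypothetical "before": lazy wildcards still cross '/', '?' and '#'
LOOSE = re.compile(r"(?:https?://)?(?:www\.)?facebook\.com/.*?/photos/.*")

# Hypothetical "after": each path segment is limited to [^/?#]+
STRICT = re.compile(
    r"(?:https?://)?(?:www\.)?facebook\.com/([^/?#]+)/photos/([^/?#]+)")

ok = "https://www.facebook.com/someuser/photos/a.123/"
odd = "https://www.facebook.com/groups/x/posts/1/photos/2"
```

`LOOSE` accepts both URLs. `STRICT` accepts only the first one and captures its two path segments directly.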
Mike Fährmann
cb09273670 [koharu] implement 'tags' option 2024-11-15 23:49:58 +01:00
Mike Fährmann
c82f3db098 [common] add 'proxy-env' option (#6134, #6455)
disable using environment proxies by default
2024-11-15 18:03:56 +01:00
Mike Fährmann
e763efd36c [bilibili] add workarounds for getting rate-limited (#6443)
- set 3-6 second request_interval by default
- retry request after waiting 5 minutes
2024-11-14 23:06:26 +01:00
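The following is a minimal sketch of the two rate-limit workarounds. It assumes a hypothetical `send()` callable that returns an HTTP status code. It also assumes that rate limiting shows up as status 429. Before each request, the sketch waits a random 3 to 6 seconds. On a 429, it waits 5 minutes and retries once. Injecting `sleep` keeps the sketch testable. The real extractor's retry logic may differ.

```python
import random
import time


def request_with_backoff(send, interval=(3.0, 6.0), retry_wait=300.0,
                         sleep=time.sleep):
    """Hypothetical wrapper around send(), which returns a status code.

    Waits a random 3-6 s before the request and, if the server answers
    429 (Too Many Requests), waits retry_wait seconds and tries once more.
    """
    sleep(random.uniform(*interval))
    status = send()
    if status == 429:
        sleep(retry_wait)
        status = send()
    return status
```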
Mike Fährmann
0b99d9e6b9 [util] add "defaultdict" filters-environment
allows accessing undefined values without raising an exception,
but preserves other errors like TypeError, AttributeError, etc.
2024-11-14 22:47:25 +01:00
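The following minimal sketch shows the idea behind a "defaultdict" evaluation environment for filter expressions. Names missing from the metadata resolve to None instead of raising NameError. A comparison like `None > 1000` still raises TypeError. `make_filter()` and the `SAFE_NAMES` whitelist are hypothetical, not gallery-dl's implementation.

```python
import collections

# Hypothetical whitelist of builtins usable inside filter expressions
SAFE_NAMES = {"len": len, "int": int, "str": str}


def make_filter(expr):
    """Compile a filter expression evaluated against a defaultdict namespace."""
    code = compile(expr, "<filter>", "eval")

    def check(metadata):
        # Undefined names hit __missing__ and become None (no NameError);
        # other errors (TypeError, AttributeError, ...) still propagate.
        env = collections.defaultdict(lambda: None, SAFE_NAMES)
        env.update(metadata)
        return eval(code, {"__builtins__": {}}, env)

    return check
```

The locals mapping is searched before builtins. For that reason, every allowed builtin has to be copied into the environment. Otherwise `len` itself would resolve to None.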
Mike Fährmann
9afbe91f82 [rule34xyz] add 'format' option (#1078) 2024-11-05 15:45:52 +01:00
Mike Fährmann
b92edb4614 [boosty] update default video format list (#2387) 2024-10-31 20:55:32 +01:00
Mike Fährmann
f79e57b71e [dl:ytdl] change 'forward-cookies' default to 'true' (#6401, #6348)
revert dba87ca99e
2024-10-31 17:35:08 +01:00
Mike Fährmann
6693ae19e8 [civitai] add 'metadata' option (#6383) 2024-10-27 15:46:00 +01:00
Mike Fährmann
8f396cfc57 [bluesky] add 'quoted' option (#6323) 2024-10-25 17:22:33 +02:00
Mike Fährmann
b08da4ffc7 [reddit] add 'embeds' option (#6357) 2024-10-22 17:06:54 +02:00
Mike Fährmann
33161da121 [pixiv] add 'captions' option (#4327)
make extra requests for empty captions independent of 'sanity'
2024-10-22 16:31:37 +02:00
Mike Fährmann
66aa514c25 [scrolller] add initial support (#295, #3418, #5051) 2024-10-21 14:17:18 +02:00
Mike Fährmann
2e1dab3036 [pp] add 'error' event 2024-10-19 20:30:34 +02:00
Mike Fährmann
5d984f35aa [pinterest] support 'story' pins (#6188, #6078, #4229) 2024-10-19 17:47:31 +02:00
Mike Fährmann
0e4e40c9d2 [vk] document 'offset', add '--range' support 2024-10-17 21:20:21 +02:00
Mike Fährmann
02ca1ac602 [fanbox] add 'comments' option, extend 'metadata' option (#6287) 2024-10-06 22:31:41 +02:00
Mike Fährmann
8bcf7bf5ee [pixiv] add 'comments' option (#6287) 2024-10-06 20:41:36 +02:00
Mike Fährmann
b12d65ade2 [civitai] use tRPC API by default (#6279) 2024-10-06 08:57:58 +02:00
Mike Fährmann
c5be50fdaa [pixiv] implement workaround for 'limit_sanity_level' works
(#4327, #4747, #5054, #5435, #5651, #5655)

Metadata should be ~95% identical (there might be some 'date' differences)
and there could be issues with R-18 works, as these require some URL
manipulation to transform /c/250x250_80_a2/ thumbnail URLs into
/img-original/ ones.
2024-10-04 21:07:56 +02:00
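The following minimal sketch shows the kind of URL manipulation described above. It assumes thumbnail URLs of the form `https://i.pximg.net/c/250x250_80_a2/img-master/img/<date path>/<id>_p0_square1200.jpg`. That layout, the suffix list and `thumbnail_to_original()` are assumptions for illustration. The original file's extension cannot be derived from the thumbnail, so the function returns the URL without one.

```python
import re


def thumbnail_to_original(url):
    """Hypothetical: turn a pixiv-style thumbnail URL into an original URL stem.

    Assumed input:  .../c/250x250_80_a2/img-master/img/<date>/<id>_p0_square1200.jpg
    Assumed output: .../img-original/img/<date>/<id>_p0
    The extension (jpg, png, gif) is not recoverable from the thumbnail,
    so a caller would still have to try candidates.
    """
    # Drop the '/c/<size options>/' resize segment
    url = re.sub(r"/c/[^/]+/", "/", url, count=1)
    # Switch the resized directory to the original one
    url = re.sub(r"/(?:img-master|custom-thumb)/", "/img-original/",
                 url, count=1)
    # Remove the extension and the size suffix of the resized file
    stem = url.rsplit(".", 1)[0]
    return re.sub(r"_(?:square|master|custom)1200$", "", stem)
```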
Mike Fährmann
5b968a0a7c [boosty] extend image URLs with 'signedQuery' (#2387) 2024-10-03 20:25:12 +02:00
Mike Fährmann
321161c769 [patreon] use mobile UA (#6241, #6239, #6140) 2024-10-01 08:22:16 +02:00
Mike Fährmann
4e9dd036e7 [civitai] add 'nsfw' option (#3706) 2024-09-28 08:44:35 +02:00
Mike Fährmann
bc11dc0de2 [deviantart] add 'previews' option (#3782, #6124) 2024-09-27 10:41:26 +02:00
Mike Fährmann
f8f67dab22 [cookies] add 'cookies-select' option 2024-09-27 10:41:26 +02:00
Mike Fährmann
3eb3564b5d [civitai] support using internal tRPC API endpoints (#3706) 2024-09-25 18:46:18 +02:00
Mike Fährmann
a2db0d5c0d [civitai] add 'quality' option (#3706)
download 'original=true' files by default
2024-09-25 17:23:08 +02:00
Mike Fährmann
3348b05df0 [ao3] implement login with username & password (#6013) 2024-09-21 13:15:50 +02:00
Mike Fährmann
93eca64a73 [civitai] add initial support (#3706, #3787, #4129, #5995) 2024-09-20 17:21:17 +02:00
Mike Fährmann
73f833d08a [cookies:firefox] support using domain + container filters together 2024-09-16 14:58:44 +02:00
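The following minimal sketch combines a domain filter with a container filter in one SQLite query. It assumes a Firefox-like `moz_cookies` table with `host` and `originAttributes` columns. It also assumes that container cookies carry `userContextId=<n>` in `originAttributes`. `select_cookies()` is hypothetical. Each condition is itself an `OR`, so it must be parenthesized before the conditions are joined with `AND`. This is a plausible pitfall, not necessarily the bug this commit fixed.

```python
import sqlite3


def select_cookies(db, domain=None, container_id=None):
    """Return (name, value, host) rows matching every given filter."""
    conditions, params = [], []
    if domain:
        # exact host, or a domain/subdomain cookie such as '.example.org'
        conditions.append("host = ? OR host LIKE ?")
        params += [domain, "%." + domain]
    if container_id:
        # 'userContextId=1' must not also match 'userContextId=12'
        conditions.append("originAttributes LIKE ? OR originAttributes LIKE ?")
        uid = "%userContextId={}".format(container_id)
        params += [uid, uid + "&%"]
    sql = "SELECT name, value, host FROM moz_cookies"
    if conditions:
        # Without parentheses, 'a OR b AND c OR d' binds as
        # 'a OR (b AND c) OR d' and one filter is silently weakened.
        sql += " WHERE " + " AND ".join("({})".format(c) for c in conditions)
    return db.execute(sql, params).fetchall()
```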
Mike Fährmann
638a676495 [ao3] add initial support (#6013) 2024-09-15 22:38:21 +02:00
Mike Fährmann
7d6520e15d [bluesky] support video downloads (#6183) 2024-09-15 22:38:03 +02:00
Mike Fährmann
df0d7d4a12 [cohost] add 'user' and 'post' extractors (#4483) 2024-09-11 18:03:33 +02:00
Mike Fährmann
ff07aef776 [pp:ugoira] implement storing "original" frames in archives (#6147)
… by using '"mode": "archive"'

- rename 'ffmpeg-demuxer' option to 'mode'
- add 'metadata' option
- add 'zip' as a possible `--ugoira` format

TODO: adjust file mtimes inside archives when 'mtime' is enabled
2024-09-09 21:41:37 +02:00
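The following minimal sketch stores original ugoira frames in a ZIP archive, as the '"mode": "archive"' setting describes. The `(filename, data, delay_ms)` frame tuples are assumptions, not gallery-dl's format. So are the `animation.json` member holding frame delays and `build_ugoira_zip()` itself. The sketch also shows one way to approach the TODO. `zipfile.ZipInfo` takes an explicit `date_time`, so entry mtimes can be set while writing.

```python
import io
import json
import time
import zipfile


def build_ugoira_zip(frames, metadata=True, mtime=None):
    """Pack original ugoira frames into an in-memory ZIP archive.

    frames:   list of (filename, data, delay_ms) tuples (hypothetical shape)
    metadata: also store frame delays as 'animation.json' (assumed name)
    mtime:    optional Unix timestamp applied to every entry
    """
    buf = io.BytesIO()
    # ZipInfo wants (year, month, day, hour, minute, second);
    # time.localtime(None) means "now"
    date_time = time.localtime(mtime)[:6]
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, data, _delay in frames:
            zf.writestr(zipfile.ZipInfo(name, date_time), data)
        if metadata:
            delays = [{"file": name, "delay": delay}
                      for name, _data, delay in frames]
            zf.writestr(zipfile.ZipInfo("animation.json", date_time),
                        json.dumps(delays))
    return buf.getvalue()
```

ZIP_STORED (no compression) is a reasonable default here because JPEG and PNG frames are already compressed.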
Mike Fährmann
07bd967f59 [pp:ugoira] update (#6056)
- introduce '_ugoira_frame_index' metadata field
- store Ugoira file exts separately
- add 'skip' option
2024-09-05 20:25:20 +02:00
Mike Fährmann
359572162b [pp:rename] improve renaming files 'to' a format (#5846, #6044) 2024-09-03 21:17:31 +02:00
Mike Fährmann
864484e4c6 [instagram] add 'info' as a possible 'include' value 2024-09-02 15:43:55 +02:00
Mike Fährmann
ae9b0da755 [pp:hash] add 'hash' post processor (#6099) 2024-08-31 17:04:44 +02:00
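The following minimal sketch shows what a hash post processor computes. It produces several digests of a downloaded file in a single read pass, returned as a dict that could be merged into the file's metadata. `hash_file()` and its defaults are hypothetical. `hashlib.new()` accepts any algorithm name the local hashlib supports.

```python
import hashlib


def hash_file(path, algorithms=("md5", "sha1"), chunk_size=1 << 16):
    """Compute several digests of a file while reading it only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as fp:
        # Read in fixed-size chunks so large files never sit in memory
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```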
Mike Fährmann
bd932b6860 [twitter] add 'info' as a possible 'include' value (#6114) 2024-08-31 17:04:22 +02:00
Mike Fährmann
17f5ba43a8 [pp:rename] add 'rename' post processor (#5846, #6044)
renames previously downloaded files to a different filename format
2024-08-30 18:21:36 +02:00
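The following minimal sketch renames a previously downloaded file from one filename format to another. It assumes both formats are plain `str.format` templates filled from the same metadata dict. `rename_to_format()` and the example formats are hypothetical, not the post processor's actual options.

```python
import os


def rename_to_format(directory, kwdict, old_format, new_format):
    """Move a file from its old-format name to its new-format name.

    Returns the new path, or None if nothing was renamed.
    """
    old = os.path.join(directory, old_format.format_map(kwdict))
    new = os.path.join(directory, new_format.format_map(kwdict))
    # Skip if the names match, the old file is missing, or the
    # target already exists (never overwrite an existing file)
    if old == new or not os.path.exists(old) or os.path.exists(new):
        return None
    os.rename(old, new)
    return new
```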
Mike Fährmann
cf8e04d999 [koharu] improve format selection (#6088)
- allow specifying more than one possible format
- ignore unavailable formats
2024-08-29 09:33:24 +02:00
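The following minimal sketch implements the improved format selection. It accepts one or more preferred formats and returns the first one that is actually available, skipping the rest. `select_format()`, the comma-separated syntax and the format names are hypothetical, not koharu's real values.

```python
def select_format(available, preferred):
    """Return (name, info) for the first preferred format that exists.

    'available' maps format names to info (e.g. a URL); 'preferred' is
    a list or a comma-separated string. Missing formats are ignored
    instead of causing an error; None means nothing matched.
    """
    if isinstance(preferred, str):
        preferred = preferred.split(",")
    for fmt in preferred:
        fmt = fmt.strip()
        if fmt in available:
            return fmt, available[fmt]
    return None
```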
Mike Fährmann
b46169cfd2 add 'input-files' config option (#6059) 2024-08-27 17:21:49 +02:00
Mike Fährmann
4b286e80fd merge #6050: [wikimedia] add 'wiki' extractor 2024-08-25 09:38:24 +02:00
Mike Fährmann
46c3971c88 [bunkr] add 'tlds' option to allow URLs with all TLDs (#5875, #6017) 2024-08-24 20:45:44 +02:00
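The following minimal sketch shows what a 'tlds' switch could change in a URL pattern. When disabled, only a fixed list of domains matches. When enabled, any `bunkr.<tld>` host is accepted. The domain list is an illustrative subset, not the extractor's real list. The `/a/<id>` album path and `album_pattern()` are also assumptions.

```python
import re


def album_pattern(tlds=False, known=("bunkr.si", "bunkr.la")):
    """Compile a regex for album URLs like https://bunkr.si/a/<id>."""
    if tlds:
        # accept any alphabetic TLD
        domain = r"bunkr\.[a-z]{2,}"
    else:
        domain = "|".join(re.escape(d) for d in known)
    return re.compile(
        r"(?:https?://)?(?:www\.)?(?:" + domain + r")/a/([^/?#]+)")
```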
Mike Fährmann
4b94b7d477 [pp:metadata] add 'include' and 'exclude' options (#6058) 2024-08-19 21:58:57 +02:00
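The following minimal sketch shows 'include' and 'exclude' filtering for the fields a metadata post processor writes. The semantics are assumptions based only on the option names. 'include' keeps the listed keys in the given order. 'exclude' then drops keys. `filter_metadata()` is hypothetical.

```python
def filter_metadata(kwdict, include=None, exclude=None):
    """Return a copy of kwdict restricted by include/exclude key lists."""
    if include:
        # keep only listed keys that exist, in the order given
        kwdict = {k: kwdict[k] for k in include if k in kwdict}
    if exclude:
        kwdict = {k: v for k, v in kwdict.items() if k not in exclude}
    return kwdict
```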