6104 Commits

Author SHA1 Message Date
Mike Fährmann
e1aa4a7162 [kemonoparty] support new discord channel URLs (#6542) 2024-11-27 15:19:27 +01:00
Mike Fährmann
2162fa7df2 [kemonoparty] fix 'comments' for posts without comments (#6415)
https://github.com/mikf/gallery-dl/issues/6415#issuecomment-2501966303
2024-11-26 23:23:39 +01:00
Mike Fährmann
74d855c693 [kemonoparty] update to new site layout / API endpoints
(#6415, #6503, #6528, #6530, #6536)

… at least for the most part. Favorites are still broken, but the rest
should be functional again.
2024-11-26 22:15:28 +01:00
Mike Fährmann
5412b22dae [common] allow overriding more default 'User-Agent' headers (#6496)
ignore 'extractor.user-agent' if it is still the default User-Agent value
and an extractor wants to set its own custom value
2024-11-26 21:50:28 +01:00
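As a rough illustration of the precedence rule described in 5412b22dae, the sketch below picks a header value. The function name, its parameters, and the placeholder default string are hypothetical and are not taken from gallery-dl's code.

```python
# Hypothetical placeholder; not gallery-dl's real default value.
DEFAULT_USER_AGENT = "default-agent/1.0"


def pick_user_agent(configured, extractor_value=None,
                    default=DEFAULT_USER_AGENT):
    """Choose the 'User-Agent' header value to send.

    configured      -- value of the 'user-agent' option (None if unset)
    extractor_value -- custom value an extractor wants to use, if any
    """
    user_set_custom = configured is not None and configured != default
    if extractor_value is not None and not user_set_custom:
        # The option is unset or still the default: the extractor wins.
        return extractor_value
    # Otherwise honor an explicit user value, falling back to the default.
    return configured if configured is not None else default
```

Under these assumptions, a user-supplied custom value always takes priority, and an extractor's value only replaces the untouched default.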
Mike Fährmann
94c3a4dca5 [scrolller] ignore posts without 'mediaSources' (#5051)
https://github.com/mikf/gallery-dl/issues/5051#issuecomment-2498729103
fixes "KeyError - 'mediaSources'"
2024-11-26 21:50:28 +01:00
Luca Russo
e9370b7b8a merge #5626: [facebook] add support (#470, #2612)
* [facebook] add initial support

* renamed extractors & subcategories

* better stability, modularity & naming

* added single photo extractor, warnings & retries

* more metadata + extract author followups

* renamed "album" mentions to "set" for consistency

* cookies are now only used when necessary

also added author followups for singular images

* removed f-strings

* added way to continue extraction from where it left off

also fixed some bugs

* fixed wrong subcategory bug

* added individual video extraction

* extract audio + added ytdl option

* updated setextract regex

* added option to disable start warning

the extractor should be ready :)

* fixed description metadata bug

* removed cookie "safeguard" + fixed for private profiles

I have removed the cookie "safeguard" (not using cookies until they are necessary), as I've come to the conclusion that it does more harm than good. There is no way to detect whether the extractor has skipped private images that could otherwise have been extracted. Also, the safeguard provides little to no advantage.

* fixed a few bugs regarding profile parsing

* a few bugfixes

Fixed some metadata attributes not decoding correctly for non-Latin languages, or not showing at all.
Also improved a few patterns.

* retrigger checks

* Final cleanups

-Added tests
-Fixed video extractor giving incorrect URLs
-Removed start warning
-Listed supported site correctly

* fixed regex

* trigger checks

* fixed livestream playback extraction + bugfixes

I've chosen to remove the "reactions", "comments" and "views" attributes as I've felt that they require additional maintenance even though nobody would ever actually use them to order files.
I've also removed the "title" and "caption" video attributes for their inconsistency across different videos.
Feel free to share your thoughts.

* fixed regex

* fixed filename fallback

* fixed retrying when a photo url is not found

* fixed end line

* post url fix + better naming

* fix posts

* fixed tests

* added profile.php url

* made most of the requested changes

* flake

* archive: false

* removed unnecessary url extract

* [facebook] update

- more 'Sec-Fetch-…' headers
- simplify 'text.nameext_from_url()' calls
- replace 'sorted(…)[-1]' with 'max(…)'
- fix '_interval_429' usage
- use replacement fields in logging messages

* [facebook] update URL patterns

get rid of '.*' and '.*?'

* added a few remaining tests

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2024-11-26 21:49:11 +01:00
Mike Fährmann
d1ad97ae0c [motherless] add to 'modules' list 2024-11-22 21:18:13 +01:00
Mike Fährmann
b78c35fd15 [motherless] add 'media' and 'gallery' extractors
(#2074, #4413, #6221)
2024-11-22 21:06:32 +01:00
Mike Fährmann
b8b943fc38 [pinterest] update API headers (#6513)
'BoardFeed' requests fail without 'X-Pinterest-PWS-Handler'
2024-11-22 08:41:10 +01:00
Mike Fährmann
ededba8087 [steamgriddb] disable 'adjust-extensions' for fake-png files (#5274) 2024-11-20 19:19:23 +01:00
Mike Fährmann
74a8587798 merge #6493: [docs] update pip command for installing dev version
- recommend '--force-reinstall'
- remove '-I' and '--no-cache-dir'
2024-11-20 16:22:01 +01:00
fireattack
7d609b883e Remove -I for installing from the source in readme
- Update according to the comment from @mikf
- Follow rst syntax
- remove '--no-cache-dir'
2024-11-20 16:19:38 +01:00
Mike Fährmann
f47c0982a0 [bluesky] improve 'web' did handling 2024-11-20 16:05:05 +01:00
Mike Fährmann
88ba85d285 [blogger] use default API key when 'api-key' is empty
… and not just when 'api-key' is not set.
2024-11-20 16:02:16 +01:00
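The difference between "not set" and "empty" in 88ba85d285 can be sketched as below. The helper name and the placeholder key are hypothetical; only the 'api-key' option name comes from the commit message.

```python
# Hypothetical placeholder; the real built-in key is not shown here.
DEFAULT_API_KEY = "built-in-key"


def resolve_api_key(options):
    """Return the configured 'api-key', or the default if it is unusable."""
    # `or` treats a missing key, None, and an empty string the same way,
    # whereas options.get("api-key", DEFAULT_API_KEY) would keep "".
    return options.get("api-key") or DEFAULT_API_KEY
```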
Mike Fährmann
9b2d782cb7 [pp:classify] rewrite & simplify (#5213)
Do not manually build paths, which later get overwritten by
pathfmt.build_path() anyway. Just update the target directory and let
the rest of the "path logic" handle it.

Fixes skipping previously downloaded and categorized files,
which was broken since 8124c16a50
2024-11-19 08:05:11 +01:00
Mike Fährmann
00fe1c81b2 [ytdl] fix AttributeError caused by 'decodeOption()' removal
in yt-dlp 2024.11.18
2024-11-18 16:22:08 +01:00
Mike Fährmann
b069783578 [newgrounds] fix metadata extraction (#6463)
- fix 'comment' metadata
- fix 'following' extractor pattern
- use own 'type' values, since 'og:type' is no longer available
- update test results
2024-11-18 16:21:59 +01:00
Mike Fährmann
50acf2ac84 [danbooru] add 'artist-search' extractor (#5348) 2024-11-17 16:58:54 +01:00
Mike Fährmann
ca4b2a0760 [danbooru] add 'artist' extractor (#5348) 2024-11-17 16:23:16 +01:00
Mike Fährmann
9184a5643a [danbooru] move all initialization code into '_init()' 2024-11-17 16:15:53 +01:00
Mike Fährmann
55afd712d6 [pp] allow inheriting settings from global 'postprocessor' entries
No idea how to properly explain/document this, so here's an example:

The extractor.postprocessors object
gets its options from postprocessor.jl
and adds 'filename' itself.

{
    "extractor": {
        "postprocessors": {
            "type": "jl",
            "filename": "meta.jsonl"
        }
    },

    "postprocessor": {
        "jl": {
            "name": "metadata",
            "mode": "jsonl",
            "open": "a"
        }
    }
}
2024-11-16 21:16:13 +01:00
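One way to read the example in 55afd712d6 is as a dictionary merge keyed by "type". The sketch below is an interpretation only: the function name is hypothetical, and the assumption that entry-level keys override inherited ones is not stated in the commit message.

```python
def resolve_postprocessor(entry, global_postprocessors):
    """Merge a global 'postprocessor' entry into an extractor-level one."""
    entry = dict(entry)                 # do not modify the caller's dict
    type_name = entry.pop("type", None)
    base = global_postprocessors.get(type_name, {})
    # Assumption: keys set on the entry itself override inherited ones.
    return {**base, **entry}


global_pp = {"jl": {"name": "metadata", "mode": "jsonl", "open": "a"}}
merged = resolve_postprocessor(
    {"type": "jl", "filename": "meta.jsonl"}, global_pp)
```

With the values from the commit message, `merged` holds the three inherited options plus "filename".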
Mike Fährmann
80454460ce [config] support accumulating non-list values
fixes 1264fc518b
2024-11-16 21:13:57 +01:00
Mike Fährmann
bced143750 [tests] add workaround for compile_expression_defaultdict in pypy3 2024-11-16 19:35:28 +01:00
Mike Fährmann
f0419574a5 merge #6475: [imagechest] fix extractors 2024-11-16 09:26:41 +01:00
Mike Fährmann
75612997fe [imagechest] simplify
and fix user pagination end condition
2024-11-16 09:17:13 +01:00
Mike Fährmann
f7246f025f [weibo] simplify 'livephoto' extraction (#6471)
continuation of 396b52aef7

fixes wrong 'filename' and 'extension' values
when 'ssig' query parameter contains "%2F"
2024-11-16 08:19:02 +01:00
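The exact cause is not spelled out in f7246f025f. One way a "%2F" in a query parameter can corrupt the result is when a URL is unquoted before its last "/" is located. The sketch below avoids that by isolating the path component first; the function name and the example URL are hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def name_and_extension(url):
    """Derive (filename, extension) from the path component of a URL."""
    path = urlsplit(url).path           # drops '?query' and '#fragment'
    basename = posixpath.basename(unquote(path))
    name, dot, ext = basename.rpartition(".")
    if not name:
        # No usable dot in the basename: treat it as having no extension.
        return basename, ""
    return name, ext


# Hypothetical URL whose 'ssig' value contains an encoded slash.
example = "https://example.org/live/clip.mov?ssig=ab%2Fcd&Expires=1"
```

A naive `unquote(example).rpartition("/")[2]` would yield "cd&Expires=1" here, because the decoded slash becomes the last one in the string.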
Mike Fährmann
cb09273670 [koharu] implement 'tags' option 2024-11-15 23:49:58 +01:00
Mike Fährmann
ddd325b435 merge #6432: [koharu] update domain (#6430) 2024-11-15 22:41:46 +01:00
Mike Fährmann
e5c2882320 [koharu] cleanup
- update BASE_PATTERN formatting
- fix groups indices
- add tests for new domains
- update docs/supportedsites
2024-11-15 22:41:40 +01:00
K0ng2
a09d9edaa6 [koharu] update root and root_api 2024-11-15 22:14:33 +01:00
Mike Fährmann
0d1469f229 [exhentai] implement 'tags' option (#2117)
allow splitting tags into categories,
e.g. 'tags_parody', 'tags_group', etc.
2024-11-15 21:47:13 +01:00
Mike Fährmann
1264fc518b allow 'postprocessors' to be a single dict/str
do not require it to be a list with just one element

"postprocessors": "metadata"
"postprocessors": {"name": "metadata"}
2024-11-15 21:15:00 +01:00
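Accepting a single dict or string, as 1264fc518b describes, usually comes down to normalizing the value to a list before iterating over it. A minimal sketch, with a hypothetical helper name:

```python
def normalize_postprocessors(value):
    """Return the 'postprocessors' setting as a list of entries."""
    if value is None:
        return []
    if isinstance(value, (dict, str)):
        # A lone dict or name is shorthand for a one-element list.
        return [value]
    return list(value)
```

The explicit `str` check matters because a string is itself iterable, so `list("metadata")` would split it into characters.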
Mike Fährmann
c82f3db098 [common] add 'proxy-env' option
(#6134, #6455)
disable using environment proxies by default
2024-11-15 18:03:56 +01:00
Mike Fährmann
0a72a5009c [common] disable Authorization header injection from .netrc auth
(#6134, #6455)
2024-11-15 17:37:04 +01:00
Mike Fährmann
a3dbc58172 [pillowfort] provide 'count' metadata field (#6478) 2024-11-15 08:27:52 +01:00
Mike Fährmann
9821503226 [misc] 'api_root' -> 'root_api' 2024-11-14 23:44:15 +01:00
Mike Fährmann
e763efd36c [bilibili] add workarounds for getting rate-limited (#6443)
- set 3-6 second request_interval by default
- retry request after waiting 5 minutes
2024-11-14 23:06:26 +01:00
Mike Fährmann
5bc3657c59 [util] implement 'compile_filter()' (#5262)
https://github.com/mikf/gallery-dl/issues/5262#issuecomment-2477029728

allow (theoretically*) all filter expression statements
to be a list of individual filters

(*) except for 'filename' and 'directory' conditionals,
as dict keys cannot be lists
2024-11-14 22:47:36 +01:00
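Allowing a filter to be a list of individual expressions, as in 5bc3657c59, can be sketched by joining the parts into one expression. The helper name is hypothetical, and combining the parts with "and" is an assumption; the commit message does not state how they are combined.

```python
def combine_filters(expr, name="<filter>"):
    """Compile one filter expression, or a list of them, into a callable."""
    if not isinstance(expr, str):
        # Assumption: every individual filter has to match ('and').
        expr = "(" + ") and (".join(expr) + ")"
    code = compile(expr, name, "eval")

    def evaluate(metadata):
        # Metadata keys are exposed as local names of the expression.
        return eval(code, {}, dict(metadata))

    return evaluate


keep = combine_filters(["width > 100", "extension == 'jpg'"])
```

Parenthesizing each part keeps operator precedence inside one filter from leaking into its neighbors.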
Mike Fährmann
0b99d9e6b9 [util] add "defaultdict" filters-environment
allows accessing undefined values without raising an exception,
but preserves other errors like TypeError, AttributeError, etc
2024-11-14 22:47:25 +01:00
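The behavior described in 0b99d9e6b9 can be approximated with a `collections.defaultdict` as the namespace passed to `eval()`. This is a sketch for CPython under stated assumptions, not gallery-dl's implementation: the helper name is hypothetical, and undefined names resolve to `None` here purely for illustration.

```python
import builtins
import collections


def compile_lenient(expr, name="<filter>"):
    """Compile an expression in which undefined names evaluate to None."""
    code = compile(expr, name, "eval")

    def evaluate(metadata):
        # Seed the namespace with builtins so len(), min(), etc. still
        # resolve; any other missing name falls back to None.
        env = collections.defaultdict(lambda: None, vars(builtins))
        env.update(metadata)
        return eval(code, {}, env)

    return evaluate


check = compile_lenient("width is None or width > 100")
```

Only the name lookup is relaxed. An expression such as "width > 100" still raises TypeError when "width" is missing, since it compares `None` with an integer.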
Mike Fährmann
cfe24a9e31 [twitter] make 'source' metadata extraction non-fatal (#6472) 2024-11-14 18:59:01 +01:00
Mike Fährmann
396b52aef7 [weibo] fix livephoto 'filename' & 'extension' (#6471) 2024-11-14 18:56:18 +01:00
Achim
917e873c63 fix imagechest extractor 2024-11-14 16:54:59 +01:00
Achim
b2fa149598 fix imagechest extractor 2024-11-14 16:50:06 +01:00
Mike Fährmann
a3276e3b5d [hentaifoundry] add 'tag' extractor (#6465) 2024-11-13 20:56:37 +01:00
Mike Fährmann
b62c466c14 [flickr] fix video download URLs (#6464)
continuation of 0e18fa395d
fix video detection in '_file_url'
2024-11-13 20:56:37 +01:00
Mike Fährmann
cd6d6ea8be [options] fix passing negative numbers as arguments (#5262)
https://github.com/mikf/gallery-dl/issues/5262#issuecomment-2468677453

fixes regression introduced in 9e729681

'argparse' sets a flag and changes its behavior when using something
that looks like a negative number as option string, '-4' and '-6' in
this case.
2024-11-11 19:07:37 +01:00
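The 'argparse' behavior mentioned in cd6d6ea8be can be reproduced with the standard library alone. In the sketch below, the parsers and the '--range' option are illustrative stand-ins rather than gallery-dl's actual option definitions.

```python
import argparse


def accepts(parser, argv):
    """Return True if 'parser' parses 'argv' without exiting."""
    try:
        parser.parse_args(argv)
        return True
    except SystemExit:      # argparse reports errors via sys.exit(2)
        return False


plain = argparse.ArgumentParser()
plain.add_argument("--range")

# Registering an option string that looks like a negative number ('-4')
# makes the parser treat arguments such as '-1' as options, not values.
with_digits = argparse.ArgumentParser()
with_digits.add_argument("--range")
with_digits.add_argument("-4", dest="ipv4", action="store_true")
```

Here `plain` accepts `["--range", "-1"]`, while `with_digits` rejects it with "expected one argument". The attached form `["--range=-1"]` is accepted by both.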
Mike Fährmann
30ff775382 [workflows:tests] change job name to 'test'
… and use alternate list syntax for Python versions
2024-11-10 21:29:48 +01:00
Mike Fährmann
2b96d638dc [bunkr] support 'bunkr.cr' URLs 2024-11-10 20:43:33 +01:00
Mike Fährmann
096b9f1d26 [bunkr] fix album names containing <>&
unescaping HTML entities once is not good enough
2024-11-10 20:38:21 +01:00
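The remark in 096b9f1d26 that unescaping once is not enough applies to text that was escaped more than once. The sketch below repeats `html.unescape()` until the text stops changing; the helper name and the round limit are hypothetical, and the actual fix may simply unescape a fixed number of times.

```python
import html


def unescape_fully(text, max_rounds=5):
    """Apply html.unescape() until the text no longer changes."""
    for _ in range(max_rounds):
        unescaped = html.unescape(text)
        if unescaped == text:
            break
        text = unescaped
    return text
```

For example, "&amp;lt;album&amp;gt;" becomes "&lt;album&gt;" after one pass and "<album>" after the second.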
Mike Fährmann
c61c0461a9 [urlgalleries] fix 'root' and update 'request_interval' 2024-11-10 20:28:55 +01:00