4636 Commits

Author SHA1 Message Date
Mike Fährmann
f0419574a5 merge #6475: [imagechest] fix extractors 2024-11-16 09:26:41 +01:00
Mike Fährmann
75612997fe [imagechest] simplify
and fix user pagination end condition
2024-11-16 09:17:13 +01:00
Mike Fährmann
f7246f025f [weibo] simplify 'livephoto' extraction (#6471)
continuation of 396b52aef7

fixes wrong 'filename' and 'extension' values
when 'ssig' query parameter contains "%2F"
2024-11-16 08:19:02 +01:00
Mike Fährmann
cb09273670 [koharu] implement 'tags' option 2024-11-15 23:49:58 +01:00
Mike Fährmann
ddd325b435 merge #6432: [koharu] update domain (#6430) 2024-11-15 22:41:46 +01:00
Mike Fährmann
e5c2882320 [koharu] cleanup
- update BASE_PATTERN formatting
- fix groups indices
- add tests for new domains
- update docs/supportedsites
2024-11-15 22:41:40 +01:00
K0ng2
a09d9edaa6 [koharu] update root and root_api change 2024-11-15 22:14:33 +01:00
Mike Fährmann
0d1469f229 [exhentai] implement 'tags' option (#2117)
allow splitting tags into categories,
e.g. 'tags_parody', 'tags_group', etc.
2024-11-15 21:47:13 +01:00
Mike Fährmann
c82f3db098 [common] add 'proxy-env' option
(#6134, #6455)
disable using environment proxies by default
2024-11-15 18:03:56 +01:00
Mike Fährmann
0a72a5009c [common] disable Authorization header injection from .netrc auth
(#6134, #6455)
2024-11-15 17:37:04 +01:00
Mike Fährmann
a3dbc58172 [pillowfort] provide 'count' metadata field (#6478) 2024-11-15 08:27:52 +01:00
Mike Fährmann
9821503226 [misc] 'api_root' -> 'root_api' 2024-11-14 23:44:15 +01:00
Mike Fährmann
e763efd36c [bilibili] add workarounds for getting rate-limited (#6443)
- set 3-6 second request_interval by default
- retry request after waiting 5 minutes
2024-11-14 23:06:26 +01:00
Mike Fährmann
cfe24a9e31 [twitter] make 'source' metadata extraction non-fatal (#6472) 2024-11-14 18:59:01 +01:00
Mike Fährmann
396b52aef7 [weibo] fix livephoto 'filename' & 'extension' (#6471) 2024-11-14 18:56:18 +01:00
Achim
917e873c63 fix imagechest extractor 2024-11-14 16:54:59 +01:00
Achim
b2fa149598 fix imagechest extractor 2024-11-14 16:50:06 +01:00
Mike Fährmann
a3276e3b5d [hentaifoundry] add 'tag' extractor (#6465) 2024-11-13 20:56:37 +01:00
Mike Fährmann
b62c466c14 [flickr] fix video download URLs (#6464)
continuation of 0e18fa395d
fix video detection in '_file_url'
2024-11-13 20:56:37 +01:00
Mike Fährmann
2b96d638dc [bunkr] support 'bunkr.cr' URLs 2024-11-10 20:43:33 +01:00
Mike Fährmann
096b9f1d26 [bunkr] fix album names containing <>&
unescaping HTML entities once is not good enough
2024-11-10 20:38:21 +01:00
Mike Fährmann
c61c0461a9 [urlgalleries] fix 'root' and update 'request_interval' 2024-11-10 20:28:55 +01:00
Mike Fährmann
73d6e56a8f merge #6443: [bilibili] add support for articles (#2824) 2024-11-10 18:01:51 +01:00
Mike Fährmann
82d561e825 [bilibili] update
- use self.groups[…] to access matched values
- extract more metadata (count, width, height, size)
- remove type hint
- add tests
- update docs/supportedsites
2024-11-10 17:59:24 +01:00
hdk5
fc59e0fb14 [bilibili] support large articles 2024-11-10 15:18:03 +02:00
Mike Fährmann
74f1e9a1ac [poipiku] return 'count' as proper number (#6445) 2024-11-10 08:26:43 +01:00
hdk5
6eef3e3495 [bilibili] initial support (#2824) 2024-11-10 00:21:27 +02:00
Mike Fährmann
7916c8bf77 allow passing cookies to OAuth extractors
partially revert ce54b8c04c
2024-11-09 18:06:27 +01:00
Mike Fährmann
0e18fa395d [flickr] use "download" URLs (#6360) 2024-11-09 17:33:27 +01:00
Mike Fährmann
1ddbcda58b [nhentai] support '.webp' files (#6442) 2024-11-08 17:46:38 +01:00
Mike Fährmann
b6cf348658 [webtoons] extract 'episode_no' for comic results (#6439) 2024-11-08 14:19:17 +01:00
Mike Fährmann
77f761d320 merge #6437: [philomena:ponybooru] switch default filter
… to get everything by default
2024-11-08 08:20:10 +01:00
Mike Fährmann
6205e255f4 merge #6394: [tumblr] add 'search' extractor 2024-11-08 08:17:46 +01:00
Mike Fährmann
33778d35ba [tumblr] update
- simplify
- fix search pagination
- support custom search mode and post types
2024-11-08 08:15:13 +01:00
Shelvacu
f8e707b92c [philomena] switch default ponybooru filter to get everything by default
The system filter mislabeled "Everything" hides 4 tags https://ponybooru.org/filters/2

There are [many public filters that don't hide anything](https://ponybooru.org/filters?fq=spoilered_count%3A0%2C+hidden_count%3A0), I just picked [the oldest one](https://ponybooru.org/filters/3).
2024-11-07 20:08:42 -08:00
Mike Fährmann
ce90566c56 [pinterest] detect video/audio by block content (#6421)
story blocks from search/board results do not always contain a 'type'
2024-11-05 15:55:24 +01:00
Mike Fährmann
a9a9f3a180 [pinterest] support 'story_pin_music_block' blocks (#6421) 2024-11-05 15:55:24 +01:00
Mike Fährmann
0b3ddd01af [hiperdex] update domain to 'hipertoon.com' (#6420)
and fix 'description' extraction
2024-11-05 15:54:42 +01:00
Mike Fährmann
9afbe91f82 [rule34xyz] add 'format' option (#1078) 2024-11-05 15:45:52 +01:00
Mike Fährmann
51b16d078b [rule34xyz] ensure 'files' keys are strings (#1078)
fixes -K/--list-keywords
2024-11-05 09:34:17 +01:00
Mike Fährmann
390b8ddd3e [common] emit logging messages for --write-pages files 2024-11-03 20:38:33 +01:00
Mike Fährmann
cb0d8cae77 merge #6227: [everia] add support (#1067, #2472, #4091) 2024-11-03 17:52:17 +01:00
Mike Fährmann
cea062ffc5 [everia] update
- implement general _pagination method
- simplify code
- adjust URL patterns
- update test results
2024-11-03 17:51:04 +01:00
missionfloyd
d31a3b5da3 [everia.club] Add support
- Unescape title and URL
- Add tags and categories metadata
    Lookup tag id with API instead of downloading tag page
- Add category extractor
- Add tests
- Rename EveriaExtractor to EveriaPostExtractor
- Fix EveriaPostExtractor example
- Lookup tags/categories by post id
- Add date extractor
- Remove leftover pages parameter
- Add error handling for invalid dates.
- Add filename numbering
    Parse date
- Rename extract() to images()
- Remove html import
- Fix search/date URLs with page number
- Fix tag/category search
- Fix post extractor
- Fix tag, category extractors
- Fix search extractor
- Only load first page once
- Fix date extractor
- Fix tests
- Clean up search extractor
2024-11-03 14:09:07 +01:00
Mike Fährmann
9b59af8d8d [instagram] fix using numeric cursor values (#6414) 2024-11-03 12:03:01 +01:00
Mike Fährmann
d787c0c4ea [rule34xyz] add support (#1078, #4960) 2024-11-03 10:12:26 +01:00
Mike Fährmann
7c0d2ca07d [rule34vault] update
- implement 'tags' categorization
- don't use 'totalCount' for pagination end
- update tests
2024-11-03 09:59:25 +01:00
Mike Fährmann
d5fa1d6aba [sankaku] improve tag categorization code
translate tag type ID to name for each category
instead of for each tag
2024-11-03 09:21:39 +01:00
Delphox
565dc5b43b [bluesky] match fxbsky.app and vxbsky.app 2024-11-02 16:00:43 -03:00
Mike Fährmann
93adfbe935 merge #6410: [bluesky] match common bluesky embed fixes 2024-11-02 18:28:07 +01:00