5812 Commits

Author SHA1 Message Date
Mike Fährmann
5548cff730 [eporner] include 'www' subdomain in 'root' domain (#9064)
prevents redirect on initial request
2026-02-15 10:57:53 +01:00
Mike Fährmann
53cdfaac37 [common] add reference to 'exception' module to Extractor class
- remove 'exception' imports
- replace with 'self.exc'
2026-02-15 10:57:22 +01:00
Amar Paul
b552cdba04 [pholder] add support (#2568 #9067)
* feat: extractor for pholder.com
    Closes #2568
* feat[pholder]: support gallery_id properly and tags
* doc[text.nameext_from_name]: minor typo in docstring

* remove '__init__' & 'request' methods and 'json' import
* use 'text.nameext_from_url' to ensure a 'filename' value
* fix 'imgur' links by disabling auto-Referer
* fix 'data["id"].partition()' call
    'partition' returns 3 elements
* use 'item["_source"]' data directly
* remove unused supportedsites overwrite
* catch all exceptions in '_thumb_resolution'
    fixes "KeyError: 'width'"
* use 'author' name for user folders

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2026-02-15 09:46:18 +01:00
Mike Fährmann
01cb378baa [imagepond] support '/i/' URLs, update root domain 2026-02-14 19:33:52 +01:00
Mike Fährmann
41998cbb8f [chevereto] combine 'image' & 'video' extractors into a 'file' extractor 2026-02-14 19:33:52 +01:00
Mike Fährmann
1c8d07885c [chevereto] improve password requirement detection 2026-02-14 19:33:52 +01:00
Mike Fährmann
a33b9e5a60 [chevereto] extract 'title' metadata (#9061) 2026-02-14 19:33:52 +01:00
Mike Fährmann
d99c8c1320 [manganelo] fix 'manga' extractor (#9059) 2026-02-14 09:17:48 +01:00
Mike Fährmann
f1da162d72 [common] include duration in 'wait()' output 2026-02-13 20:44:46 +01:00
Mike Fährmann
d2477a94af [options] add 'sleep-retries' option 2026-02-13 18:04:05 +01:00
Mike Fährmann
34e402a01d [koofr] improve subdirectory handling - re-add 'num' & 'count' 2026-02-12 21:37:18 +01:00
Mike Fährmann
866e6df7a8 merge #9047: [foolfuuka] improve media link resolution 2026-02-12 19:40:39 +01:00
Mike Fährmann
df7642ed2f [foolfuuka] simplify
- filter posts manually
- don't use lists for 'in' checks against constant values
2026-02-12 19:31:29 +01:00
Mike Fährmann
84b5b1e5e4 [koofr] include '{hash}' in default filenames 2026-02-12 19:26:46 +01:00
Mike Fährmann
0f41f343f4 implement linear/exponential backoff for 'sleep-429' 2026-02-12 19:23:29 +01:00
Mike Fährmann
0fb5ce6bbd [xenforo] fix 'IndexError' when extracting attachments (#9046)
fixes regression introduced in d9917ec630
2026-02-12 14:32:22 +01:00
NecRaul
1744eb04cb [foolfuuka] early resolution for wsg/gif boards 2026-02-12 17:13:11 +04:00
NecRaul
f7f2584575 [foolfuuka] use filter to skip posts missing media 2026-02-12 16:12:08 +04:00
Mike Fährmann
12f5e24ab5 use sets for ' in { ... }' checks 2026-02-11 22:55:01 +01:00
Mike Fährmann
fc589e9ea4 [simpcity] fix 'gofile' links (#9042) 2026-02-11 21:55:43 +01:00
Mike Fährmann
04905ff7a2 [weebdex] fix 'chapter-reverse' (#9041)
fixes regression introduced in 56168fbc87
2026-02-11 09:15:56 +01:00
Mike Fährmann
448ec12b8b [tests/extractor] test 'extractor.find()' results 2026-02-10 20:54:54 +01:00
Mike Fährmann
102f8da294 [reddit] fix '/external-preview' embed downloads (#9037)
don't strip URL parameters
2026-02-10 20:45:51 +01:00
Mike Fährmann
f67b99a7b4 [reddit] fix "KeyError: 'children'" when expanding comments (#9037) 2026-02-10 18:23:15 +01:00
Mike Fährmann
d491564f8a [instagram] add 'user-strategy' option (#8978 #9025) 2026-02-10 16:46:33 +01:00
Mike Fährmann
a8376f2804 [instagram] add 'user-cache' option (#8978 #9025) 2026-02-10 12:01:37 +01:00
Mike Fährmann
ace8c50278 [imagefap] handle '/galleries?folderid=0' URLs (#9034) 2026-02-10 10:56:30 +01:00
Mike Fährmann
ce8d61df66 [imagefap] don't return anything for empty profiles (#9034) 2026-02-10 10:28:49 +01:00
Mike Fährmann
640d5f1621 [fikfap] improve URL patterns
use '[^/?#]+' for names
2026-02-10 07:56:39 +01:00
Mike Fährmann
52a5e39fc6 [reddit:user] fix user lookup when using sub view (#8228 #9032)
e.g. USER/submitted or USER/comments
fixes regression introduced in c16892a150
2026-02-09 18:57:00 +01:00
Mike Fährmann
b769dc76f4 [pornpics] fix 'search' extractor pagination (#9022)
make stop condition more lenient
2026-02-09 18:57:00 +01:00
wise-immersion
d77078d853 [fikfap] support main page post URLs (#9026)
* Update fikfap.py to allow for extracting a single post from the main page
    Current post extractor only works on links to posts
    on user pages but not on direct links to posts
* include 'singlepost' logic into existing 'post' extractor

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2026-02-09 18:54:33 +01:00
Mike Fährmann
3cf8813298 [instagram] fix errors for missing user profiles 2026-02-08 18:46:57 +01:00
Mike Fährmann
53747c63ba [instagram] fix 'avatar' & 'info' extractors (#8978)
export user lookup logic into 'user_by_screen_name' method
2026-02-08 15:54:20 +01:00
Mike Fährmann
d3c4328078 [imagefap:user] support multiple pages (#9016) 2026-02-08 11:49:11 +01:00
wise-immersion
a8636e75a1 [fikfap] add 'hashtag' extractor (#9018)
Added functionality to extract by hashtag and save to directory named after the hashtag.
2026-02-08 11:42:48 +01:00
wise-immersion
5d9b607158 [fikfap] allow for dash in usernames (#9019) 2026-02-08 11:07:00 +01:00
Mike Fährmann
8eafa1564a [reddit] try to improve comment metadata (#8228)
* provide toplevel 'date'
* preserve 'submission' data
2026-02-07 21:47:17 +01:00
Mike Fährmann
935bdb6229 [reddit:user] implement 'only' option (#8228) 2026-02-07 21:47:17 +01:00
Mike Fährmann
c16892a150 [reddit:user] provide 'user' metadata field (#8228) 2026-02-07 21:47:17 +01:00
Mike Fährmann
98ef34a9be [twitter] support 'article' media (#8995) 2026-02-07 21:47:17 +01:00
Mike Fährmann
7a98a93a8e [common] only call 'skip()' & 'finalize()' when defined 2026-02-07 21:47:17 +01:00
Mike Fährmann
40e4cc62c4 [common] pass job status to 'finalize()' 2026-02-07 21:47:17 +01:00
Mike Fährmann
da887721c9 [instagram] use '/topsearch/' to fetch user information (#8978) 2026-02-07 21:47:16 +01:00
Mike Fährmann
da2a6a8ffa [imhentai] use alternate strategy for galleries without image data (#8951) 2026-02-06 11:51:39 +01:00
Mike Fährmann
d3adfd603b [artstation] fix & update 'challenge' extractor 2026-02-05 22:37:10 +01:00
Mike Fährmann
04442e262e [artstation] download '/8k/' images (#9003) 2026-02-05 17:32:55 +01:00
Mike Fährmann
fdc59efdda [pixiv] fix errors when using metadata options for avatar/background
(#9002)
2026-02-05 12:07:42 +01:00
Mike Fährmann
2ac55f4870 [instagram] cache '/users/web_profile_info' results on disk (#8978)
In the rare case this endpoint returns results and not a 429 error,
store them locally so they can be re-used the next time this user
is downloaded from.
2026-02-05 11:21:15 +01:00
Mike Fährmann
09fbb3a594 [imagefap] use self.groups, remove __init__ 2026-02-05 09:04:55 +01:00