Commit Graph

374 Commits

Author SHA1 Message Date
Mike Fährmann
1d75c8308c [weebcentral] add support (#6778) 2025-01-10 23:04:51 +01:00
Mike Fährmann
63008f77e2 merge #6607: [lofter] add initial support
(#650, #2294, #4095, #4728, #5656)
2024-12-11 20:41:52 +01:00
Mike Fährmann
86334f9c4a [yiffverse] add support (#6611) 2024-12-11 10:57:21 +01:00
hdk5
0466fcab4c [lofter]: add initial support 2024-12-08 19:37:42 +02:00
Mike Fährmann
ef7ff31117 [realbooru] fix extraction (#6543)
- extract data from HTML pages since API is no longer usable
- move code into its own separate 'realbooru' module
2024-12-07 17:39:25 +01:00
Luca Russo
e9370b7b8a merge #5626: [facebook] add support (#470, #2612)
* [facebook] add initial support

* renamed extractors & subcategories

* better stability, modularity & naming

* added single photo extractor, warnings & retries

* more metadata + extract author followups

* renamed "album" mentions to "set" for consistency

* cookies are now only used when necessary

also added author followups for singular images

* removed f-strings

* added way to continue extraction from where it left off

also fixed some bugs

* fixed wrong subcategory bug

* added individual video extraction

* extract audio + added ytdl option

* updated setextract regex

* added option to disable start warning

the extractor should be ready :)

* fixed description metadata bug

* removed cookie "safeguard" + fixed for private profiles

I have removed the cookie "safeguard" (not using cookies until they are necessary), as I've come to the conclusion that it does more harm than good. There is no way to detect whether the extractor has skipped private images that could otherwise have been extracted. Also, doing this provides little to no advantage.

* fixed a few bugs regarding profile parsing

* a few bugfixes

Fixed some metadata attributes not decoding correctly for non-Latin languages, or not showing at all.
Also improved a few patterns.

* retrigger checks

* Final cleanups

- Added tests
- Fixed video extractor giving incorrect URLs
- Removed start warning
- Listed supported site correctly

* fixed regex

* trigger checks

* fixed livestream playback extraction + bugfixes

I've chosen to remove the "reactions", "comments" and "views" attributes, as I felt they require additional maintenance even though nobody would ever actually use them to order files.
I've also removed the "title" and "caption" video attributes for their inconsistency across different videos.
Feel free to share your thoughts.

* fixed regex

* fixed filename fallback

* fixed retrying when a photo url is not found

* fixed end line

* post url fix + better naming

* fix posts

* fixed tests

* added profile.php url

* made most of the requested changes

* flake

* archive: false

* removed unnecessary url extract

* [facebook] update

- more 'Sec-Fetch-…' headers
- simplify 'text.nameext_from_url()' calls
- replace 'sorted(…)[-1]' with 'max(…)'
- fix '_interval_429' usage
- use replacement fields in logging messages

* [facebook] update URL patterns

get rid of '.*' and '.*?'

* added a few remaining tests

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2024-11-26 21:49:11 +01:00
Mike Fährmann
d1ad97ae0c [motherless] add to 'modules' list 2024-11-22 21:18:13 +01:00
hdk5
6eef3e3495 [bilibili] initial support (#2824) 2024-11-10 00:21:27 +02:00
Mike Fährmann
cb0d8cae77 merge #6227: [everia] add support (#1067, #2472, #4091) 2024-11-03 17:52:17 +01:00
missionfloyd
d31a3b5da3 [everia.club] Add support
- Unescape title and URL
- Add tags and categories metadata
    Look up tag id with API instead of downloading tag page
- Add category extractor
- Add tests
- Rename EveriaExtractor to EveriaPostExtractor
- Fix EveriaPostExtractor example
- Look up tags/categories by post id
- Add date extractor
- Remove leftover pages parameter
- Add error handling for invalid dates.
- Add filename numbering
    Parse date
- Rename extract() to images()
- Remove html import
- Fix search/date URLs with page number
- Fix tag/category search
- Fix post extractor
- Fix tag, category extractors
- Fix search extractor
- Only load first page once
- Fix date extractor
- Fix tests
- Clean up search extractor
2024-11-03 14:09:07 +01:00
Mike Fährmann
d787c0c4ea [rule34xyz] add support (#1078, #4960) 2024-11-03 10:12:26 +01:00
Mike Fährmann
655e42dc92 merge #6240: [rule34vault] add support (#5708) 2024-10-28 22:31:05 +01:00
ssdaniel24
3d0263b3ab [rule34vault] Added initial support for rule34vault.com
- Added playlists support for rule34vault
- Added support for posts in rule34vault
- Fixed supported sites with script
- Fixed posts pattern in rule34vault
- Added tests for rule34vault
- Clean
- Fixed lint warnings
2024-10-28 22:26:47 +01:00
Mike Fährmann
5de8576ff6 [noop] add 'noop' extractor 2024-10-28 19:45:24 +01:00
Mike Fährmann
10c076e7f2 [saint] add 'album' and 'media' extractors (#4405, #6324) 2024-10-27 22:27:30 +01:00
Mike Fährmann
66aa514c25 [scrolller] add initial support (#295, #3418, #5051) 2024-10-21 14:17:18 +02:00
Mike Fährmann
4a1cbe94a9 [pururin] remove module
"This domain name has been seized in accordance with a seizure warrant
 issued by the United States District Court for the District of Idaho"
2024-10-10 15:57:17 +02:00
Mike Fährmann
1ad58cab84 [boosty] add initial support (#2387) 2024-10-02 20:39:55 +02:00
Mike Fährmann
93eca64a73 [civitai] add initial support (#3706, #3787, #4129, #5995) 2024-09-20 17:21:17 +02:00
Mike Fährmann
638a676495 [ao3] add initial support (#6013) 2024-09-15 22:38:21 +02:00
Mike Fährmann
df0d7d4a12 [cohost] add 'user' and 'post' extractors (#4483) 2024-09-11 18:03:33 +02:00
Mike Fährmann
399ba85841 [fallenangels] remove module 2024-07-30 17:33:16 +02:00
Mike Fährmann
aa6d00613f [cien] initial support (#2885, #4103, #5240) 2024-07-28 19:27:12 +02:00
Mike Fährmann
c9aeedeafd [koharu] add 'gallery' and 'search' extractors (#5893, #4707) 2024-07-28 12:22:18 +02:00
Mike Fährmann
226ead728e [agnph] add 'tag' and 'post' extractors (#5284, #5890) 2024-07-27 12:17:47 +02:00
Mike Fährmann
8fce9ea6d5 [hentainexus] restore module (#5275)
revert 97641cd151
2024-06-05 16:48:25 +02:00
Mike Fährmann
ce228ee163 [photobucket] remove module
had been broken for years and the new site is paid access only
2024-06-02 01:40:31 +02:00
Mike Fährmann
8a11b72253 remove extractor/test.py (#4504) 2024-02-27 01:37:57 +01:00
Mike Fährmann
cf7d6be2d4 [bluesky] initial support (#4438, #4708, #4722, #5047) 2024-02-07 19:09:33 +01:00
Mike Fährmann
6f8592eaff [hbrowse] remove from modules list 2024-01-20 18:25:38 +01:00
Ailothaen
e33056adcd [wikimedia] Add Wikipedia/Wikimedia extractor 2024-01-16 02:32:25 +01:00
hunter-gatherer8
6c4abc982e [2ch] add 'thread' and 'board' extractors
- [2ch] add thread extractor
- [2ch] add board extractor
- [2ch] add new entry to supported sites
2024-01-15 03:51:03 +01:00
Mike Fährmann
355b909f46 merge #5041: [steamgriddb] add support (#5033) 2024-01-13 00:59:15 +01:00
blankie
2ccb7d3bd3 [steamgriddb] add support 2024-01-09 17:12:56 +11:00
blankie
61f3b2f820 [hatenablog] add support 2024-01-09 01:29:47 +11:00
Mike Fährmann
a441249ea2 merge #4979: [batoto] add 'chapter' and 'manga' extractors (#1434, #2111) 2024-01-06 01:53:26 +01:00
Mike Fährmann
b11c352d66 [bato] rename to 'batoto'
to use the same category name as the previous bato.to site
2024-01-06 01:49:34 +01:00
Mike Fährmann
11150a7d72 [nudecollect] remove module 2024-01-05 21:32:04 +01:00
enduser420
0f30136109 [zzup] add 'gallery' extractor 2024-01-04 21:38:59 +05:30
Antonio
e348da7a06 [poringa] add support 2023-12-27 00:07:23 -06:00
bug-assassin
74c225f94e [bato] add support 2023-12-26 22:33:33 -05:00
blankie
fbe14a2745 [postmill] add support 2023-12-12 21:36:52 +11:00
jsouthgb
1770c31e63 [urlgalleries] add support 2023-12-05 07:07:06 -05:00
Mike Fährmann
e1404827a6 [pixeldrain] add 'file' and 'album' extractors (#4839) 2023-11-22 19:01:19 +01:00
jsouthgb
286d0cb098 [tmohentai] add support 2023-11-17 19:34:34 -05:00
enduser420
c0714d5585 [4archive] add 'thread' and 'board' extractors 2023-10-24 23:05:28 +05:30
Mike Fährmann
2911ed1240 [chevereto] add generic extractors (#4664)
- support jpgfish
- support pixl.li / pixl.is (#3179, #4357)
2023-10-16 14:15:39 +02:00
Mike Fährmann
f2de70f254 [gfycat] remove module 2023-09-04 18:27:11 +02:00
Mike Fährmann
8dceea3384 [shimmie2] move 'giantessbooru' back into shimmie module (#4373)
Do the same thing as for 'realbooru' and override 'posts()'
instead of using a separate module.
2023-08-18 15:25:28 +02:00
Mike Fährmann
391a7d74c8 [giantessbooru] fix and move to separate module (#4373)
too many differences from the other shimmie2 sites
2023-08-09 18:36:56 +02:00