Commit Graph

358 Commits

Author SHA1 Message Date
Mike Fährmann
79bc82884c [pornpics] add 'gallery' extractor (#263, #3544, #3654) 2023-02-17 15:00:57 +01:00
Mike Fährmann
925b467496 split e621 from danbooru module (#3425) 2023-02-03 19:24:31 +01:00
Mike Fährmann
c2bc70593e implement ability to load external extractor classes
- -X/--extractors
- extractor.module-sources
2023-01-30 23:10:10 +01:00
Mike Fährmann
13a90969c7 merge #3575: [nudecollect] add 'image' and 'album' extractors 2023-01-28 16:04:47 +01:00
Mike Fährmann
abc3619feb [lexica] add 'search' extractor (#3567) 2023-01-28 16:00:32 +01:00
enduser420
2a5903dc16 [nudecollect] add 'image' and 'album' extractors 2023-01-26 17:25:33 +05:30
enduser420
5cb263fdd2 [wikifeet/wikifeetx] add 'gallery' extractor 2023-01-16 21:08:45 +05:30
enduser420
e8541a131d [tcbscans] add 'chapter' and 'manga' extractors 2023-01-06 16:16:31 +05:30
enduser420
5a740ef78b [fanleaks] add 'post' and 'model' extractors 2022-12-30 19:24:05 +05:30
lx30011
895b41f1ac [jschan] add generic jschan extractor 2022-12-23 00:32:52 +01:00
enduser420
e5076ba056 [fapello] add 'post', 'user' and 'path' extractors 2022-12-16 16:53:32 +05:30
Mike Fährmann
1317625ec4 [webmshare] add 'video' extractor (#2410) 2022-12-14 19:59:07 +01:00
enduser420
41bf236d36 [lynxchan] add generic extractors for lynxchan imageboards (#3394)
* [lynxchan] add generic extractors for lynxchan imageboards

includes kohlchan.net, endchan.org

* [lynxchan] set pop default to empty tuple

* Apply suggestions from code review

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2022-12-12 13:13:39 +01:00
Mike Fährmann
eb94568e1f [soundgasm] add 'audio' extractor (#3384) 2022-12-09 23:19:07 +01:00
Mike Fährmann
86f0597c95 [kissgoddess] remove module
site does not host albums anymore
2022-12-05 19:28:50 +01:00
enduser420
213676c785 [fapachi] add 'post' and 'user' extractors (#3347)
* [fapachi] add 'post' and 'user' extractors

* [fapachi] add 'keyword' to test

* [fapachi] remove whitespaces
2022-12-02 13:54:29 +01:00
Mike Fährmann
6c153750fa [nitter] add extractors for Nitter instances (#2696) 2022-11-15 11:44:16 +01:00
enduser420
039d06c8f6 [mangaread] add 'chapter' and 'manga' extractors 2022-11-13 16:00:34 +05:30
Mike Fährmann
ccb80f1b8b [uploadir] add support for 'uploadir.com' (#3162) 2022-11-05 14:25:09 +01:00
Mike Fährmann
a7d23f1484 [vichan] add generic extractors for vichan imageboards
includes 8kun.top, smuglo.li, and wikieat.club
2022-10-21 14:40:45 +02:00
enduser420
0163ca86f7 [smugloli] add smugloli extractors (#3060) 2022-10-19 11:25:18 +02:00
Mike Fährmann
618c81afdf [ngomik] remove module
"Access denied"
2022-10-19 10:47:25 +02:00
Mike Fährmann
1696f68a68 [8chan] add 'thread' and 'board' extractors (#2938) 2022-10-11 10:47:19 +02:00
enduser420
f0321f423d [2chen] Add 2chen.moe extractor (#2707)
* [2chen] Add 2chen.moe extractor

* change "==" to is

* fix for "test_unique_pattern_matches"

* fix regex pattern and group matching

* fix regex again

* [2chen] add 'reply_no' and 'hash' metadata and change 'filename_fmt'

also made an entry in supportedsites.md

* [2chen] unescape 'title'

* [2chen] partition() -> rpartition()

* [2chen] extract 'date' and 'name' metadata

* [2chen] remove 'offset' argument

* [2chen] do some changes

* [2chen] do some more changes

* [2chen] unescape 'name' and 'filename'
2022-10-04 22:18:13 +02:00
enduser420
f7ba19a1c0 [nana] add 'nana' extractors (#2967) 2022-10-04 09:23:24 +02:00
enduser420
bd846abba0 [hotleak] add hotleak extractor (#2909) (#2890) 2022-09-18 13:37:16 +02:00
Mike Fährmann
a799fae2df [catbox] add 'album' extractor (#2410)
adapted from https://github.com/mikf/gallery-dl/pull/2805

- rewrite using GalleryExtractor
- extract more metadata
- match lolisafe names
- add test
2022-08-18 18:00:24 +02:00
enduser420
26a176e68c Merge branch 'master' into jpgchurch-extractor 2022-08-01 09:10:38 +05:30
enduser420
5256f6c9f4 [jpgchurch] . 2022-07-31 20:36:19 +05:30
enduser420
7bbaf025c0 [jpgchurch] refactor 2022-07-31 20:28:40 +05:30
Mike Fährmann
3a8addfe45 [zerochan] add 'tag' and 'image' extractors (#1434) 2022-07-27 22:58:23 +02:00
Mike Fährmann
46f11a3118 [bunkr] fix extraction (#2732)
move bunkr.is code to its own module
2022-07-15 13:00:57 +02:00
enduser420
01bbce691f remove unrelated changes 2022-06-30 20:16:10 +05:30
enduser420
a9b8a2430d [Jpgchurch] Add Jpgchurch extractor 2022-06-30 19:57:44 +05:30
enduser420
758edc9292 [2chen] Add 2chen.moe extractor 2022-06-27 17:46:28 +05:30
Mike Fährmann
27e8078fb7 [poipiku] add 'user' and 'post' extractors (#1602) 2022-06-20 11:32:02 +02:00
Mike Fährmann
fa902cd54d [itaku] add 'gallery' and 'image' extractors (#1842) 2022-06-20 11:31:44 +02:00
loragja
7e545a3ae9 [gofile] add gofile.io extractor (#2364)
* Add gofile extractor

* add gofile extractor to module list

* add support for tiny monitors and ancient python versions

* seriously, f-strings are not *that* new...

* i love flake8 :)

* add 'api-token' and 'recursive' options
* add tests
2022-03-29 17:31:57 +02:00
Layerex
625f4d4cc4 [telegraph] Add telegra.ph extractor (#2312) 2022-03-28 19:18:13 +02:00
Mike Fährmann
5a50569360 [toyhouse] support 'art' listings (#1546, #2331) 2022-02-27 16:22:50 +01:00
Mike Fährmann
fdfdc1b614 [kissgoddess] add 'gallery' and 'model' extractors
(closes #1052, #2304)
2022-02-20 04:45:37 +01:00
Mike Fährmann
79a461a2c1 [mememuseum] add 'tag' and 'post' extractors (closes #2264) 2022-02-20 02:15:38 +01:00
Mike Fährmann
254a5b26e0 [twibooru] add extractors for searches, galleries, and posts
(#2219)
2022-02-18 23:43:57 +01:00
David Hoppenbrouwers
b17e2dcf93 [wallpapercave] add extractor for images (#2205) 2022-02-11 23:44:51 +01:00
Thomas Jost
a7de819aca [lightroom] add Lightroom gallery extractor (#2263) 2022-02-11 21:30:59 +01:00
Mike Fährmann
563bd0ecf4 [danbooru] inherit from BaseExtractor
- merge danbooru and e621 code
- support booru.allthefallen.moe (closes #2283)
- remove support for old e621 tag search URLs
2022-02-11 21:01:51 +01:00
enormous-muscles
55326377d8 Add Kohlchan extractor (#2251) 2022-02-04 23:22:17 +01:00
Vrihub
96fcff182c generic extractor (#735)
* Generic extractor, see issue #683

* Fix failed test_names test, no subcategory needed

* Prefix directory_fmt with "generic"

* Relax regex (would break some urls)

* Flake8 compliance

* pattern: don't require a scheme

This fixes a bug when we force the generic extractor on urls without a
scheme (that are allowed by all other extractors).

* Fix using g: and r: on urls without http(s) scheme

Almost all extractors accept urls without an initial http(s) scheme.

Many extractors also allow for generic subdomains in their "pattern"
variable; some of them implement this with the regex character class
"[^.]+" (everything but a dot).

This leads to a problem when the extractor is given a url starting
with g: or r: (to force using the generic or recursive extractor)
and without the http(s) scheme: e.g. with "r:foobar.tumblr.com"
the "r:" is wrongly considered part of the subdomain.

This commit fixes the bug, replacing the too generic "[^.]+" with the
more specific "[\w-]+" (letters, digits and "-", the only characters
allowed in domain names), which is already used by some extractors.

* Relax imageurl_pattern_ext: allow relative urls

* First round of small suggested changes

* Support image urls starting with "//"

* self.baseurl: remove trailing slash

* Relax regexp (didn't catch some image urls)

* Some fixes and cleanup

* Fix domain pattern; option to enable extractor

Fixed the domain section for "pattern", to pass "test_add" and
"test_add_module" tests.
Added the "enabled" configuration option (default False) to enable the
generic extractor. Using "g(eneric):URL" forces using the extractor.
2021-12-29 22:39:29 +01:00
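
The subdomain fix described in commit 96fcff182c ("Fix using g: and r: on urls without http(s) scheme") can be illustrated with a small sketch. The two patterns below are hypothetical stand-ins written for this example, not the actual "pattern" values of any gallery-dl extractor; they only contrast the character class "[^.]+" with "[\w-]+" on the "r:foobar.tumblr.com" input mentioned in the message.

```python
import re

# Hypothetical extractor-style patterns (assumptions for illustration only,
# not copied from gallery-dl). Both make the http(s) scheme optional.
OLD_PATTERN = re.compile(r"(?:https?://)?([^.]+)\.tumblr\.com")   # "everything but a dot"
NEW_PATTERN = re.compile(r"(?:https?://)?([\w-]+)\.tumblr\.com")  # word characters and "-"


def subdomain(pattern, url):
    """Return the captured subdomain, or None if the URL does not match."""
    match = pattern.match(url)
    return match.group(1) if match else None


# A plain URL without a scheme is accepted by both patterns.
print(subdomain(OLD_PATTERN, "foobar.tumblr.com"))
print(subdomain(NEW_PATTERN, "foobar.tumblr.com"))

# With the "r:" prefix, "[^.]+" swallows the prefix into the subdomain,
# while "[\w-]+" refuses to match because ":" is not in the class.
print(subdomain(OLD_PATTERN, "r:foobar.tumblr.com"))
print(subdomain(NEW_PATTERN, "r:foobar.tumblr.com"))
```

Under these assumptions, the stricter class makes the site-specific pattern reject the prefixed URL outright, so a "g:" or "r:" prefix is never mistaken for part of the subdomain.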
Mike Fährmann
882c614281 add album extractor for lolisafe/chibisafe instances
- support bunkr.is (closes #2038)
- support zz.ht    (closes #2105)
2021-12-21 19:24:17 +01:00
Mike Fährmann
299bd2f1f5 [rule34us] add 'tag' and 'post' extractors (#1527) 2021-12-14 00:27:46 +01:00