29 Commits

Author SHA1 Message Date
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
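
The split between construction and setup described in this commit can be sketched roughly as follows. This is only an illustration under assumptions: the class body, the 'user_agent' attribute, the plain-dict config and the example URL are hypothetical and are not taken from the project's code.

```python
class Extractor:
    """Hypothetical extractor with construction and setup split apart."""

    def __init__(self, url):
        # The constructor only records its arguments; no config is read here.
        self.url = url
        self.initialized = False
        self.user_agent = None

    def initialize(self, config):
        # Deferred setup: runs once, after the caller has adjusted 'config'.
        if self.initialized:
            return
        self.user_agent = config.get("user-agent", "default-agent")
        self.initialized = True


config = {"user-agent": "default-agent"}
extractor = Extractor("https://example.org/post/1")

# A job can still change settings here, because nothing has been read yet.
config["user-agent"] = "custom-agent"
extractor.initialize(config)
```

Because the constructor no longer reads any settings, whatever creates the object gets a window in which to change them before 'initialize()' runs.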
Mike Fährmann
cc15fbe71a [moebooru] add generalized extractors for moebooru sites
- add support for sakugabooru.com (closes #1136)
- add support for lolibooru.moe   (closes #1050)

This allows users to dynamically add support for moebooru/myimouto-based
sites by adding an entry to their config file
(as with foolslide, foolfuuka, etc.)

For example:
{
    "extractor": {
        "moebooru": {
            "new-site-1": {"root": "https://site1.net"},
            "new-site-2": {"root": "https://www.site2.moe"}
        }
    }
}
2020-12-01 22:27:18 +01:00
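
One way such config entries could be merged with a built-in site table is sketched below. The function name 'moebooru_sites', the 'BUILTIN_SITES' table and its single placeholder entry are assumptions made for illustration; the commit does not show how the project reads these entries.

```python
import json

# Hypothetical built-in table with a placeholder entry.
BUILTIN_SITES = {
    "builtin-site": {"root": "https://builtin.example"},
}


def moebooru_sites(config_text, builtin=BUILTIN_SITES):
    """Return built-in sites merged with those under extractor.moebooru."""
    config = json.loads(config_text)
    user_sites = config.get("extractor", {}).get("moebooru", {})
    sites = dict(builtin)
    for name, options in user_sites.items():
        # Only entries that are objects with a 'root' URL count as sites.
        if isinstance(options, dict) and "root" in options:
            sites[name] = {"root": options["root"].rstrip("/")}
    return sites


CONFIG = """
{
    "extractor": {
        "moebooru": {
            "new-site-1": {"root": "https://site1.net"},
            "new-site-2": {"root": "https://www.site2.moe"}
        }
    }
}
"""

sites = moebooru_sites(CONFIG)
```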
Mike Fährmann
1d4a369ea2 update extractor test results 2020-02-27 22:15:40 +01:00
Mike Fährmann
978cb03f81 update misc test results
- Livedoor now uses https:// for its image URLs
- Instagram image URLs got simplified
2019-11-20 21:45:48 +01:00
Mike Fährmann
2a3bd4e3c7 rename extractor classes starting with a digit 2019-11-02 20:42:09 +01:00
Mike Fährmann
11ea689013 [simplyhentai] fix image and video URLs 2019-09-16 21:37:16 +02:00
Mike Fährmann
f2cf1c1d73 use 'text.extract_from()' in a few places 2019-04-21 15:19:20 +02:00
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
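
The three conventions listed in this commit might look like this on a single class. The class name, URL pattern and test data are hypothetical and only illustrate the shape of the constants.

```python
import re


class ExamplePostExtractor:
    """Hypothetical extractor class; names and URLs are illustrative only."""

    # One pattern string instead of a list of patterns
    pattern = r"(?:https?://)?example\.org/post/(\d+)"
    # Tuples instead of lists for class-level constants
    directory_fmt = ("{category}", "{post_id}")
    # A single test is one (url, expected) tuple rather than a list of one
    test = ("https://example.org/post/123", {"post_id": "123"})


match = re.match(ExamplePostExtractor.pattern, ExamplePostExtractor.test[0])
post_id = match.group(1)
```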
Mike Fährmann
4a57509392 generalize tag-splitting option (#92)
- extend functionality to other booru sites:
  - http://behoimi.org/
  - https://konachan.com/
  - https://e621.net/
  - https://rule34.xxx/
  - https://safebooru.org/
  - https://yande.re/
2018-07-04 12:21:16 +02:00
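
As a rough sketch of what tag splitting could mean, the snippet below groups tags by type into separate fields. This assumes that each tag comes with a type and that the result is one 'tags_<type>' field per type; the function name 'split_tags' and the sample tags are hypothetical and not taken from the commit.

```python
from collections import defaultdict


def split_tags(tagged):
    """Group (tag, type) pairs into 'tags_<type>' fields joined by spaces."""
    groups = defaultdict(list)
    for tag, tag_type in tagged:
        groups[tag_type].append(tag)
    return {"tags_" + t: " ".join(names) for t, names in groups.items()}


fields = split_tags([
    ("some_artist", "artist"),
    ("long_hair", "general"),
    ("smile", "general"),
])
```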
Mike Fährmann
974e73bdbb [booru] smaller code adjustments 2018-01-06 17:48:49 +01:00
Mike Fährmann
9e8a84ab6c [booru] rewrite using Mixin classes (#59)
- improved code structure
- improved URL patterns
- better pagination to work around page limits on
  - Danbooru
  - e621
  - 3dbooru
2018-01-04 00:01:39 +01:00
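
A mixin-based structure with a pagination strategy that avoids page numbers could be sketched as below. All of it is assumed for illustration: the class names, the 'before_id' parameter and the in-memory 'PostSource' stand-in are hypothetical, and the commit does not say which technique the rewrite uses to get around page limits.

```python
class PostSource:
    """Hypothetical stand-in for a site API that returns posts newest first."""

    def __init__(self, ids, per_page=3):
        self.ids = sorted(ids, reverse=True)
        self.per_page = per_page

    def query(self, before_id=None):
        ids = [i for i in self.ids if before_id is None or i < before_id]
        return ids[:self.per_page]


class BaseExtractor:
    """Shared loop; subclasses or mixins decide how to request the next batch."""

    def __init__(self, source):
        self.source = source

    def posts(self):
        params = {}
        while True:
            batch = self.source.query(**params)
            if not batch:
                return
            yield from batch
            self.update_params(params, batch)


class IdPaginationMixin:
    """Advance by the lowest post ID seen instead of by page number."""

    def update_params(self, params, batch):
        params["before_id"] = min(batch)


class ExampleExtractor(IdPaginationMixin, BaseExtractor):
    pass


posts = list(ExampleExtractor(PostSource(range(1, 8))).posts())
```

Keeping the pagination rule in a mixin means that a site needing a different rule only has to swap in another mixin; the shared loop stays untouched.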
Mike Fährmann
e6814aebe2 add 'extractor.*.user-agent' config option 2017-11-15 14:01:33 +01:00
Mike Fährmann
158e60ee89 [3dbooru] enable download continuation
behoimi.org doesn't respect 'Range' headers and doesn't report
'Content-Length' for compressed content encodings.
2017-10-24 13:05:31 +02:00
Mike Fährmann
81a7788b40 replace space characters in unit test URLs 2017-10-23 17:00:53 +02:00
Mike Fährmann
41adb99e9c [pawoo] fix extraction
- changed access_token
- use account-search instead of general search
2017-10-02 18:33:52 +02:00
Mike Fährmann
00420ff202 [booru] consistent order for "popular" results 2017-09-06 12:33:19 +02:00
Mike Fährmann
65997d835b replace popular/ranking tests with older ones
Metadata of several-year-old lists shouldn't change as much as it
would for newer ones, which makes metadata comparisons of the output
of build_testresult_db.py easier.
2017-08-31 15:09:18 +02:00
Mike Fährmann
88a386977e [booru] add "popular" extractors for more sites
- konachan.com
- behoimi.org
- e621.net
2017-08-26 23:08:52 +02:00
Mike Fährmann
07214f4007 [booru] place subcategories into base classes 2017-08-26 22:27:55 +02:00
Mike Fährmann
94e10f249a code adjustments according to pep8 nr2 2017-02-01 00:53:19 +01:00
Mike Fährmann
d7e168799d consistent extractor naming scheme + docstrings 2016-09-12 10:34:31 +02:00
Mike Fährmann
616e0aedd6 update booru testdata 2015-12-22 03:10:52 +01:00
Mike Fährmann
ba99506c72 more extractor test-cases 2015-12-14 03:00:58 +01:00
Mike Fährmann
f7c47a6018 add subcategories to extractors 2015-11-30 01:11:13 +01:00
Mike Fährmann
1bce63124b [3dbooru] update to new format 2015-11-21 01:48:44 +01:00
Mike Fährmann
3b0fe8f544 unify booru filename-patterns 2015-11-06 16:48:33 +01:00
Mike Fährmann
3c13548f29 rewrite extractors to use config-module 2015-10-05 15:51:08 +02:00
Mike Fährmann
9c25c15438 [3dbooru] fix default regex 2015-05-04 18:22:07 +02:00
Mike Fährmann
a2cfbe445f add extractor '3dbooru' 2015-04-15 22:24:27 +02:00