72 Commits

Author SHA1 Message Date
Mike Fährmann
fe7e2281ac [nijie] increase default delay between requests (#5221)
1-2s is not enough
2024-02-20 18:19:49 +01:00
Mike Fährmann
2191e29e14 [nijie] fix image URL for single image posts (#5049) 2024-01-11 05:07:38 +01:00
Mike Fährmann
b6903a4c90 [nijie] add 'count' metadata field
https://github.com/mikf/gallery-dl/issues/146#issuecomment-1812849102
2023-12-30 22:25:59 +01:00
Mike Fährmann
a30a3e44d5 [nijie] move 'username required' out of _login_impl 2023-12-18 23:57:44 +01:00
Mike Fährmann
57fc6fcf83 replace '24*3600' with '86400'
and generalize cache maxage values
2023-12-18 23:57:22 +01:00
Mike Fährmann
4eb3590103 [nijie] fix image URLs of multi-image posts (#4876) 2023-12-05 17:48:50 +01:00
Mike Fährmann
3984a49abf [nijie] set 1-2s delay between requests to avoid 429 errors 2023-11-03 23:44:47 +01:00
Mike Fährmann
3ecb512722 send Referer headers by default 2023-09-19 00:02:04 +02:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a
Job. Previously, most of it had already happened by the time
'extractor.find()' was called.
2023-07-25 22:16:16 +02:00
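The two-phase setup described in the commit above can be illustrated with a minimal sketch. The class name, the 'delay' option, and the dictionary standing in for a session are all hypothetical and are not taken from the project's code; only the split between a cheap constructor and a separate 'initialize()' step comes from the commit message.

```python
class ExtractorSketch:
    """Hypothetical extractor showing constructor/initialize separation."""

    def __init__(self, url):
        # The constructor only stores its arguments; nothing expensive happens.
        self.url = url
        self.session = None
        self.config = {}
        self._initialized = False

    def initialize(self, config=None):
        # The actual setup (session, config options) runs here, so a caller
        # can adjust configuration between construction and initialization.
        if self._initialized:
            return
        self.config = dict(config or {})
        self.session = {"headers": {}}  # placeholder for a real HTTP session
        self._initialized = True


extractor = ExtractorSketch("https://example.org/")
# Configuration can still be changed at this point.
extractor.initialize({"delay": 2})
```

Calling 'initialize()' a second time is a no-op in this sketch, which keeps repeated calls harmless.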
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
b0cb4a1b9c replace 'text.extract()' with 'text.extr()' where possible 2022-11-05 01:14:09 +01:00
Mike Fährmann
3b369ce3d1 [nijie] add 'followed' extractor (#3048) 2022-10-14 14:59:18 +02:00
Mike Fährmann
c4a62a48ae [nijie] add 'feed' extractor (#3048) 2022-10-14 12:03:00 +02:00
Mike Fährmann
636d03df95 [nijie] reduce cache maxage to 90 days 2022-08-27 21:57:45 +02:00
Mike Fährmann
241e82e18d [horne] add support for horne.red (#2700) 2022-06-25 16:52:16 +02:00
Mike Fährmann
d11e2191ae [nijie] support /history_nuita.php listings (closes #2541) 2022-05-02 09:03:34 +02:00
Mike Fährmann
1f9a0e2fd8 update extractor test results 2022-04-18 17:24:00 +02:00
Mike Fährmann
bd08ee2859 remove most 'yield Message.Version' statements
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
b58e605dc7 raise error when required username or password is missing
do not try to login as 'None' (#1192)
2020-12-22 14:40:18 +01:00
Mike Fährmann
6514312126 [nijie] add 'include' option (closes #1018) 2020-09-25 18:18:35 +02:00
Mike Fährmann
e62c209ca0 [nijie] fix 'date' parsing 2019-11-30 23:08:21 +01:00
Mike Fährmann
94dbdbf506 [nijie] change default filename format
… to be consistent with Pixiv filenames
2019-11-04 20:47:38 +01:00
Mike Fährmann
1faec285d1 [nijie] further improvements (closes #423)
- provide a 'user_name' metadata field
  - usually the same as 'artist_id', except for favorite downloads
- extract the whole description text and properly escape HTML entities
- fixed an issue with titles or tags containing double quotes
2019-09-27 23:14:32 +02:00
Mike Fährmann
20eb6c401f [nijie] improvements and fixes (#423)
- ignore unavailable image pages
- more metadata fields: artist_name, date, tags
- rename 'index' to 'num'
- improved code structure
2019-09-26 21:45:01 +02:00
Mike Fährmann
12da6bd0c9 [simplyhentai] fix/improve extraction 2019-07-06 20:25:53 +02:00
Mike Fährmann
fdec59f8e2 replace extractor.request() 'expect' argument
with
- 'fatal': allow 4xx status codes
- 'notfound': raise NotFoundError on 404
2019-07-05 00:42:16 +02:00
Mike Fährmann
b89f0d8d3c update extractor result tests 2019-07-01 20:02:47 +02:00
Mike Fährmann
a2af2d2965 adjust cache maxage values 2019-03-14 22:21:49 +01:00
Mike Fährmann
5530871b5a change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)

Example: "https://example.org/path/filename.ext"

before:
- filename : filename.ext
- name     : filename
- extension: ext

now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
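The before/now listing in the commit above can be reproduced with a small sketch. The function below is a hypothetical stand-in written for illustration, not the project's 'text.nameext_from_url()'; it only mirrors the "now" result shape of 'filename' plus 'extension'.

```python
from urllib.parse import unquote, urlsplit


def nameext_from_url_sketch(url):
    """Hypothetical helper: split the last URL path segment into name and extension."""
    # Take the last path component, ignoring query string and fragment.
    last_segment = unquote(urlsplit(url).path.rpartition("/")[2])
    stem, dot, ext = last_segment.rpartition(".")
    if dot:
        return {"filename": stem, "extension": ext}
    # No dot in the segment: everything is the filename.
    return {"filename": last_segment, "extension": ""}


result = nameext_from_url_sketch("https://example.org/path/filename.ext")
# result == {"filename": "filename", "extension": "ext"}
```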
Mike Fährmann
4b1880fa5e propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
00dc37ccbf replace AsynchronousMixin Extractor with a Mixin 2019-02-04 14:21:19 +01:00
Mike Fährmann
dd358b4564 improve cookie handling during logins 2019-01-30 17:09:32 +01:00
Mike Fährmann
173add6935 [nijie] fix artist_id extraction
view_popup.php pages for older images or dojins either have the
artist_id value at a different place or not at all.
2018-07-10 12:30:53 +02:00
Mike Fährmann
017188d268 improve extractor.request()
Replace the 'fatal' parameter with 'expect', which is a list/range
of HTTP status codes >= 400 that should also be accepted.
2018-06-18 16:29:56 +02:00
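As a rough illustration of the 'expect' idea from the commit above, the check below accepts any status code under 400 and additionally accepts codes of 400 or higher when they appear in 'expect'. The function name and signature are assumptions made for this sketch; the real 'extractor.request()' is not shown here.

```python
def is_accepted(status_code, expect=()):
    """Hypothetical check: error codes pass only if listed in 'expect'."""
    # 'expect' may be any container of codes, e.g. a tuple or a range.
    return status_code < 400 or status_code in expect


is_accepted(404)                             # rejected by default
is_accepted(404, expect=(404,))              # explicitly allowed
is_accepted(451, expect=range(400, 500))     # a whole range can be allowed
```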
Mike Fährmann
2d17a9e07f improve extractor.request()
- better retry behavior
- exponential back-off
- removed 'allow_empty' argument
2018-04-23 18:45:59 +02:00
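Exponential back-off, as named in the commit above, can be sketched as follows. The retry count, the 1-second base delay, the doubling factor, and the choice of 'OSError' as the retryable error are all assumptions for illustration; the commit message does not state the values the project uses.

```python
import time


def retry_with_backoff(func, retries=4, base_delay=1.0, sleep=time.sleep):
    """Hypothetical retry loop: wait 1s, 2s, 4s, ... between failed attempts."""
    delay = base_delay
    for attempt in range(retries + 1):
        try:
            return func()
        except OSError:
            # Give up after the last allowed retry and re-raise the error.
            if attempt == retries:
                raise
            sleep(delay)
            delay *= 2  # double the wait before the next attempt
```

Passing 'sleep' as a parameter keeps the sketch easy to exercise without real waiting.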
Mike Fährmann
cc36f88586 rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
7b562907c3 [nijie] add favorites extractor
adds support for 'https://nijie.info/user_like_illust_view.php?id=...'
2018-03-31 18:54:25 +02:00
Mike Fährmann
445db75955 [nijie] improve extraction and metadata
- add 'title' and 'description'
- split 'artist_id' into 'user_id' and 'artist_id'
  - 'user_id' is the ID of the user from whom the image entry
    originates
  - 'artist_id' is the ID of the actual image artist
- improve pagination and URL patterns
2018-03-31 18:48:41 +02:00
Mike Fährmann
a112e3f2a0 [nijie] add doujin extractor
adds support for "https://nijie.info/members_dojin.php?id=<artist_id>"
2018-03-31 18:17:41 +02:00
Mike Fährmann
3cec533c28 Merge branch 'archive' 2018-02-12 18:07:58 +01:00
Mike Fährmann
f5f2d29f56 [nijie] fix dojin extraction
- correctly extract artist_id
- set extension to "jpg" if it was empty and let filetype checks do
  the rest
2018-02-09 22:06:26 +01:00
Mike Fährmann
34873dbd90 set 'archive_fmt' values
These are going to be used to create a unique ID for each image.
2018-02-01 15:30:49 +01:00
Mike Fährmann
9c138dfc1f [common] detect empty HTTP response bodies 2017-09-26 16:49:58 +02:00
Mike Fährmann
6f30cf4c64 change keyword names to valid Python identifiers
This commit mostly replaces all minus-signs ('-') in keyword names with
underscores ('_') to allow them to be used in filter-expressions. For
example 'gallery-id' got renamed to 'gallery_id'.

(It is theoretically possible to access any variable, regardless of its
name, with 'locals()["NAME"]', but that seems a bit too convoluted if
just 'NAME' could be enough)
2017-09-10 22:20:47 +02:00
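A short sketch of why the rename in the commit above matters for filter expressions: a name such as 'gallery-id' would be parsed by Python as the subtraction 'gallery - id', while 'gallery_id' is an ordinary identifier. The helper and the sample metadata are hypothetical, and the bare 'eval' call only stands in for whatever expression evaluation the project performs.

```python
def rename_keys(metadata):
    """Hypothetical helper: replace '-' with '_' in every keyword name."""
    return {key.replace("-", "_"): value for key, value in metadata.items()}


old = {"gallery-id": 12345, "title": "example"}
new = rename_keys(old)

# With underscores, the keyword can be used directly as a variable name.
matches = eval("gallery_id > 100", {"__builtins__": {}}, new)
```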
Mike Fährmann
915a0137de improve 'extractor.request'
- add 'fatal' argument
- improve internal logic and flow
- raise known exception on error
- update exception hierarchy
2017-08-05 16:11:46 +02:00
Mike Fährmann
7aa9fa796a code cleanup and fixes 2017-07-25 14:59:41 +02:00
Mike Fährmann
808f67ba7d use 'cookiedomain' for cookies set by object-config-values
otherwise these cookies would not be picked up by the
_check_cookies() method.
2017-07-22 15:43:35 +02:00
Mike Fährmann
0610ae5000 skip login if cookies are present 2017-07-17 10:33:36 +02:00