Commit Graph

69 Commits

Author SHA1 Message Date
Mike Fährmann
1c36e65e9b [exhentai] choose site version depending on input URL (#278)
Use e-hentai.org as root and cookiedomain if the input URL is from
e-hentai (or g.e-hentai), use exhentai.org otherwise.
2019-05-31 15:34:39 +02:00
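The selection rule described in this commit can be sketched as follows. This is only an illustration, not the extractor's actual code: the function name `choose_root` and the leading-dot form of the cookie domain are assumptions.

```python
from urllib.parse import urlsplit


def choose_root(url):
    """Return (root, cookiedomain) for an input URL (hypothetical helper)."""
    host = urlsplit(url).hostname or ""
    # e-hentai.org and its subdomains (such as g.e-hentai.org) map to e-hentai
    if host == "e-hentai.org" or host.endswith(".e-hentai.org"):
        domain = "e-hentai.org"
    else:
        # everything else falls back to exhentai.org
        domain = "exhentai.org"
    # assumption: cookie domains are written with a leading dot
    return "https://" + domain, "." + domain
```

With this sketch, a g.e-hentai.org URL resolves to the e-hentai.org root, and any other input URL resolves to exhentai.org.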
Mike Fährmann
1f7fa9dc8e [exhentai] update data extraction code
- parse 'date' to datetime object
- use 'text.extract_from()'
2019-05-08 15:44:29 +02:00
Mike Fährmann
5398bfbd69 [exhentai] fix search and favorite extraction
Removes basically all metadata, but that can be compensated for with the
right search query. Writing "parsers" for all 4 possible views that have
been introduced in the latest changes is too much of a hassle ...
2019-03-28 16:22:02 +01:00
Mike Fährmann
a2af2d2965 adjust cache maxage values 2019-03-14 22:21:49 +01:00
Mike Fährmann
5530871b5a change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway.)

Example: "https://example.org/path/filename.ext"

before:
- filename : filename.ext
- name     : filename
- extension: ext

now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
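A minimal sketch of the new behavior, assuming a simple split at the last dot of the final path segment. The name `nameext_from_url_sketch` is hypothetical, and the real text.nameext_from_url() may treat edge cases differently.

```python
from urllib.parse import unquote, urlsplit


def nameext_from_url_sketch(url, data=None):
    """Store 'filename' (without extension) and 'extension' in 'data'."""
    if data is None:
        data = {}
    # last path segment, e.g. "filename.ext"
    name = unquote(urlsplit(url).path.rpartition("/")[2])
    filename, _, ext = name.rpartition(".")
    if filename:
        data["filename"] = filename
        data["extension"] = ext
    else:
        # no dot, or only a leading dot: treat the whole name as the filename
        data["filename"] = name
        data["extension"] = ""
    return data
```

For "https://example.org/path/filename.ext" this yields 'filename' and 'ext', matching the "now" listing above.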
Mike Fährmann
61741d7333 provide type information for Queue messages
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.
2019-02-12 21:32:32 +01:00
Mike Fährmann
2e516a1e3e store the full original URL in Extractor.url 2019-02-12 18:46:48 +01:00
Mike Fährmann
4b1880fa5e propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
dd358b4564 improve cookie handling during logins 2019-01-30 17:09:32 +01:00
Mike Fährmann
134487ffb0 [exhentai] stop extraction if image limit is exceeded (#141)
can be turned off with the 'exhentai.limits' option
2019-01-26 18:40:39 +01:00
Mike Fährmann
e868fb4393 [exhentai] improve gallery extraction
- match image page URLs and extract galleries from that point onward
- add a few more metadata entries: 'parent', 'visible', 'cost'
2019-01-26 18:23:25 +01:00
Mike Fährmann
2ffc105887 [exhentai] extract tag metadata 2019-01-15 18:08:17 +01:00
Mike Fährmann
2801a0d997 [exhentai] skip "Content Warning" page when not logged in
(closes #97)
2018-08-16 09:17:22 +02:00
Mike Fährmann
b8c97d2295 use 'extractor.request()' for more HTTP requests 2018-06-25 23:40:59 +02:00
Mike Fährmann
017188d268 improve extractor.request()
Replace the 'fatal' parameter with 'expect', which is a list/range
of HTTP status codes >= 400 that should also be accepted.
2018-06-18 16:29:56 +02:00
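The idea behind 'expect' can be illustrated with a small status check. This is a sketch under assumptions: the helper name `is_acceptable` is hypothetical, and how extractor.request() applies the check internally is not shown in this log.

```python
def is_acceptable(status_code, expect=()):
    """Return True if a response with this status code should be accepted.

    Codes below 400 always pass; codes >= 400 pass only when they are
    listed in 'expect', which may be a list, tuple or range.
    """
    return status_code < 400 or status_code in expect
```

For example, `is_acceptable(404, expect=(404,))` and `is_acceptable(503, expect=range(500, 600))` both accept the response, while `is_acceptable(404)` does not.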
Mike Fährmann
7a58151566 fix util.parse_bytes invocations
(should be text.parse_bytes)
2018-05-10 22:07:55 +02:00
Mike Fährmann
cc36f88586 rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
64d7c85b55 [exhentai] improve metadata
- add 'width', 'height' and 'size' (in bytes) for each image
- change the former 'size' and 'size_units' into 'gallery_size'
2018-04-03 18:59:53 +02:00
Mike Fährmann
52d41c41e7 [exhentai] add extractor for favorited galleries 2018-03-27 18:58:42 +02:00
Mike Fährmann
63cc2599c4 [exhentai] add extractor for search results 2018-03-27 16:50:47 +02:00
Mike Fährmann
34873dbd90 set 'archive_fmt' values
These are going to be used to create a unique ID for each image.
2018-02-01 15:30:49 +01:00
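How a format string could produce such an ID can be sketched as below. The format string, the keyword names 'gallery_id' and 'num', and the category prefix are all assumptions made for illustration, not values taken from this commit.

```python
# hypothetical format string; the real 'archive_fmt' values are not shown here
ARCHIVE_FMT = "{gallery_id}_{num}"


def archive_id(category, keywords, fmt=ARCHIVE_FMT):
    """Build an ID by filling the format string with image keywords."""
    # assumption: the category is prepended so IDs from different
    # extractors cannot collide
    return category + fmt.format_map(keywords)
```

Two images of the same gallery then differ in their 'num' part, so each one gets its own ID.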
Mike Fährmann
e6814aebe2 add 'extractor.*.user-agent' config option 2017-11-15 14:01:33 +01:00
Mike Fährmann
9fc1d0c901 implement and use 'util.safe_int()'
same as Python's 'int()', except it doesn't raise any exceptions and
accepts a default value
2017-09-24 15:59:25 +02:00
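A function with the behavior described here could look roughly like this. It is a sketch based only on the commit message; the default value of 0 is an assumption.

```python
def safe_int(value, default=0):
    """Convert 'value' to int, returning 'default' instead of raising."""
    try:
        return int(value)
    except (ValueError, TypeError, OverflowError):
        # ValueError: non-numeric strings, TypeError: None and similar,
        # OverflowError: float infinity
        return default
```

So `safe_int("42")` gives 42, while `safe_int("abc")` and `safe_int(None, -1)` fall back to their defaults.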
Mike Fährmann
6f30cf4c64 change keyword names to valid Python identifiers
This commit mostly replaces all minus-signs ('-') in keyword names with
underscores ('_') to allow them to be used in filter-expressions. For
example 'gallery-id' got renamed to 'gallery_id'.

(It is theoretically possible to access any variable, regardless of its
name, with 'locals()["NAME"]', but that seems a bit too convoluted if
just 'NAME' could be enough)
2017-09-10 22:20:47 +02:00
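The reason for the rename can be demonstrated with a tiny evaluator. This is an illustrative sketch, not the project's filter implementation; the helper name `matches` is hypothetical, and emptying `__builtins__` is not a security sandbox.

```python
def matches(expression, keywords):
    """Evaluate a filter expression with 'keywords' as local variables."""
    code = compile(expression, "<filter>", "eval")
    return bool(eval(code, {"__builtins__": {}}, keywords))


# 'gallery_id' is a valid identifier and can be used directly
ok = matches("gallery_id > 1000", {"gallery_id": 1234})

# 'gallery-id' is parsed as the subtraction 'gallery - id',
# so the lookup of 'gallery' fails
try:
    matches("gallery-id > 1000", {"gallery-id": 1234})
    failed = False
except NameError:
    failed = True
```

Here `ok` and `failed` both end up true: the underscore name works as a variable, the minus-sign name does not.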
Mike Fährmann
c0755a4d5e [exhentai] revert login-method to its old version (#37)
Additional cookies don't seem to help and have to be manually set
anyway. The older method is more likely to succeed, so I'd rather
use this one.
2017-08-29 22:10:38 +02:00
Mike Fährmann
3ee39ffd93 [exhentai] update login procedure (#37)
This new version behaves pretty much exactly like a browser would and
caches all cookies sent to it and not just "ipb_member_id" and
"ipb_pass_hash".
2017-08-28 21:03:32 +02:00
Mike Fährmann
2d0dfe9d56 [exhentai] init headers before login and detect sadpanda
- also debug-logs html after failed login
- #37
2017-08-25 16:44:59 +02:00
Mike Fährmann
915a0137de improve 'extractor.request'
- add 'fatal' argument
- improve internal logic and flow
- raise known exception on error
- update exception hierarchy
2017-08-05 16:11:46 +02:00
Mike Fährmann
7aa9fa796a code cleanup and fixes 2017-07-25 14:59:41 +02:00
Mike Fährmann
808f67ba7d use 'cookiedomain' for cookies set by object-config-values
otherwise these cookies would not be picked up by the
_check_cookies() method.
2017-07-22 15:43:35 +02:00
Mike Fährmann
0610ae5000 skip login if cookies are present 2017-07-17 10:33:36 +02:00
Mike Fährmann
58e95a7487 share extractor and downloader sessions
There was never any "good" reason for the strict separation
between extractors and downloaders. This change allows for
reduced resource usage (probably unnoticeable) and fewer lines
of code at the "cost" of tighter coupling.
2017-06-30 19:38:14 +02:00
Mike Fährmann
1dac76fd1c update extractor docstrings 2017-06-28 17:39:07 +02:00
Mike Fährmann
d3b04076f7 add .netrc support (#22)
Use the '--netrc' cmdline option or set the 'netrc' config option
to 'true' to enable the use of .netrc authentication data.

The 'machine' names for the .netrc info are the lowercase extractor
names (or categories): batoto, exhentai, nijie, pixiv, seiga.
2017-06-24 12:17:26 +02:00
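Looking up credentials by 'machine' name can be sketched with Python's standard netrc module. The helper `load_credentials`, the temporary file, and the sample login are assumptions for illustration; how the program itself reads .netrc data is not shown in this log.

```python
import netrc
import os
import tempfile


def load_credentials(category, path):
    """Return (username, password) for a category, or None if absent."""
    auth = netrc.netrc(path).authenticators(category)
    if auth is None:
        return None
    login, _account, password = auth
    return login, password


# write a throwaway netrc file with one made-up entry
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "netrc")
    with open(path, "w") as fp:
        fp.write("machine exhentai login alice password secret\n")
    creds = load_credentials("exhentai", path)
    missing = load_credentials("pixiv", path)
```

In this sketch `creds` holds the login and password for the 'exhentai' entry, and `missing` is None because the file has no 'pixiv' entry.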
Mike Fährmann
af56887a47 [exhentai] fall back to e-hentai if no username is given 2017-04-28 15:59:56 +02:00
Mike Fährmann
4b967fa189 implement and use extractor.config() method 2017-04-25 17:12:48 +02:00
Mike Fährmann
b603b592cf [exhentai] accept "e-hentai.org" URLs (#11) 2017-04-04 09:30:35 +02:00
Mike Fährmann
841fd50242 move code into util.py 2017-03-28 13:12:44 +02:00
Mike Fährmann
1d46be545c add login notifications 2017-03-17 09:42:59 +01:00
Mike Fährmann
e87e6fbc67 change some config keys
directory_fmt     -> directory
filename_fmt      -> filename
download-original -> original
2017-02-21 22:11:02 +01:00
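For readers updating an old configuration, the rename can be expressed as a simple key mapping. The helper `migrate_keys` is hypothetical and only covers the three keys listed in this commit.

```python
# old key name -> new key name, as listed in the commit message
RENAMED_KEYS = {
    "directory_fmt": "directory",
    "filename_fmt": "filename",
    "download-original": "original",
}


def migrate_keys(config):
    """Return a copy of 'config' with old key names replaced by new ones."""
    return {RENAMED_KEYS.get(key, key): value for key, value in config.items()}
```

Keys that are not part of the rename pass through unchanged.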
Mike Fährmann
0a6487afe8 [exhentai] fix detection of invalid gallery keys 2017-02-15 03:36:46 +01:00
Mike Fährmann
94e10f249a code adjustments according to pep8 nr2 2017-02-01 00:53:19 +01:00
Mike Fährmann
4a8d74973c adjust login methods to a specific style 2017-01-08 17:33:25 +01:00
Mike Fährmann
a849d8f2f7 add a few more tests 2016-12-31 00:51:06 +01:00
Mike Fährmann
ff2a65d5c1 [exhentai] raise proper exception for 'unavailable' galleries 2016-12-22 12:42:41 +01:00
Mike Fährmann
492cb38391 [exhentai] use image-count as stop signal 2016-10-12 15:19:31 +02:00
Mike Fährmann
607f50effb [exhentai] retry failed api calls 2016-10-11 13:27:19 +02:00
Mike Fährmann
12c99293b6 allow extension by Content-Type for exhentai, seiga, senmanga 2016-09-30 16:43:43 +02:00
Mike Fährmann
56d810c896 update keyword hashes for tests 2016-09-25 17:28:46 +02:00