65 Commits

Author SHA1 Message Date
Mike Fährmann
0e601de67b [sankaku] simplify 'pool' tags (#1388)
normalize 'tags' and 'artist_tags' to a string-list
2021-03-23 18:45:45 +01:00
Mike Fährmann
d085ade9d5 [sankaku] add 'tag_string' metadata field (#1388)
The 'join()'ed version of 'tags'.
Handling lists in format strings isn't properly supported yet.
2021-03-23 15:42:13 +01:00
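As a rough illustration of the two commits above, the following sketch normalizes a tag field to a list of strings and derives a joined 'tag_string' from it. The input shape (strings or dicts with a 'name' key) and the helper name 'normalize_tags' are assumptions made for this example; they are not taken from the extractor's code.

```python
def normalize_tags(raw_tags):
    """Return tag names as a plain list of strings.

    Hypothetical helper: assumes each entry is either a string
    or a dict carrying the tag name under the 'name' key.
    """
    names = []
    for tag in raw_tags:
        if isinstance(tag, dict):
            names.append(tag["name"])
        else:
            names.append(str(tag))
    return names


# Made-up post metadata with mixed tag representations
post = {"tags": [{"name": "landscape"}, {"name": "sky"}, "water"]}

post["tags"] = normalize_tags(post["tags"])
post["tag_string"] = " ".join(post["tags"])  # the 'join()'ed version of 'tags'

# A format string can use the pre-joined field directly,
# without needing any list handling of its own.
filename = "{id}_{tag_string}".format(id=123, **post)
```

Keeping both fields lets list-aware consumers use 'tags' while plain format strings fall back to 'tag_string'.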
Mike Fährmann
2dffd231b7 [sankaku] add enumeration index for books (#1388) 2021-03-23 15:32:54 +01:00
Mike Fährmann
96a51ff169 [sankaku] update invalid-token detection (fixes #1309) 2021-02-11 19:49:24 +01:00
Mike Fährmann
2da9068ea8 [sankaku] simplify login process 2021-01-12 00:15:22 +01:00
Mike Fährmann
b0beed7a06 [sankaku] add support for book searches (closes #1204) 2020-12-29 17:36:37 +01:00
Mike Fährmann
47a7a51944 [sankaku] fix 'invalid_token' detection 2020-12-27 02:31:01 +01:00
Mike Fährmann
e41e2be2f9 [booru] split '_prepare_post()' 2020-12-24 01:13:54 +01:00
Mike Fährmann
b233531aaa [sankaku] use '/posts' endpoint for single posts 2020-12-22 02:44:40 +01:00
Mike Fährmann
459a0af4f8 [sankaku] add support for sankaku.app URLs (closes #1193) 2020-12-22 01:57:53 +01:00
Mike Fährmann
537742c0ee [sankaku] normalize 'created_at' metadata (closes #1190) 2020-12-21 02:06:29 +01:00
Mike Fährmann
465015f75a [sankaku] reimplement login support (#1176, #1182) 2020-12-17 16:12:59 +01:00
Mike Fährmann
8d2e4e5f13 [booru] improve error handling
e.g. for posts without a valid 'file_url' (#1176)
2020-12-17 01:16:45 +01:00
Mike Fährmann
b2c55f0a72 [sankaku] remove login support
The old login method for 'https://chan.sankakucomplex.com/user/login'
and the cookies it produces have no effect on the results from
'beta.sankakucomplex.com'.
2020-12-08 21:05:47 +01:00
Mike Fährmann
ecdea799dd [sankaku] use 'beta.sankakucomplex.com' API endpoints 2020-12-05 22:08:58 +01:00
Mike Fährmann
1e3dd7330e merge SharedConfigMixin functionality into Extractor 2020-11-17 00:34:07 +01:00
Mike Fährmann
844793847c update extractor test results 2020-10-11 18:15:41 +02:00
Mike Fährmann
4409d00141 embed error messages in StopExtraction exceptions 2019-10-28 16:39:49 +01:00
Mike Fährmann
7a5e78741c [booru] build directory path for each file (#385) 2019-08-18 23:28:33 +02:00
Mike Fährmann
40637556fa [ngomik] fix extraction 2019-07-28 10:53:46 +02:00
Mike Fährmann
7a99e85943 [kissmanga] fix download URLs and file extensions
The current Blogspot image URLs hosted on Kissmanga end with an
"invalid" query parameter (/000.png&upx=...), which 'spliturl()' and
'parseurl()' do not recognize as such, so it ends up in the
'extension' field from 'text.nameext_from_url()'.
2019-06-28 20:34:43 +02:00
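The problem described in the commit above can be reproduced with the standard library alone. The sketch below uses a made-up URL of the same shape (an '&' parameter without a preceding '?'); the host, the parameter value, and the 'clean_extension' helper are hypothetical and do not reflect the actual fix.

```python
from urllib.parse import urlsplit

# Hypothetical URL shaped like the one in the commit message
url = "https://example.org/images/000.png&upx=abc"

parts = urlsplit(url)
# There is no '?', so the '&upx=abc' suffix stays in the path
# and the query component is empty.
filename = parts.path.rpartition("/")[2]
naive_ext = filename.rpartition(".")[2]


def clean_extension(name):
    """Drop everything from the first '&' before taking the extension."""
    base = name.partition("&")[0]
    return base.rpartition(".")[2]


fixed_ext = clean_extension(filename)
```

Here 'naive_ext' still carries the stray parameter, while 'fixed_ext' contains only the file extension.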
Mike Fährmann
74c2415138 [sankakucomplex] move article extractor to its own module (#258) 2019-05-27 23:49:23 +02:00
Mike Fährmann
1e3e15c4f3 [sankaku] add article extractor (#258) 2019-05-26 17:42:36 +02:00
Mike Fährmann
efa805c5d7 [sankaku] update pagination end condition (fixes #265)
Pagination over popular listings ('date:...+order:popular') never
terminates, not even on the site itself, and at some point returns the
same results over and over again.
2019-05-20 15:46:06 +02:00
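One way to end a pagination loop that keeps returning the same results is to stop as soon as a page contributes no unseen posts. The sketch below shows that idea under stated assumptions: 'paginate', 'fetch_page', and 'fake_fetch' are hypothetical names, and the end condition actually used in the commit above may differ.

```python
def paginate(fetch_page):
    """Yield posts page by page until a page brings nothing new."""
    seen = set()
    page = 1
    while True:
        posts = fetch_page(page)
        new_posts = [p for p in posts if p["id"] not in seen]
        if not new_posts:
            # Empty page or only repeats: treat this as the end
            return
        for post in new_posts:
            seen.add(post["id"])
            yield post
        page += 1


def fake_fetch(page):
    """Stand-in for a site that repeats its last page forever."""
    data = {1: [1, 2, 3], 2: [4, 5, 6]}
    ids = data.get(page, data[2])
    return [{"id": i} for i in ids]


ids = [post["id"] for post in paginate(fake_fetch)]
```

With the stand-in data, the loop stops on the third request because that page only repeats IDs already seen.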
Mike Fährmann
0b4be57a10 [sankaku] fix error when no tags available (closes #259)
[ci skip]
2019-05-14 23:40:07 +02:00
Mike Fährmann
aa8e366b90 [luscious] fix tag extraction 2019-05-14 17:35:52 +02:00
Mike Fährmann
a2af2d2965 adjust cache maxage values 2019-03-14 22:21:49 +01:00
Mike Fährmann
4b1880fa5e propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
4d656a81ca replace SharedConfigExtractor class with a Mixin 2019-02-04 13:46:02 +01:00
Mike Fährmann
dd358b4564 improve cookie handling during logins 2019-01-30 17:09:32 +01:00
Mike Fährmann
2d2953a5bf add 'text.parse_float()' + cleanup in text.py 2019-01-29 16:46:21 +01:00
Mike Fährmann
78b5f29a00 [sankaku] unescape tags 2019-01-20 16:18:13 +01:00
Mike Fährmann
2be4c9ffe3 [sankaku] small code improvements 2018-09-16 21:01:28 +02:00
Mike Fährmann
99137f1bee [sankaku] send login info as formdata
Previously they were erroneously sent as URL parameters.
2018-09-14 17:54:15 +02:00
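The difference between the two ways of sending login info can be shown with the standard library, without making a network request. The URL and the field names below are placeholders chosen for this sketch, not the site's real login endpoint or parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request

login_url = "https://example.org/user/authenticate"    # hypothetical endpoint
credentials = {"user": "alice", "password": "secret"}  # hypothetical field names

# Credentials appended to the URL as query parameters
as_params = Request(login_url + "?" + urlencode(credentials), method="POST")

# Credentials sent as form data in the request body
as_form = Request(login_url, data=urlencode(credentials).encode("ascii"))
```

In the first request the credentials are part of the URL and the body is empty; in the second the URL stays clean and the body carries the encoded form fields.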
Mike Fährmann
fa64c38d5b [sankaku] fix pagination for user favorites (#106) 2018-09-14 17:51:46 +02:00
Mike Fährmann
b164231bca [sankaku] increase default values for 'wait-min/-max' 2018-08-03 17:06:51 +02:00
Mike Fährmann
269dc2bbd5 [sankaku] add 'tags' option (#94) 2018-07-14 09:53:01 +02:00
Mike Fährmann
cc36f88586 rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
5008e105ee update archive IDs
... to behave in a more straightforward way when dealing with
bookmarks/favourites/etc.

Specific IDs are now grouped by their owner, album-id, ... to
allow for duplicates where they would be expected.
2018-03-01 18:20:50 +01:00
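The effect of grouping archive IDs by owner can be sketched with ordinary format strings. The metadata keys ('owner', 'id'), the format strings, and the 'archive_filter' helper are assumptions for illustration; the project's real 'archive_fmt' values are not shown in this log.

```python
def archive_filter(archive_fmt, items):
    """Keep only items whose formatted archive key has not been seen yet."""
    archive = set()
    kept = []
    for item in items:
        key = archive_fmt.format(**item)
        if key not in archive:
            archive.add(key)
            kept.append(item)
    return kept


# The same image (id 42) appears in the favorites of two users
favorites = [
    {"owner": "alice", "id": 42},
    {"owner": "bob", "id": 42},
    {"owner": "alice", "id": 42},
]

by_id = archive_filter("{id}", favorites)             # global ID only
by_owner = archive_filter("{owner}_{id}", favorites)  # grouped by owner
```

With the plain ID, the image is kept once overall; with the owner-prefixed key, it is kept once per owner, which is the kind of expected duplicate the commit message refers to.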
Mike Fährmann
829ddf4ac1 [sankaku] general improvements
- simplify regex
- unquote search tags
- increase default wait-time between HTTP requests
  - downloading several hundreds of images always resulted
    in '429 Too Many Requests' eventually
- circumvent paging restrictions for unauthenticated users by only
  using the 'next' parameter
  - setting 'page' to a constant, low value (or simply omitting it)
    does the trick
2018-02-27 16:51:14 +01:00
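A wait time between HTTP requests, as mentioned in the commit above, is commonly implemented as a random delay drawn from a configured range. The sketch below assumes that approach; the function name, the injectable 'sleep' argument, and the example bounds are hypothetical and do not reflect the project's actual defaults.

```python
import random
import time


def wait_between_requests(wait_min, wait_max, sleep=time.sleep):
    """Pause for a random duration between wait_min and wait_max seconds."""
    delay = random.uniform(wait_min, wait_max)
    sleep(delay)
    return delay


# Demonstration without actually sleeping: record the requested delay
recorded = []
delay = wait_between_requests(2.0, 4.0, sleep=recorded.append)
```

Randomizing the delay spreads requests out less predictably than a fixed pause, which may help avoid '429 Too Many Requests' responses during long downloads.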
Jad
49463f76bb support multi-page URL (#79)
* support multi-page URL

* fix

* all done.

* fix, again
2018-02-26 11:13:49 +01:00
Mike Fährmann
34873dbd90 set 'archive_fmt' values
These are going to be used to create a unique ID for each image.
2018-02-01 15:30:49 +01:00
Mike Fährmann
e420a28bbc fix cookie tests 2018-01-09 21:43:52 +01:00
Mike Fährmann
b33efc99a4 [idolcomplex] add support for idol.sankakucomplex.com 2018-01-09 17:54:37 +01:00
Mike Fährmann
19a6ae57b2 [sankaku] add pool extractor 2017-12-12 19:45:10 +01:00
Mike Fährmann
e52f0cc1ed [sankaku] add post extractor 2017-12-12 18:20:15 +01:00
Mike Fährmann
595593a35e [sankaku] rewrite
- better code structure and extensibility
- better metadata
2017-12-12 18:09:45 +01:00
Mike Fährmann
a3924d2072 [sankaku] fix swf extraction (closes #52) 2017-12-07 15:45:43 +01:00
Mike Fährmann
e6814aebe2 add 'extractor.*.user-agent' config option 2017-11-15 14:01:33 +01:00