Commit Graph

128 Commits

Author SHA1 Message Date
Mike Fährmann
f612284d24 cache cfclearance cookies 2019-03-14 16:14:29 +01:00
Mike Fährmann
591a07f20c small code changes and cleanups 2019-03-13 22:03:02 +01:00
Mike Fährmann
6dae6bee37 automatically detect and bypass cloudflare challenge pages
TODO: cache and re-apply cfclearance cookies
2019-03-10 15:31:33 +01:00
Mike Fährmann
4ca4631bad simplify auto-disabling certificate verification
if no certificate bundle is found
2019-03-08 16:34:01 +01:00
Mike Fährmann
09d872a2b1 generalize extractor creation code 2019-03-07 22:55:26 +01:00
Mike Fährmann
3595cd582f use GalleryExtractor as common base class 2019-03-01 14:13:16 +01:00
Mike Fährmann
5530871b5a change results of text.nameext_from_url()
Instead of getting a complete 'filename' from an URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)

Example: "https://example.org/path/filename.ext"

before:
- filename : filename.ext
- name     : filename
- extension: ext

now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
Mike Fährmann
32edf4fc7b add '_extractor' info to manga extractor results 2019-02-13 13:23:36 +01:00
Mike Fährmann
2e516a1e3e store the full original URL in Extractor.url 2019-02-12 18:46:48 +01:00
Mike Fährmann
580baef72c change Chapter and MangaExtractor classes
- unify and simplify constructors
- rename get_metadata and get_images to just metadata() and images()
- rename self.url to chapter_url and manga_url
2019-02-11 18:38:47 +01:00
Mike Fährmann
4b1880fa5e propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
9a9cd32461 implement alternative constructor for extractors 2019-02-09 14:42:25 +01:00
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
bc0951d974 allow for simplified test data structures
Instead of a strict list of (URL, RESULTS)-tuples, extractor result
tests can now be a single (URL, RESULTS)-tuple, if it's just one test,
and "only matching" tests can now be a simple string.
2019-02-06 17:24:44 +01:00
Mike Fährmann
00dc37ccbf replace AsynchronousMixin Extractor with a Mixin 2019-02-04 14:21:19 +01:00
Mike Fährmann
4d656a81ca replace SharedConfigExtractor class with a Mixin 2019-02-04 13:46:02 +01:00
Mike Fährmann
bfbbac4495 [tsumino] add login capabilities (#161) 2019-01-30 17:58:48 +01:00
Mike Fährmann
dd358b4564 improve cookie handling during logins 2019-01-30 17:09:32 +01:00
Mike Fährmann
06cbf5f9c4 implement 'chapter-reverse' option (#149)
Setting it to `true` will start with the latest chapter instead of the
first one.
2019-01-07 18:22:33 +01:00
Mike Fährmann
9a98b6769d use extractor.request for API calls (#130)
... at least for OAuth1.0 based APIs (flickr, smugmug, tumblr)
2018-12-04 21:29:06 +01:00
Mike Fährmann
b828473aa3 retry HTTP requests for more exception classes 2018-11-19 15:49:13 +01:00
Mike Fährmann
c47482b110 smaller changes, missing docs, etc.
- make 'netrc' extractor-specific
- rename 'downloader.enable' to 'enabled'
- document 'downloader.ytdl.format'
- consistent newlines in configuration.rst
2018-11-16 18:18:07 +01:00
Mike Fährmann
2fa28a2609 update default user-agent string (closes #122) 2018-11-11 10:07:10 +01:00
Mike Fährmann
c9861ca812 adjust message for status_code based exceptions
from: 5xx HTTP Error: Reason
to  : 5xx: Reason

The "HTTP Error" part was in there to emulate Request's error messages
from response.raise_for_status(), but it reads a lot better without.
2018-10-18 15:09:49 +02:00
Mike Fährmann
4a348990f4 adjust value resolution for retries/timeout/verify options
This change introduces 'extractor.*.retries/timeout/verify' options
as a general way to set these values for all HTTP requests.

'downloader.http.retries/timeout/verify' is a way to override these
options for file downloads only and will fall back to 'extractor.*.…*
values if they haven't been explicitly set.

Also: downloader classes now take an extractor object as first argument
instead of a requests.session.
2018-10-07 21:13:39 +02:00
Mike Fährmann
f647f5d9c3 use 'verify' option for regular HTTP requests 2018-10-06 16:38:43 +02:00
Mike Fährmann
68d6033a5d use 'retries' and 'timeout' options for regular HTTP requests 2018-08-02 16:11:54 +02:00
Mike Fährmann
017188d268 improve extractor.request()
Replace the 'fatal' parameter with 'expect', which is a list/range
of HTTP status codes >= 400 that should also be accepted.
2018-06-18 16:29:56 +02:00
Mike Fährmann
2d17a9e07f improve extractor.request()
- better retry behavior
- exponential back-off
- removed 'allow_empty' argument
2018-04-23 18:45:59 +02:00
Mike Fährmann
8704d850bf add explicit proxy support (#76)
- '--proxy' as command-line argument
- 'extractor.*.proxy' as config option
2018-02-19 18:45:06 +01:00
Mike Fährmann
179bcdd349 adjust archive-ids 2018-02-13 04:50:45 +01:00
Mike Fährmann
3cec533c28 Merge branch 'archive' 2018-02-12 18:07:58 +01:00
Mike Fährmann
5b3c34aa96 use generic chapter-extractor in more modules 2018-02-07 12:36:39 +01:00
Mike Fährmann
7a412f5c32 implement generic manga-chapter extractor 2018-02-04 22:02:04 +01:00
Mike Fährmann
84a52a9256 add DownloadArchive class 2018-01-30 15:23:23 +01:00
Mike Fährmann
cc0c2cca57 [reddit] add extractor for reddit-hosted images (closes #68) 2018-01-14 18:55:42 +01:00
Mike Fährmann
e6814aebe2 add 'extractor.*.user-agent' config option 2017-11-15 14:01:33 +01:00
Mike Fährmann
baf8094868 improve Extractor.request()'s retry behavior 2017-11-13 20:37:11 +01:00
Mike Fährmann
16783e327f [common] fix UnboundLocalError in Extractor.request() 2017-10-20 18:51:06 +02:00
Mike Fährmann
9aecc67841 [common] explicitly handle HTTP status code 429 2017-10-14 21:37:59 +02:00
Mike Fährmann
b319f4bab3 smaller code and text changes 2017-10-01 18:23:40 +02:00
Mike Fährmann
26a866e7d8 implement (sub)category-transfer between extractors (#41)
ImageFap- and all Manga-Extractors will transfer their (sub)category
values to other extractors instantiated by them, which will in turn
allow those to use options set for their parents.

Example:
ImagefapGalleryExtractors will use options set under
extractor.imagefap.user, if (and only if) they have been instantiated by
a ImagefapUserExtractor; and options from extractor.imagefap.gallery
otherwise.
2017-09-26 21:05:11 +02:00
Mike Fährmann
9c138dfc1f [common] detect empty HTTP response bodies 2017-09-26 16:49:58 +02:00
Mike Fährmann
deb2e803ba simplify MangaExtractor class 2017-09-24 16:05:43 +02:00
Mike Fährmann
0dedbe759c enable '--chapter-filter'
The same filter infrastructure that can be applied to image URLS now
also works for manga chapters and other delegated URLs.

TODO: actually provide any metadata (currently supported is only
deviantart and imagefap).
2017-09-12 16:19:00 +02:00
Mike Fährmann
be30fb2f98 add common config category for boorus and foolslide 2017-08-29 22:42:48 +02:00
Mike Fährmann
915a0137de improve 'extractor.request'
- add 'fatal' argument
- improve internal logic and flow
- raise known exception on error
- update exception hierarchy
2017-08-05 16:11:46 +02:00
Mike Fährmann
7aa9fa796a code cleanup and fixes 2017-07-25 14:59:41 +02:00
Mike Fährmann
55f048d02b ignore case of cookiejar magic strings 2017-07-24 18:33:42 +02:00
Mike Fährmann
808f67ba7d use 'cookiedomain' for cookies set by object-config-values
otherwise these cookies would not be picked up by the
_check_cookies() method.
2017-07-22 15:43:35 +02:00