Mike Fährmann
f612284d24
cache cfclearance cookies
2019-03-14 16:14:29 +01:00
Mike Fährmann
591a07f20c
small code changes and cleanups
2019-03-13 22:03:02 +01:00
Mike Fährmann
6dae6bee37
automatically detect and bypass cloudflare challenge pages
...
TODO: cache and re-apply cfclearance cookies
2019-03-10 15:31:33 +01:00
Mike Fährmann
4ca4631bad
simplify auto-disabling certificate verification
...
if no certificate bundle is found
2019-03-08 16:34:01 +01:00
Mike Fährmann
09d872a2b1
generalize extractor creation code
2019-03-07 22:55:26 +01:00
Mike Fährmann
3595cd582f
use GalleryExtractor as common base class
2019-03-01 14:13:16 +01:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from an URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext "
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
Mike Fährmann
32edf4fc7b
add '_extractor' info to manga extractor results
2019-02-13 13:23:36 +01:00
Mike Fährmann
2e516a1e3e
store the full original URL in Extractor.url
2019-02-12 18:46:48 +01:00
Mike Fährmann
580baef72c
change Chapter and MangaExtractor classes
...
- unify and simplify constructors
- rename get_metadata and get_images to just metadata() and images()
- rename self.url to chapter_url and manga_url
2019-02-11 18:38:47 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
9a9cd32461
implement alternative constructor for extractors
2019-02-09 14:42:25 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
bc0951d974
allow for simplified test data structures
...
Instead of a strict list of (URL, RESULTS)-tuples, extractor result
tests can now be a single (URL, RESULTS)-tuple, if it's just one test,
and "only matching" tests can now be a simple string.
2019-02-06 17:24:44 +01:00
Mike Fährmann
00dc37ccbf
replace AsynchronousMixin Extractor with a Mixin
2019-02-04 14:21:19 +01:00
Mike Fährmann
4d656a81ca
replace SharedConfigExtractor class with a Mixin
2019-02-04 13:46:02 +01:00
Mike Fährmann
bfbbac4495
[tsumino] add login capabilities ( #161 )
2019-01-30 17:58:48 +01:00
Mike Fährmann
dd358b4564
improve cookie handling during logins
2019-01-30 17:09:32 +01:00
Mike Fährmann
06cbf5f9c4
implement 'chapter-reverse' option ( #149 )
...
Setting it to `true` will start with the latest chapter instead of the
first one.
2019-01-07 18:22:33 +01:00
Mike Fährmann
9a98b6769d
use extractor.request for API calls ( #130 )
...
... at least for OAuth1.0 based APIs (flickr, smugmug, tumblr)
2018-12-04 21:29:06 +01:00
Mike Fährmann
b828473aa3
retry HTTP requests for more exception classes
2018-11-19 15:49:13 +01:00
Mike Fährmann
c47482b110
smaller changes, missing docs, etc.
...
- make 'netrc' extractor-specific
- rename 'downloader.enable' to 'enabled'
- document 'downloader.ytdl.format'
- consistent newlines in configuration.rst
2018-11-16 18:18:07 +01:00
Mike Fährmann
2fa28a2609
update default user-agent string ( closes #122 )
2018-11-11 10:07:10 +01:00
Mike Fährmann
c9861ca812
adjust message for status_code based exceptions
...
from: 5xx HTTP Error: Reason
to : 5xx: Reason
The "HTTP Error" part was in there to emulate Request's error messages
from response.raise_for_status(), but it reads a lot better without.
2018-10-18 15:09:49 +02:00
Mike Fährmann
4a348990f4
adjust value resolution for retries/timeout/verify options
...
This change introduces 'extractor.*.retries/timeout/verify' options
as a general way to set these values for all HTTP requests.
'downloader.http.retries/timeout/verify' is a way to override these
options for file downloads only and will fall back to 'extractor.*.…*
values if they haven't been explicitly set.
Also: downloader classes now take an extractor object as first argument
instead of a requests.session.
2018-10-07 21:13:39 +02:00
Mike Fährmann
f647f5d9c3
use 'verify' option for regular HTTP requests
2018-10-06 16:38:43 +02:00
Mike Fährmann
68d6033a5d
use 'retries' and 'timeout' options for regular HTTP requests
2018-08-02 16:11:54 +02:00
Mike Fährmann
017188d268
improve extractor.request()
...
Replace the 'fatal' parameter with 'expect', which is a list/range
of HTTP status codes >= 400 that should also be accepted.
2018-06-18 16:29:56 +02:00
Mike Fährmann
2d17a9e07f
improve extractor.request()
...
- better retry behavior
- exponential back-off
- removed 'allow_empty' argument
2018-04-23 18:45:59 +02:00
Mike Fährmann
8704d850bf
add explicit proxy support ( #76 )
...
- '--proxy' as command-line argument
- 'extractor.*.proxy' as config option
2018-02-19 18:45:06 +01:00
Mike Fährmann
179bcdd349
adjust archive-ids
2018-02-13 04:50:45 +01:00
Mike Fährmann
3cec533c28
Merge branch 'archive'
2018-02-12 18:07:58 +01:00
Mike Fährmann
5b3c34aa96
use generic chapter-extractor in more modules
2018-02-07 12:36:39 +01:00
Mike Fährmann
7a412f5c32
implement generic manga-chapter extractor
2018-02-04 22:02:04 +01:00
Mike Fährmann
84a52a9256
add DownloadArchive class
2018-01-30 15:23:23 +01:00
Mike Fährmann
cc0c2cca57
[reddit] add extractor for reddit-hosted images ( closes #68 )
2018-01-14 18:55:42 +01:00
Mike Fährmann
e6814aebe2
add 'extractor.*.user-agent' config option
2017-11-15 14:01:33 +01:00
Mike Fährmann
baf8094868
improve Extractor.request()'s retry behavior
2017-11-13 20:37:11 +01:00
Mike Fährmann
16783e327f
[common] fix UnboundLocalError in Extractor.request()
2017-10-20 18:51:06 +02:00
Mike Fährmann
9aecc67841
[common] explicitly handle HTTP status code 429
2017-10-14 21:37:59 +02:00
Mike Fährmann
b319f4bab3
smaller code and text changes
2017-10-01 18:23:40 +02:00
Mike Fährmann
26a866e7d8
implement (sub)category-transfer between extractors ( #41 )
...
ImageFap- and all Manga-Extractors will transfer their (sub)category
values to other extractors instantiated by them, which will in turn
allow those to use options set for their parents.
Example:
ImagefapGalleryExtractors will use options set under
extractor.imagefap.user, if (and only if) they have been instantiated by
a ImagefapUserExtractor; and options from extractor.imagefap.gallery
otherwise.
2017-09-26 21:05:11 +02:00
Mike Fährmann
9c138dfc1f
[common] detect empty HTTP response bodies
2017-09-26 16:49:58 +02:00
Mike Fährmann
deb2e803ba
simplify MangaExtractor class
2017-09-24 16:05:43 +02:00
Mike Fährmann
0dedbe759c
enable '--chapter-filter'
...
The same filter infrastructure that can be applied to image URLS now
also works for manga chapters and other delegated URLs.
TODO: actually provide any metadata (currently supported is only
deviantart and imagefap).
2017-09-12 16:19:00 +02:00
Mike Fährmann
be30fb2f98
add common config category for boorus and foolslide
2017-08-29 22:42:48 +02:00
Mike Fährmann
915a0137de
improve 'extractor.request'
...
- add 'fatal' argument
- improve internal logic and flow
- raise known exception on error
- update exception hierarchy
2017-08-05 16:11:46 +02:00
Mike Fährmann
7aa9fa796a
code cleanup and fixes
2017-07-25 14:59:41 +02:00
Mike Fährmann
55f048d02b
ignore case of cookiejar magic strings
2017-07-24 18:33:42 +02:00
Mike Fährmann
808f67ba7d
use 'cookiedomain' for cookies set by object-config-values
...
otherwise these cookies would not be picked up by the
_check_cookies() method.
2017-07-22 15:43:35 +02:00