256 Commits

Author SHA1 Message Date
Mike Fährmann
751e535948 [nhentai] fix extraction (closes #156)
Use JSON embedded in webpage since API endpoints have been disabled
2019-01-14 07:57:50 +01:00
Mike Fährmann
1734a6c879 [reactor] detect "circular" redirects (#148) 2019-01-09 14:59:15 +01:00
Mike Fährmann
e53cdfd6a8 update build_supportedsites.py 2019-01-09 14:58:35 +01:00
Mike Fährmann
0afa913de4 [tumblr] add tests for hidden and private blogs (#145)
Hidden / dashboard-only blogs are pretty straightforward and "only"
require a valid 'access-token' and 'access-token-secret' for the given
'api-key' and 'api-secret', so that signed OAuth1.0 requests are possible.

Private / password protected blogs on the other hand are a bit
cumbersome. In addition to a valid 'access-token' and
'access-token-secret', they also require the account belonging to those
tokens to be a member of the blog itself. Knowing the password and
entering it in the website isn't enough to access a blog through the
API. Following a private blog is also impossible, so that option can't
work either.
2019-01-03 16:12:24 +01:00
Mike Fährmann
fa7fa2f8ff [deviantart] update tests 2019-01-01 15:39:34 +01:00
Mike Fährmann
259123732f [readcomiconline] improve comic-page parsing 2018-12-30 13:19:23 +01:00
Mike Fährmann
6c71e9cf5d [deviantart] add separate 'sta.sh' extractor (#113)
- supports multiple stashed deviations per page
- explicitly mentions sta.sh support on supportedsites.rst
2018-12-26 18:56:57 +01:00
Mike Fährmann
c5d4f558c9 allow missing field access keys in format strings (#136) 2018-12-22 13:54:14 +01:00
Mike Fährmann
4d73cc785d update test results 2018-12-14 16:07:32 +01:00
Mike Fährmann
010da8372a [instagram] relax test pattern 2018-12-11 19:59:28 +01:00
Mike Fährmann
15890930ea [mangafox] fix extraction
use mobile version since desktop version is obfuscated
2018-11-26 16:13:41 +01:00
Mike Fährmann
fb53b5dd55 fix control+c during -j and range tests 2018-11-25 18:54:05 +01:00
Mike Fährmann
59bb434ba5 [flickr] add ability to download all albums of a user
for example with 'https://www.flickr.com/photos/shona_s/albums'
2018-11-23 09:09:37 +01:00
Mike Fährmann
041bd501fc [hentaifoundry] unescape YII_CSRF_TOKEN value
This fixes the POST requests to /site/filters
2018-11-19 21:46:17 +01:00
Mike Fährmann
d4b2b73bef release version 1.6.0 2018-11-17 18:28:02 +01:00
Mike Fährmann
3c25fa2dad update build_testresult_db.py script 2018-11-15 22:58:14 +01:00
Mike Fährmann
7f6a0be982 adjust some tests 2018-11-15 22:50:04 +01:00
Mike Fährmann
966a9ca3a0 update test results 2018-11-10 19:14:54 +01:00
Mike Fährmann
c9861ca812 adjust message for status_code based exceptions
from: 5xx HTTP Error: Reason
to  : 5xx: Reason

The "HTTP Error" part was in there to emulate Requests' error messages
from response.raise_for_status(), but it reads a lot better without.
2018-10-18 15:09:49 +02:00
Mike Fährmann
c00dce2adc [behance] enable 'categorytransfer' 2018-10-09 23:40:49 +02:00
Mike Fährmann
1532d1b690 fix 'range' tests and update a few test results 2018-10-08 23:53:58 +02:00
Mike Fährmann
0514d6a0ae make --filter and --range config-file options
The functionality of --(chapter-)filter and --(chapter-)range are now
also exposed as the following config-file options:

- extractor.*.image-filter
- extractor.*.image-range
- extractor.*.chapter-filter
- extractor.*.chapter-range

TODO: update configuration.rst
2018-10-07 21:39:56 +02:00
Mike Fährmann
4a348990f4 adjust value resolution for retries/timeout/verify options
This change introduces 'extractor.*.retries/timeout/verify' options
as a general way to set these values for all HTTP requests.

'downloader.http.retries/timeout/verify' is a way to override these
options for file downloads only and will fall back to 'extractor.*.…'
values if they haven't been explicitly set.

Also: downloader classes now take an extractor object as first argument
instead of a requests.session.
2018-10-07 21:13:39 +02:00
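
The fallback described in 4a348990f4 can be pictured roughly as follows. This is only a sketch of the lookup order, not the project's actual code: the `resolve_option` helper and the flat `config` dictionary are hypothetical, and the `extractor.*` wildcard is simplified to a single `"extractor"` section.

```python
def resolve_option(config, name, default=None):
    """Return the value of 'name' as it would apply to file downloads.

    Hypothetical helper: a value set under downloader.http wins;
    otherwise the general extractor value (or 'default') is used.
    """
    http = config.get("downloader", {}).get("http", {})
    if name in http:
        # explicitly set for file downloads only
        return http[name]
    # fall back to the general setting for all HTTP requests
    return config.get("extractor", {}).get(name, default)


# illustrative values only
config = {
    "extractor": {"retries": 5, "timeout": 30.0},
    "downloader": {"http": {"timeout": 120.0}},
}

print(resolve_option(config, "timeout"))       # 120.0 (download override)
print(resolve_option(config, "retries"))       # 5 (falls back to extractor)
print(resolve_option(config, "verify", True))  # True (nothing set anywhere)
```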
Mike Fährmann
ca6ac4db6a fix 'content' tests 2018-10-05 21:10:33 +02:00
Mike Fährmann
d70db2d555 Revert "[komikcast] fix extraction"
This reverts commit 5507f5ce2e.
2018-10-02 20:38:42 +02:00
Mike Fährmann
5507f5ce2e [komikcast] fix extraction 2018-09-29 16:37:30 +02:00
Mike Fährmann
17611bfec0 update build_supportedsites.py script 2018-09-28 12:43:19 +02:00
Mike Fährmann
e066f35118 update extractor tests 2018-09-21 11:25:56 +02:00
Mike Fährmann
22ab509a70 [bobx] rename "model" to "idol" extractor 2018-09-14 18:11:36 +02:00
Mike Fährmann
8a23b21d0e [tests] let 'pattern' require at least 1 URL 2018-09-02 21:19:44 +02:00
Mike Fährmann
0bc8ef51c8 [smugmug] Handle albums with no explicit owner (#100) 2018-09-01 12:55:02 +02:00
Mike Fährmann
590c0b3ad5 re-implement and improve filename formatter
A format string now gets parsed only once instead of re-parsing it each
time it is applied to a set of data.

The initial parsing causes directory path creation to be about 2x
slower than before, since each format string there is used only once,
but building a filename, the more common operation, is at least 2x
faster. The "directory slowness" cancels at about 5 filenames and
everything above that is significantly faster.
2018-08-25 10:45:14 +02:00
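
The parse-once idea from 590c0b3ad5 might look something like the sketch below. It is not the formatter from the commit: the `CompiledFormat` class is a hypothetical name, it relies on the standard library's `string.Formatter.parse()`, and it ignores format specs and conversions to stay short.

```python
import string


class CompiledFormat:
    """Hypothetical pre-parsed format string (plain field names only)."""

    def __init__(self, format_string):
        # Parse the format string a single time and keep the pieces
        # as (literal_text, field_name) pairs.
        self.parts = [
            (literal, field)
            for literal, field, _spec, _conversion
            in string.Formatter().parse(format_string)
        ]

    def apply(self, data):
        # Applying the format is now a simple loop without any parsing.
        out = []
        for literal, field in self.parts:
            out.append(literal)
            if field is not None:
                out.append(str(data[field]))
        return "".join(out)


filename = CompiledFormat("{id}_{title}.{ext}")
print(filename.apply({"id": 1, "title": "first", "ext": "jpg"}))   # 1_first.jpg
print(filename.apply({"id": 2, "title": "second", "ext": "png"}))  # 2_second.png
```

The up-front cost sits in `__init__`, which matches the trade-off described in the commit message: a format that is applied only once pays for parsing without getting anything back, while one that is applied many times benefits from the cheaper `apply()`.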
Mike Fährmann
34b556922d update/restore tests 2018-08-23 15:47:40 +02:00
Mike Fährmann
e3055d356c release version 1.5.1 2018-08-17 13:21:36 +02:00
Mike Fährmann
f9ded38d89 [test:results] add support for "range" options in tests 2018-08-15 21:49:44 +02:00
Mike Fährmann
c9e6ccbd7c [test:extractor] small fixes and improvements 2018-08-15 21:49:33 +02:00
Mike Fährmann
7f4e41c989 increase timeout during extractor tests
cloudflare's 522 response takes longer than 30 seconds
2018-08-10 16:51:05 +02:00
Mike Fährmann
b55e39d1ee [mangadex] improve extraction
- cache manga API results
- add artist, author and date fields to chapter metadata
- remove Manga-/ChapterExtractor inheritance
- minor code simplifications and improvements
2018-08-10 16:50:07 +02:00
Mike Fährmann
2a9f3341a2 [behance] fix title extraction 2018-08-08 10:48:58 +02:00
Mike Fährmann
a86f2bfc80 [pinterest] update not-found redirects 2018-08-07 12:13:19 +02:00
Mike Fährmann
7442d2940c release version 1.5.0 2018-08-03 17:50:27 +02:00
Mike Fährmann
b040ca0718 [rule34] small unit test fixes 2018-08-03 17:28:47 +02:00
Mike Fährmann
f3793660ef update tests 2018-08-02 14:57:28 +02:00
Mike Fährmann
42a346413b fix "re:" prefix for keyword tests 2018-08-02 14:48:51 +02:00
Mike Fährmann
e0dd8dff5f implement L<maxlen>/<replacement>/ format option
The L option allows for the contents of a format field to be replaced
with <replacement> if its length is greater than <maxlen>.

Example:
{f:L5/too long/} -> "foo"      (if "f" is "foo")
                 -> "too long" (if "f" is "foobar")

(#92) (#94)
2018-07-29 13:52:07 +02:00
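
One way to picture the L option from e0dd8dff5f is a `string.Formatter` subclass that intercepts the format spec. This is a sketch under assumptions, not the commit's implementation: the `LengthFormatter` class is hypothetical, and it assumes the field value is a string.

```python
import string


class LengthFormatter(string.Formatter):
    """Hypothetical formatter that understands L<maxlen>/<replacement>/."""

    def format_field(self, value, format_spec):
        if format_spec.startswith("L"):
            # "L5/too long/" -> maxlen "5", replacement "too long", rest ""
            maxlen, replacement, rest = format_spec[1:].split("/", 2)
            if len(value) > int(maxlen):
                value = replacement
            # anything after the closing slash is treated as a regular spec
            format_spec = rest
        return super().format_field(value, format_spec)


fmt = LengthFormatter()
print(fmt.format("{f:L5/too long/}", f="foo"))     # foo
print(fmt.format("{f:L5/too long/}", f="foobar"))  # too long
```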
Mike Fährmann
bb89a1e6d7 [mangahere] use http://
invalid SSL cert for quite some time now
2018-07-26 18:11:31 +02:00
Mike Fährmann
ce34d82cb4 fix skipping tests on 5xx status codes 2018-07-19 18:47:23 +02:00
Mike Fährmann
a6fe2bb594 [whatisthisimnotgoodwithcomputers] remove extractor 2018-07-14 09:53:16 +02:00
Mike Fährmann
0ba93650e0 [8chan] replace unit test URL
the other thread is no longer accessible
2018-07-14 09:53:16 +02:00
Mike Fährmann
8fe9056b16 implement string slicing for format strings
It is now possible to slice string (or list) values of format string
replacement fields with the same syntax as in regular Python code.

"{digits}"       -> "0123456789"
"{digits[2:-2]}" -> "234567"
"{digits[:5]}"   -> "01234"

The optional third parameter (step) has been left out to simplify things.
2018-07-14 09:53:15 +02:00
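
The slicing syntax from 8fe9056b16 can be imitated with a small regex-based sketch. It is not the commit's code: the `apply_format` function and the `_FIELD` pattern are hypothetical, only plain field names with an optional `[start:stop]` slice are handled, and the step parameter is left out just as in the commit.

```python
import re

# {name} or {name[start:stop]}, where start and stop are optional integers
_FIELD = re.compile(r"\{(\w+)(\[(-?\d+)?:(-?\d+)?\])?\}")


def apply_format(format_string, data):
    """Replace fields in 'format_string', honoring [start:stop] slices."""

    def replace(match):
        name, brackets, start, stop = match.groups()
        value = data[name]
        if brackets is not None:
            # a missing bound becomes None, exactly like an omitted
            # bound in a regular Python slice
            value = value[
                int(start) if start else None:
                int(stop) if stop else None
            ]
        return str(value)

    return _FIELD.sub(replace, format_string)


data = {"digits": "0123456789"}
print(apply_format("{digits}", data))        # 0123456789
print(apply_format("{digits[2:-2]}", data))  # 234567
print(apply_format("{digits[:5]}", data))    # 01234
```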