Commit Graph

325 Commits

Author SHA1 Message Date
Mike Fährmann
b1bea8aaeb add 'restrict-filenames' option (#348) 2019-07-23 17:41:24 +02:00
Mike Fährmann
b3851e01d9 release version 1.9.0 2019-07-19 21:55:25 +02:00
Mike Fährmann
12da6bd0c9 [simplyhentai] fix/improve extraction 2019-07-06 20:25:53 +02:00
Mike Fährmann
b89f0d8d3c update extractor result tests 2019-07-01 20:02:47 +02:00
Mike Fährmann
40da44b17f Merge branch 'v1.9.0' 2019-06-29 15:39:52 +02:00
Mike Fährmann
7a99e85943 [kissmanga] fix download URLs and file extensions
The current Blogspot image URLs hosted on Kissmanga end with an
"invalid" query parameter (/000.png&upx=...), which 'spliturl()' and
'parseurl()' don't recognize as such and which therefore ends up in
the 'extension' field from 'text.nameext_from_url()'.
2019-06-28 20:34:43 +02:00
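The problem described in this commit message can be reproduced with the standard library alone. The following is a minimal sketch, not the project's actual code: the helper names `naive_extension()` and `cleaned_extension()` and the example URL are made up for illustration. Because the URL contains no `?`, the `&upx=...` part stays in the path and ends up in the extension unless it is cut off first.

```python
import os.path
from urllib.parse import urlsplit


def naive_extension(url):
    """Take everything after the last dot of the file name as extension."""
    filename = urlsplit(url).path.rpartition("/")[2]
    return os.path.splitext(filename)[1].lstrip(".")


def cleaned_extension(url):
    """Cut the path at the first '&' before looking for the extension."""
    path = urlsplit(url).path.partition("&")[0]
    filename = path.rpartition("/")[2]
    return os.path.splitext(filename)[1].lstrip(".")


# Hypothetical URL that follows the pattern from the commit message
url = "https://2.bp.blogspot.com/example/000.png&upx=123"

print(naive_extension(url))    # extension still carries the '&upx=...' part
print(cleaned_extension(url))  # extension without the stray parameter
```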
Mike Fährmann
a9c89085fb [instagram] implement login support (#195) 2019-06-26 23:58:47 +02:00
Mike Fährmann
b1985d6579 test default format strings during extractor result tests
A missing value or an invalid "syntax" for a format replacement field
will raise an exception.
2019-06-25 18:12:32 +02:00
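The two failure modes named above can be shown with Python's built-in `str.format_map()`. This is only a sketch under the assumption that the project's formatter behaves similarly for these cases; `check_format_string()` and the sample metadata are hypothetical.

```python
def check_format_string(fmt, metadata):
    """Return None if 'fmt' can be applied to 'metadata', else the exception."""
    try:
        fmt.format_map(metadata)
    except (KeyError, ValueError, IndexError, AttributeError) as exc:
        return exc
    return None


metadata = {"id": 123, "extension": "jpg"}

# Valid format string: every replacement field has a value
ok = check_format_string("{id}.{extension}", metadata)

# Missing value: 'title' is not part of the metadata (KeyError)
missing = check_format_string("{id}_{title}.{extension}", metadata)

# Invalid syntax: the last replacement field is never closed (ValueError)
broken = check_format_string("{id}.{extension", metadata)
```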
Mike Fährmann
95b1e4c3c0 implement R<old>/<new>/ format option (#318) 2019-06-23 22:45:44 +02:00
Mike Fährmann
70713f0f28 fix extractor result tests 2019-06-20 18:12:36 +02:00
Mike Fährmann
ee4d7c3d89 update downloader.find() and related code
Instead of replacing 'https' with 'http' for every URL in
'get_downloader()', this now only happens once during downloader
initialization. Also adds unit tests.
2019-06-20 16:59:44 +02:00
Mike Fährmann
179d112083 [downloader] overhaul http and text modules
Get rid of the modular structure and simplify/specialize those modules.
2019-06-19 22:56:11 +02:00
Mike Fährmann
a77340c647 [keenspot] fix extraction for "TwoKinds" 2019-06-17 19:49:39 +02:00
Mike Fährmann
b171befa87 implement 'parse_unicode_escapes()' 2019-06-16 21:47:24 +02:00
Mike Fährmann
e05a96db5e [deviantart] rename 'stash' to 'extra' (#302)
'stash' is already used as a name for the StashExtractor and therefore
expected to be a dictionary.
2019-06-10 21:05:25 +02:00
Mike Fährmann
7c6cb908f9 [xhamster] update test results 2019-06-07 16:28:49 +02:00
Mike Fährmann
62335b9015 [paheal] adjust test results 2019-06-05 11:42:01 +02:00
Mike Fährmann
6a34f4b0c1 skip tests on read timeouts; print list of skipped tests 2019-06-01 20:47:31 +02:00
Mike Fährmann
d33f5a7423 [wallhaven] rewrite
- use API
- remove login support, add 'api-key' option
- remove support for "alpha" subdomain - alpha.wallhaven.cc used numeric
  IDs that can't be translated to the new ID system
- support direct links to wallpapers
2019-05-31 14:53:02 +02:00
Mike Fährmann
5499934ae2 [ngomik] fix extraction 2019-05-30 20:18:36 +02:00
Mike Fährmann
2b1999476e implement 'text.rextract()' 2019-05-28 21:03:41 +02:00
Mike Fährmann
e30ada162d fix cookie tests
update _get_extractor():
- always return an Extractor instance with a _login_impl() method
- use Extractor.from_url()
2019-05-26 20:22:04 +02:00
Mike Fährmann
2316e0ed3d fix strptime workaround from b0e85a4
Don't return a modified version of 'date_time' if strptime fails.
2019-05-25 23:22:26 +02:00
Mike Fährmann
6764847349 fix cookie tests
'cookies' is a CookieJar, not a dict,
and removing the call to '.keys()' doesn't have the same effect
2019-05-14 22:32:40 +02:00
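Why a cookie jar cannot simply be treated like a dict can be sketched with the standard library's `http.cookiejar.CookieJar`. This is an illustration only: the jar used in the project's tests may be a different class that does offer `.keys()`, and `make_cookie()` is a hypothetical helper. Iterating over a `CookieJar` yields `Cookie` objects, not names, so membership tests against a name behave differently than they would on a dict.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain="example.org"):
    """Build a minimal session cookie (hypothetical helper)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


jar = CookieJar()
jar.set_cookie(make_cookie("session", "abc"))
jar.set_cookie(make_cookie("user", "42"))

# Iteration yields Cookie objects, so names have to be collected explicitly
names = sorted(cookie.name for cookie in jar)

# A plain membership test compares the string against Cookie objects
found_directly = "session" in jar
found_by_name = "session" in names
```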
Mike Fährmann
a5b060765d improve code in tests
- use 'assertRaises' as context manager
- remove calls to .keys()
2019-05-13 11:48:20 +02:00
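For reference, the two ways of calling `assertRaises` look roughly like this. The test case below is a generic illustration and is not taken from the project's test suite.

```python
import unittest


class TestParseInt(unittest.TestCase):

    def test_call_form(self):
        # Callable form: exception class, then the function and its arguments
        self.assertRaises(ValueError, int, "abc")

    def test_context_manager_form(self):
        # Context manager form: the failing code is written out normally,
        # and the caught exception is available for further checks
        with self.assertRaises(ValueError) as ctx:
            int("abc")
        self.assertIn("invalid literal", str(ctx.exception))
```

Saved as a module, these tests can be run with `python -m unittest`.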
Mike Fährmann
b0e85a42e3 apply workaround from 4736912 in parse_datetime() itself 2019-05-09 21:53:17 +02:00
Mike Fährmann
4736912d4e [pixiv] work around strptime limitations in Python < 3.7
"%z" doesn't allow a colon separator in older Python versions:
    - "+0900" is OK
    - "+09:00" raises an exception
2019-05-08 18:08:03 +02:00
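One possible shape of such a workaround is to drop the colon from a trailing UTC offset before calling `strptime()`. The function name `parse_datetime_compat()` and its default format are assumptions for this sketch, not the code from the commit.

```python
from datetime import datetime, timedelta


def parse_datetime_compat(date_string, fmt="%Y-%m-%dT%H:%M:%S%z"):
    """Parse 'date_string', tolerating a "+HH:MM" offset on Python < 3.7."""
    # Remove the colon inside a trailing "+HH:MM" / "-HH:MM" offset
    if (len(date_string) >= 6 and date_string[-3] == ":"
            and date_string[-6] in "+-"):
        date_string = date_string[:-3] + date_string[-2:]
    return datetime.strptime(date_string, fmt)


with_colon = parse_datetime_compat("2019-05-08T18:08:03+09:00")
without_colon = parse_datetime_compat("2019-05-08T18:08:03+0900")
```

Both calls produce the same timezone-aware `datetime` with an offset of nine hours, regardless of the Python version.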
Mike Fährmann
d09864b581 implement text.parse_datetime() 2019-05-08 15:43:59 +02:00
Mike Fährmann
5582b06ae4 fix tests with 'urllist' messages 2019-04-30 16:31:48 +02:00
Mike Fährmann
5018781898 allow type tests by name 2019-04-29 17:27:59 +02:00
Mike Fährmann
6264a46212 use 'utcfromtimestamp()'
'fromtimestamp()' converts its results to the local timezone and causes
problems when running tests on a different machine.
2019-04-21 16:22:53 +02:00
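The difference can be sketched as follows. Since `datetime.utcfromtimestamp()` is deprecated as of Python 3.12, this illustration builds the same naive UTC value with `datetime.fromtimestamp(ts, timezone.utc)`; the helper name `utc_naive()` is hypothetical.

```python
from datetime import datetime, timezone


def utc_naive(timestamp):
    """Naive datetime in UTC, the same value utcfromtimestamp() returns."""
    return datetime.fromtimestamp(timestamp, timezone.utc).replace(tzinfo=None)


# Independent of the machine's timezone settings
epoch = utc_naive(0)

# Depends on the local timezone of the machine running the code,
# which is what made test results differ between machines
local = datetime.fromtimestamp(0)
```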
Mike Fährmann
d670de0344 implement 'text.parse_timestamp()' 2019-04-21 15:28:27 +02:00
Mike Fährmann
21a7e395a7 implement convenience wrapper for text.extract functionality 2019-04-19 22:30:11 +02:00
Mike Fährmann
e25ebc4bff don't disable certificate checks anymore
Executables generated with PyInstaller auto-include the root certificate
file and certificate checks now work out-of-the-box.
2019-04-17 13:27:19 +02:00
Mike Fährmann
d6ddb74cde update test results
- deviantart: 'index' is now an integer
- flickr: image file with lower quality
- paheal: image server name changed
- rule34: post got deleted
2019-04-12 09:59:48 +02:00
Mike Fährmann
d9b94a585d [mangoxo] add login support (#184)
A very recent change: It is now only possible to see more
than the first 5 images of an album if you are logged in.
2019-04-10 18:55:25 +02:00
Mike Fährmann
e730fc9045 [twitter] add login support (#214) 2019-04-09 09:27:49 +02:00
Mike Fährmann
790f15a56f [photobucket] use HTTPS 2019-04-03 18:30:45 +02:00
Mike Fährmann
c70b21248d [wikiart] add extractors (#179)
for
- artists:          https://www.wikiart.org/en/thomas-cole
- artist-listings:  https://www.wikiart.org/en/artists-by-century/12
- artwork-listings: https://www.wikiart.org/en/paintings-by-media/grisaille
2019-04-02 17:34:57 +02:00
Mike Fährmann
0c991a3155 add convenience targets to Makefile 2019-03-29 15:35:00 +01:00
Mike Fährmann
6277a739e4 [35photo] add user-, genre-, and image-extractors (#162) 2019-03-18 01:11:30 +01:00
Mike Fährmann
973a720a7a [weibo] fix unit test URL patterns 2019-03-15 15:19:39 +01:00
Mike Fährmann
6f57d44ec2 [seaotterscans] remove extractor
http://seaotterscans.com/ now redirects to their MangaDex profile
2019-03-13 22:02:45 +01:00
Mike Fährmann
0887fb61f4 [komikcast] update test results 2019-03-07 14:55:52 +01:00
Mike Fährmann
a881537b91 more util.py tests 2019-03-06 21:09:37 +01:00
Mike Fährmann
976ccb267f [myportfolio] combine gallery and user extractors
A URL alone isn't good enough to distinguish between a gallery and a
gallery listing, so the new extractor decides what to do based on the
page's content.
2019-03-06 19:45:01 +01:00
Mike Fährmann
9c0e2f294b [shopify] add generic collection and product extractors (#175)
with fashionnova.com as a default domain
2019-03-05 22:33:37 +01:00
Mike Fährmann
176b7253a1 update function signature for config.load() 2019-03-01 14:13:34 +01:00
Mike Fährmann
e687a6095e [luscious] raise exception if album is not available 2019-02-19 13:30:39 +01:00
Mike Fährmann
b09a8184ca move TestJob into test module; test _extractor values 2019-02-17 18:18:31 +01:00