1168 Commits

Author SHA1 Message Date
Mike Fährmann
69a5e6ddb3 Merge branch 'master' into 1.4-dev 2018-05-04 10:19:02 +02:00
Mike Fährmann
82c50fa609 release version 1.3.5 2018-05-04 10:03:20 +02:00
Mike Fährmann
3ce5296313 [smugmug] code cleanup
- combine User and Node extractors
- (re)move miscellaneous helper functions
- rename "Owner" to "User"
2018-05-03 14:12:10 +02:00
Mike Fährmann
42ed7667b8 [smugmug] support user- and general album URLs 2018-05-02 20:34:45 +02:00
Mike Fährmann
8bf3cdd82b implement logging options
Standard logging to stderr, logfiles, and unsupported URL files (which
are now handled through the logging module) can now be configured by
setting their respective option keys (log, logfile, unsupportedfile)
to a dict and specifying the following options:

- format:
    format string for logging messages
    available keys: see [1]
    default: "[{name}][{levelname}] {message}"
- format-date:
    format string for {asctime} fields in logging messages
    available keys: see [2]
    default: "%Y-%m-%d %H:%M:%S"
- level:
    lowercase name of the minimum level at which messages are logged;
    available levels are debug, info, warning, error, exception
    default: "info"
- path:
    path of the file to be written to
- mode:
    'mode' argument when opening the specified file
    can be either "w" to truncate the file or "a" to append to it (see [3])

If 'output.log', '.logfile', or '.unsupportedfile' is a string, it is
still interpreted as before: as the file path
(or, for .log, as the format string)

[1] https://docs.python.org/3/library/logging.html#logrecord-attributes
[2] https://docs.python.org/3/library/time.html#time.strftime
[3] https://docs.python.org/3/library/functions.html#open
2018-05-01 17:54:52 +02:00
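
A minimal sketch of how an options dict like the one described in the commit above could be mapped onto the standard logging module. The helper name build_handler, the fallback to "w" for 'mode', and the fallback to INFO for unknown level names are assumptions made for illustration; this is not the project's actual code.

```python
import logging
import sys


def build_handler(opts):
    """Create a logging handler from an options dict (hypothetical helper)."""
    path = opts.get("path")
    if path:
        # "w" truncates the file, "a" appends; defaulting to "w" is an assumption
        handler = logging.FileHandler(path, mode=opts.get("mode", "w"))
    else:
        handler = logging.StreamHandler(sys.stderr)

    # defaults taken from the commit message; style="{" enables {name}-style keys
    fmt = opts.get("format", "[{name}][{levelname}] {message}")
    datefmt = opts.get("format-date", "%Y-%m-%d %H:%M:%S")
    handler.setFormatter(logging.Formatter(fmt, datefmt, style="{"))

    # map the lowercase level name to a logging constant; unknown names -> INFO
    level = opts.get("level", "info").upper()
    handler.setLevel(getattr(logging, level, logging.INFO))
    return handler
```

With this sketch, build_handler({"level": "debug"}) would yield a stderr handler using the default format strings.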
Mike Fährmann
2ea0d1da42 [smugmug] improve API code; use data expansions 2018-04-30 18:22:44 +02:00
Mike Fährmann
3fe653d940 fix test_results for empty sets
{} is an empty dict and doesn't support set operations
2018-04-29 22:43:37 +02:00
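
The pitfall named in the commit above, shown as a short standalone snippet (the variable names are illustrative only): the literal {} creates a dict, so set operations need set() instead.

```python
empty_dict = {}      # a dict, despite looking like an empty set
empty_set = set()    # the actual empty set

try:
    empty_dict - {"a"}          # dicts do not support set difference
    dict_supports_sets = True
except TypeError:
    dict_supports_sets = False

difference = empty_set - {"a"}  # works and yields an empty set
```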
Mike Fährmann
16e014baaa [smugmug] added image and album extractor
just some initial code that still requires a lot of work ...

TODO:
- folders
- old-style albums (which are nearly all of them ...)
- images from users
- OAuth

It could also happen that the API credentials used will become invalid
whenever my 14 day trial period ends (7 days remaining), but that
would just require users to supply their own.
2018-04-29 21:27:25 +02:00
Mike Fährmann
d96b3474e5 [puremashiro] remove module
site has been unreachable for a couple of weeks
and now the DNS record is gone as well
2018-04-28 14:24:20 +02:00
Mike Fährmann
b44a296404 [gomanga] remove module
site has been unreachable for a couple of weeks
and the Cloudflare status page shows host errors
2018-04-28 14:24:21 +02:00
Mike Fährmann
95392554ee use text.urljoin() 2018-04-26 17:00:26 +02:00
Mike Fährmann
2395d870dd [pinterest] unquote board and user names, better errors 2018-04-26 16:38:12 +02:00
Mike Fährmann
8b79eaafea [tumblr] log actual time of rate limit resets
... instead of the amount of seconds until a reset
2018-04-25 16:13:03 +02:00
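
A small sketch of the idea in the commit above: turning "seconds until reset" into a wall-clock time that can be logged. The function name reset_time and the output format are assumptions for illustration, not the extractor's code.

```python
from datetime import datetime, timedelta


def reset_time(seconds, now=None):
    """Return the clock time at which a rate limit resets (hypothetical helper)."""
    now = now or datetime.now()
    # 'seconds' may arrive as a string, e.g. from a response header
    return (now + timedelta(seconds=int(seconds))).strftime("%H:%M:%S")
```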
Mike Fährmann
0f1e07f627 [pinterest] scrap OAuth implementation; code improvements
OAuth authentication isn't needed anymore and other tools
like Postman are better suited for this job anyway.
2018-04-25 16:04:30 +02:00
Mike Fährmann
55d4d23860 [pinterest] use Pinterest's "Web" API (#83)
no access tokens, no user credentials of any kind ...
2018-04-24 22:28:10 +02:00
Mike Fährmann
2721417dd8 Merge branch 'master' into 1.4-dev 2018-04-24 11:33:02 +02:00
Mike Fährmann
c6d5154fc3 fix flake8 errors, ignore W504
pycodestyle 2.4.0 enforces some new style guidelines
2018-04-24 11:25:32 +02:00
Mike Fährmann
2d17a9e07f improve extractor.request()
- better retry behavior
- exponential back-off
- removed 'allow_empty' argument
2018-04-23 18:45:59 +02:00
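
Exponential back-off, as mentioned in the commit above, can be sketched roughly as follows. The function name, the parameters, and the choice of OSError as the retryable error are assumptions; the real extractor.request() may differ in all of these.

```python
import time


def request_with_retries(send, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, doubling the delay after each failure."""
    delay = base_delay
    for attempt in range(retries + 1):
        try:
            return send()
        except OSError:
            if attempt == retries:
                raise          # out of retries: propagate the last error
            sleep(delay)
            delay *= 2         # exponential back-off: 1s, 2s, 4s, ...
```

Passing the sleep function as a parameter keeps the sketch testable without actually waiting.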
Mike Fährmann
80521ae1f6 [deviantart] improve API error handling
The previous implementation would retry requests with 4xx status codes
in an infinite loop, which is especially a problem when querying
non-existent users or groups. These are now properly handled with a
NotFoundError exception.
2018-04-23 10:10:43 +02:00
Mike Fährmann
e54b43be08 [mangadex] add title info for chapter extractors 2018-04-22 16:20:04 +02:00
Mike Fährmann
f471161920 Merge branch 'master' into 1.4-dev 2018-04-21 12:15:40 +02:00
Mike Fährmann
a2020c736e release version 1.3.4 2018-04-20 18:42:09 +02:00
Mike Fährmann
eb37fbf0e8 [hentaifoundry] improve extractor
- use common base class
- better pagination
- respect '.../page/<num>'
- implement skip() / --range support
- get YII_CSRF_TOKEN from cookies
2018-04-20 18:26:23 +02:00
Mike Fährmann
80bead739d [oauth] require custom client-* values for pinterest 2018-04-20 15:31:05 +02:00
Mike Fährmann
cc36f88586 rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
ff643793bd improve and document Cloudflare bypass code 2018-04-19 21:32:10 +02:00
Mike Fährmann
10cc59f3b5 fix extractor names 2018-04-18 18:12:57 +02:00
Mike Fährmann
b1325d4d2c fix extractor docstrings 2018-04-18 18:03:43 +02:00
Mike Fährmann
df7e18399e [luscious] fix image order 2018-04-17 17:32:21 +02:00
Mike Fährmann
d10579edb5 [pinterest] improve PinterestAPI code; remove OAuth mentions
on another note: access_tokens have been set to only allow for
10 requests per hour (from 200 yesterday)
2018-04-17 17:12:42 +02:00
Mike Fährmann
4bd182c107 [pinterest] implement oauth:pinterest (#83)
Pinterest access tokens are rate limited at 200 requests per
hour (or maybe per 2 or 3 hours?) so having just one access token
for all users isn't going to work in the long run.
2018-04-16 20:03:28 +02:00
Mike Fährmann
9651f3fce0 [pinterest] improve error messages (#83) 2018-04-16 19:36:54 +02:00
Mike Fährmann
dbe250f7e5 [pinterest] update access_token (#83) 2018-04-16 09:46:45 +02:00
Mike Fährmann
dd49127408 [spectrumnexus] remove module
Site stopped hosting manga scans (http://view.thespectrum.net/)
2018-04-16 09:45:07 +02:00
Mike Fährmann
5c487300ee improve 'parse_query()' and add tests
- another irrelevant micro-optimization!
- use urllib.parse.parse_qsl directly instead of parse_qs, which
  just packs the results of parse_qsl in a different data structure
- reduced memory requirements since no additional dict and lists are
  created
2018-04-15 19:05:29 +02:00
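
A rough sketch of the approach described in the commit above: iterating over urllib.parse.parse_qsl directly instead of going through parse_qs. Keeping only the first value for repeated keys is an assumption made for this example, not necessarily what the project's parse_query() does.

```python
from urllib.parse import parse_qsl


def parse_query(qs):
    """Parse a query string into a flat dict (illustrative sketch)."""
    result = {}
    for key, value in parse_qsl(qs):
        # parse_qsl yields (key, value) pairs; no intermediate dict of lists
        result.setdefault(key, value)   # assumption: first value wins
    return result
```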
Mike Fährmann
728c64a3fb [tumblr] rename 'offset' to 'num' and adjust formats
Trying to somehow emulate Tumblr filenames is a bad idea ...
2018-04-15 18:58:32 +02:00
Mike Fährmann
4ffa94f634 remove 'shorten_path()' and 'shorten_filename()' 2018-04-15 18:44:13 +02:00
Mike Fährmann
27eab4e467 rewrite text tests and improve functions
- test more edge cases
- consistently return an empty string for invalid arguments
- remove the ungreedy-flag in 'remove_html()'
2018-04-15 18:13:46 +02:00
Mike Fährmann
e3f2bd4087 add tests for 'text.clean_xml()' and improve it 2018-04-14 22:07:01 +02:00
Mike Fährmann
6d8b191ea7 improve 'parse_query()' and add tests
- another irrelevant micro-optimization!
- use urllib.parse.parse_qsl directly instead of parse_qs, which
  just packs the results of parse_qsl in a different data structure
- reduced memory requirements since no additional dict and lists are
  created
2018-04-13 19:21:32 +02:00
Mike Fährmann
51ea699083 add 'abort()' as function to filter expressions
calling 'abort()' in a filter aborts the current extractor run
in a cleaner way than using something like 1/0, which
causes an error message to be printed
2018-04-12 17:07:12 +02:00
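
One way an 'abort()' function could be exposed to filter expressions, as a sketch only. StopExtraction, build_filter and the 'num' metadata key are hypothetical names chosen for this example; the project's own implementation is not shown in the commit.

```python
class StopExtraction(Exception):
    """Raised to end the current extractor run (hypothetical name)."""


def abort():
    raise StopExtraction()


def build_filter(expr):
    """Compile a filter expression that may call abort()."""
    code = compile(expr, "<filter>", "eval")

    def check(metadata):
        # metadata keys become local names inside the expression
        return eval(code, {"abort": abort}, metadata)

    return check
```

A filter such as "num <= 3 or abort()" would then accept the first three items and stop the run at the fourth, without the error message that a deliberate 1/0 would print.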
Mike Fährmann
6bd857a319 [tumblr] handle rate limits / 429 errors
- wait for the hourly limit to reset
- abort upon exceeding the daily limit (it doesn't seem useful to
  potentially wait for several hours)
2018-04-12 16:25:20 +02:00
Mike Fährmann
7073ab7707 [komikcast] update regex to only match manga pages
The 'readerarea' section now includes some (shady) external
JavaScript file, which got matched as well.
2018-04-11 15:48:17 +02:00
Mike Fährmann
a1fa4b43b0 Revert "[tumblr] add option to sort photosets by upload order"
This reverts commit 4a26ae32df.
2018-04-09 16:08:08 +02:00
Mike Fährmann
48a83a89e9 [loveisover] remove module
archive.loveisover.me was shut down on 2018-03-29;
https://www.archiveteam.org/index.php?title=4chan#archive.loveisover.me
2018-04-09 16:05:15 +02:00
Mike Fährmann
564e12ca8f replace 'imgyt' with 'imxto'
https://img.yt/ wasn't available for a couple of days, but has now
re-emerged as https://imx.to/ with a new web-interface.
Links to older images still work (see tests).
2018-04-09 15:53:20 +02:00
Mike Fährmann
1b80fa82a9 [imgur] update URL pattern and tests 2018-04-08 21:06:21 +02:00
Mike Fährmann
4a26ae32df [tumblr] add option to sort photosets by upload order 2018-04-07 15:57:55 +02:00
Mike Fährmann
6b72be8ee6 [tumblr] add 'hash' keyword
'hash' is the middle part of the filename in a tumblr image URL.
For example an image with '.../tumblr_p6tgemp1NZ1wgha4yo1_250.png' as
its URL would have 'p6tgemp1NZ1wgha4yo1' as hash.
2018-04-07 15:54:30 +02:00
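
The 'hash' described in the commit above could be pulled out of an image URL with a regular expression along these lines. The pattern and the function name tumblr_hash are assumptions for illustration, and the host in the usage note is a placeholder.

```python
import re

# matches '.../tumblr_<hash>_<size>.<ext>' at the end of a URL (assumed layout)
HASH_PATTERN = re.compile(r"/tumblr_([^/_]+)_\d+\.\w+$")


def tumblr_hash(url):
    """Return the middle part of a tumblr image filename, or '' if absent."""
    match = HASH_PATTERN.search(url)
    return match.group(1) if match else ""
```

Applied to a URL ending in '/tumblr_p6tgemp1NZ1wgha4yo1_250.png', this sketch returns 'p6tgemp1NZ1wgha4yo1'.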
Mike Fährmann
ffc0c67701 release version 1.3.3 2018-04-06 15:45:45 +02:00