58 Commits

Author SHA1 Message Date
Mike Fährmann
cc36f88586 rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
51ea699083 add 'abort()' as function to filter expressions
calling 'abort()' in a filter aborts the current extractor run
in a cleaner way than using something like 1/0, which
causes an error message to be printed
2018-04-12 17:07:12 +02:00
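A minimal sketch of the idea, assuming `abort()` works by raising a dedicated exception that the caller catches to stop quietly. `FilterAbort` and `run` are hypothetical names, not gallery-dl's actual code:

```python
class FilterAbort(Exception):
    """Hypothetical exception used to end an extractor run cleanly."""


def abort():
    # Raising a dedicated exception lets the caller stop without a traceback
    raise FilterAbort()


def run(items, expression):
    code = compile(expression, "<filter>", "eval")
    namespace = {"abort": abort}
    accepted = []
    for item in items:
        try:
            if eval(code, namespace, item):
                accepted.append(item)
        except FilterAbort:
            break  # stop processing the remaining items, no error message
    return accepted


items = [{"num": n} for n in range(1, 10)]
print(run(items, "num < 5 or abort()"))
# → [{'num': 1}, {'num': 2}, {'num': 3}, {'num': 4}]
```

Compared with `1/0`, the caller can tell an intentional stop apart from a genuine error in the expression.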
Mike Fährmann
3f2dd6b6f8 avoid double path-separators
(#74)
2018-03-22 10:24:59 +01:00
Mike Fährmann
b69cc94f0e [util] implement bencode() 2018-03-14 13:17:34 +01:00
Mike Fährmann
749fbbfa6c [mangadex] add chapter- and manga-extractor 2018-03-05 18:37:21 +01:00
Mike Fährmann
2fad0b1f1b add 'U' conversion for format strings to unquote their content
(#74)
2018-02-25 21:57:59 +01:00
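A sketch of how a `!U` conversion can be added by overriding `string.Formatter.convert_field()`; the class name is illustrative and the real formatter may differ:

```python
import string
import urllib.parse


class Formatter(string.Formatter):
    """Illustrative formatter with an extra 'U' (unquote) conversion."""

    def convert_field(self, value, conversion):
        if conversion == "U":
            # decode %-escapes such as '%20' back into plain characters
            return urllib.parse.unquote(value)
        # fall back to the standard 's', 'r' and 'a' conversions
        return super().convert_field(value, conversion)


fmt = Formatter()
print(fmt.format("{title!U}.{extension}", title="My%20Image", extension="jpg"))
# → My Image.jpg
```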
Mike Fährmann
8cdce21dcb make archive keys user-configurable 2018-02-25 21:57:01 +01:00
Mike Fährmann
e1e0668ca8 add option to set default replacement field value
Missing or undefined keywords will now be replaced with the value
set for 'keywords-default'. The default is Python's 'None', which
is equivalent to setting this option to JSON's 'null'.
2018-02-23 00:59:20 +01:00
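One way to get this behaviour is to catch lookup failures in `string.Formatter.get_field()` and substitute the configured default. This is a sketch; `DefaultFormatter` is a hypothetical name:

```python
import string


class DefaultFormatter(string.Formatter):
    """Hypothetical formatter that replaces missing fields with a default."""

    def __init__(self, default=None):
        super().__init__()
        self.default = default  # e.g. the value of 'keywords-default'

    def get_field(self, field_name, args, kwargs):
        try:
            return super().get_field(field_name, args, kwargs)
        except (KeyError, IndexError, AttributeError, TypeError):
            # missing top-level key or failed nested lookup like {user[id]}
            return self.default, field_name


print(DefaultFormatter().format("{a}-{b}", a=1))                 # → 1-None
print(DefaultFormatter("unknown").format("{user[id]}", user={}))  # → unknown
```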
Mike Fährmann
ac3da8115e [util] don't add text: URLs to list of downloaded URLs 2018-02-20 18:14:27 +01:00
Mike Fährmann
b50bdbf3d7 change config specifiers in input file format
Instead of a dictionary/object, input file options are now specified
by a 'key=value' pair starting with '-' for options only applying to
the next URL or '-G' for Global options applying to all following URLs.

See the docstring of parse_inputfile() for details.

Example option specifiers:

- filename = "{id}.{extension}"
- extractor.pixiv.user.directory = ["Pixiv Users", "{user[id]}"]
-spaces="are_optional"
-G keywords = {"global": "option"}
2018-02-16 03:10:41 +01:00
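A simplified parser for the specifiers shown above, assuming values are JSON, keys are dot-separated config paths, and lines starting with `#` are comments. It is a sketch, not the actual `parse_inputfile()`:

```python
import json


def parse_inputfile(lines):
    """Yield (url, global_options, local_options) tuples (sketch only)."""
    gconf = []  # '-G' options: apply to all following URLs
    lconf = []  # '-' options: apply only to the next URL
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("-"):
            if line.startswith("-G"):
                target, line = gconf, line[2:]
            else:
                target, line = lconf, line[1:]
            key, sep, value = line.partition("=")
            if not sep:
                raise ValueError("invalid option specifier: " + line)
            target.append((key.strip().split("."), json.loads(value.strip())))
        else:
            yield line, list(gconf), lconf
            lconf = []  # local options are consumed by this URL


lines = [
    "# comment",
    '- filename = "{id}.{extension}"',
    '-spaces="are_optional"',
    '-G keywords = {"global": "option"}',
    "https://example.org/1",
    "https://example.org/2",
]
result = list(parse_inputfile(lines))
for entry in result:
    print(entry)
```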
Mike Fährmann
f970a8f13c fix adding keys to download archive when using skip=false 2018-02-13 23:45:30 +01:00
Mike Fährmann
179bcdd349 adjust archive-ids 2018-02-13 04:50:45 +01:00
Mike Fährmann
3cec533c28 Merge branch 'archive' 2018-02-12 18:07:58 +01:00
Mike Fährmann
b73b8b4f50 add OAuth unittests 2018-02-12 17:07:07 +01:00
Mike Fährmann
4d2fadfb6f restore skip actions with download archive 2018-02-12 16:56:45 +01:00
Mike Fährmann
65773263fc [util] implement OAuthSession.urlencode() (closes #75)
- Python's own urllib.parse.urlencode() has no quote_via argument in
  Python 3.3 and 3.4, which is necessary to follow OAuth 1.0 quoting
  rules.
2018-02-10 21:56:13 +01:00
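OAuth 1.0 (RFC 5849, section 3.6) percent-encodes every character except `A-Z a-z 0-9 - . _ ~`. The default `urlencode()` uses `quote_plus()` instead, which turns spaces into `+`. A sketch that avoids `quote_via` altogether; the function names are illustrative:

```python
from urllib.parse import quote


def oauth_quote(value):
    # safe="~" leaves exactly the RFC 5849 unreserved set unencoded,
    # also on Python versions where quote() does not treat '~' as safe
    return quote(str(value), safe="~")


def oauth_urlencode(params):
    # encode keys and values separately, then sort the encoded pairs
    pairs = sorted((oauth_quote(k), oauth_quote(v)) for k, v in params.items())
    return "&".join(k + "=" + v for k, v in pairs)


print(oauth_urlencode({"b": "x y", "a": "1/2~"}))
# → a=1%2F2~&b=x%20y
```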
Mike Fährmann
057668e17e extend input-file format with per-URL config and comments
- see docstring of parse_inputfile() for details
- TODO: unittests, recursion (currently setting for example
  {"extractor": {"key": "value"}} will override the whole "extractor"
  branch instead of merging {"key": "value"} into the already existing
  dictionary)
2018-02-07 21:47:27 +01:00
Mike Fährmann
347baf7ac5 improve util.parse_range() performance
It is never going to actually matter, but using partition() instead
of split() is twice as fast.
2018-02-05 22:28:11 +01:00
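A sketch of a range parser built on `partition()`. It assumes a spec like `"1-3, 5, 8-"` maps to sorted `(lower, upper)` tuples with open ends filled in; the real function's return format may differ:

```python
import sys


def parse_range(rangespec):
    """Parse '1-3, 5, 8-' into sorted (lower, upper) tuples (sketch)."""
    ranges = []
    for group in rangespec.split(","):
        # partition() splits only once and always returns a 3-tuple,
        # so there is no intermediate list or length check as with split()
        first, sep, last = group.partition("-")
        try:
            if not sep:
                lower = upper = int(first)
            else:
                lower = int(first) if first.strip() else 1
                upper = int(last) if last.strip() else sys.maxsize
        except ValueError:
            continue  # ignore malformed groups
        if lower <= upper:
            ranges.append((lower, upper))
    ranges.sort()
    return ranges


print(parse_range("-2, 5, 8-10"))  # → [(1, 2), (5, 5), (8, 10)]
```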
Mike Fährmann
aa38eab2be allow not-defined fields in format strings
... and replace them with "None", for now
2018-02-03 22:28:41 +01:00
Mike Fährmann
84a52a9256 add DownloadArchive class 2018-01-30 15:23:23 +01:00
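A sketch of a download archive backed by SQLite, assuming one table of string keys built from each file's metadata via a user-configurable format string (compare "make archive keys user-configurable" above). The schema and key format are assumptions:

```python
import sqlite3


class DownloadArchive:
    """Sketch: remember downloaded files by a formatted key."""

    def __init__(self, path, keyfmt):
        con = sqlite3.connect(path, isolation_level=None)  # autocommit mode
        self.cursor = con.cursor()
        self.cursor.execute(
            "CREATE TABLE IF NOT EXISTS archive (entry PRIMARY KEY)")
        self.keyfmt = keyfmt  # hypothetical, e.g. "{category}_{id}"

    def key(self, kwdict):
        return self.keyfmt.format_map(kwdict)

    def __contains__(self, kwdict):
        self.cursor.execute(
            "SELECT 1 FROM archive WHERE entry=? LIMIT 1", (self.key(kwdict),))
        return self.cursor.fetchone() is not None

    def add(self, kwdict):
        # OR IGNORE: adding an existing key is not an error
        self.cursor.execute(
            "INSERT OR IGNORE INTO archive VALUES (?)", (self.key(kwdict),))


archive = DownloadArchive(":memory:", "{category}_{id}")
post = {"category": "pixiv", "id": 12345}
before = post in archive
archive.add(post)
archive.add(post)
print(before, post in archive)  # → False True
```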
Mike Fährmann
db7f04dd97 emit log messages on download failure
and when retrying with fallback URLs
2018-01-28 18:44:10 +01:00
Mike Fährmann
6174a5c4ef [download] adjust filename extension on filetype mismatch
(closes #63)
2018-01-17 18:37:06 +01:00
Mike Fährmann
f10ffc0839 update extractor blacklist to also allow classes 2018-01-14 18:47:22 +01:00
Mike Fährmann
29d75fc3fa [tumblr] add support for OAuth authentication (#65) 2018-01-11 14:11:37 +01:00
Mike Fährmann
d241a0fb60 [util] replace '/' with '\' in base-directory paths
... on Windows to have consistent path separators.
2017-12-21 21:56:24 +01:00
Mike Fährmann
93482a1f88 implement 'util.advance()' 2017-12-03 01:38:24 +01:00
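The message doesn't describe the function. Assuming `advance(iterable, num)` skips the first `num` items and returns the iterator, it can be sketched with the `islice` "consume" idiom:

```python
import itertools


def advance(iterable, num):
    """Return an iterator over 'iterable' with the first 'num' items skipped."""
    iterator = iter(iterable)
    # islice(iterator, num, num) consumes 'num' items and yields nothing;
    # next(..., None) drives it without raising StopIteration
    next(itertools.islice(iterator, num, num), None)
    return iterator


print(list(advance(range(10), 3)))  # → [3, 4, 5, 6, 7, 8, 9]
```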
Mike Fährmann
a718c6c6cd implement 'util.parse_bytes()' 2017-12-02 01:24:49 +01:00
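The message doesn't specify the accepted syntax. A sketch assuming values like `"500k"` or `"1.5M"` with single-letter binary suffixes (b, k, m, g, t, p) and a fallback default:

```python
def parse_bytes(value, default=0, suffixes="bkmgtp"):
    """Convert '500k', '1.5M', '100' to a number of bytes (sketch)."""
    try:
        last = value[-1].lower()
    except (TypeError, IndexError, AttributeError):
        return default  # None, empty string, non-string input
    if last in suffixes:
        mul = 1024 ** suffixes.index(last)  # b=1, k=1024, m=1024**2, ...
        value = value[:-1]
    else:
        mul = 1
    try:
        return round(float(value) * mul)
    except ValueError:
        return default


print(parse_bytes("500k"), parse_bytes("1.5M"))  # → 512000 1572864
```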
Mike Fährmann
caf26412dd add option to set alternate location of .part files (#29)
Note: The path set for 'downloader.*.part-directory' needs to point to an
already existing directory.
2017-10-26 00:16:48 +02:00
Mike Fährmann
ea8ca4cfa4 add 'util.expand_path()' 2017-10-26 00:04:28 +02:00
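A sketch assuming `expand_path()` expands a leading `~` and environment variables, so that config values such as a `part-directory` can use them:

```python
import os


def expand_path(path):
    """Expand '~' and environment variables in 'path' (sketch)."""
    if not path:
        return path  # keep None / "" unchanged
    return os.path.expandvars(os.path.expanduser(path))


print(expand_path("~/gallery-dl"))
```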
Mike Fährmann
963670d73b add options to control usage of .part files (#29)
- '--no-part' command line option to disable them
- 'downloader.http.part' and 'downloader.text.part' config options

Disabling .part files restores the behaviour of the old downloader
implementation.
2017-10-24 23:33:44 +02:00
Mike Fährmann
b0353aa02d rewrite download modules (#29)
- use '.part' files during file-download
- implement continuation of incomplete downloads
- check if file size matches the one reported by server
2017-10-24 12:53:03 +02:00
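A sketch of the `.part` bookkeeping, assuming continuation uses an HTTP `Range: bytes=N-` request header and the file is renamed only after its size matches the server's total. The helper names are hypothetical, and no network code is shown:

```python
import os
import tempfile


def resume_headers(part_path):
    """Return (headers, offset) to continue from an existing .part file."""
    try:
        size = os.path.getsize(part_path)
    except OSError:
        return {}, 0  # no .part file yet: start from the beginning
    # ask the server for everything from byte 'size' onwards
    return {"Range": "bytes={}-".format(size)}, size


def size_matches(part_path, expected):
    """Compare the .part size with the total reported by the server."""
    return expected is None or os.path.getsize(part_path) == expected


def finalize(part_path, final_path):
    os.replace(part_path, final_path)  # atomic rename on the same filesystem


# Usage: simulate an interrupted download that left 10 bytes behind
with tempfile.TemporaryDirectory() as tmp:
    part = os.path.join(tmp, "image.jpg.part")
    final = os.path.join(tmp, "image.jpg")
    with open(part, "wb") as fp:
        fp.write(b"0123456789")
    headers, offset = resume_headers(part)
    complete = size_matches(part, 10)
    finalize(part, final)
    done = os.path.exists(final) and not os.path.exists(part)
    print(headers, offset)  # → {'Range': 'bytes=10-'} 10
```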
Mike Fährmann
832b8b76ac [util] extend global namespace for filter expressions 2017-10-09 22:12:58 +02:00
Mike Fährmann
8e6a767109 [util] restructure formatter for better exception propagation 2017-10-06 17:10:35 +02:00
Mike Fährmann
8df023e144 [util:filter] re-enable builtins
Trying to restrict access to Python's builtin functions (exec,
print, __import__, ...) can easily be circumvented and is
therefore completely pointless.

This also adds 'safe_int()' and the 'datetime' module to the global
namespace used when evaluating filter expressions.
2017-10-04 16:00:12 +02:00
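A sketch of such a global namespace. It assumes filter expressions are passed to `eval()` with the metadata dict as locals. Leaving out a `__builtins__` key makes `eval()` supply the regular builtins:

```python
import datetime


def safe_int(value, default=0):
    try:
        return int(value)
    except (ValueError, TypeError):
        return default


# No "__builtins__" key: eval() inserts the normal builtins automatically.
# Setting {"__builtins__": {}} would be the (easily circumvented) restriction.
FILTER_GLOBALS = {"datetime": datetime, "safe_int": safe_int}


def matches(expression, kwdict):
    return bool(eval(expression, dict(FILTER_GLOBALS), kwdict))


meta = {"date": datetime.datetime(2017, 10, 4), "count": "12"}
print(matches("date.year == 2017 and safe_int(count) > 10", meta))  # → True
print(matches("len(count) == 2", meta))  # builtins available → True
```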
Mike Fährmann
b319f4bab3 smaller code and text changes 2017-10-01 18:23:40 +02:00
Mike Fährmann
c1f0afe4c6 add custom string formatter class 2017-09-28 17:12:39 +02:00
Mike Fährmann
9fc1d0c901 implement and use 'util.safe_int()'
same as Python's 'int()', except it doesn't raise any exceptions and
accepts a default value
2017-09-24 15:59:25 +02:00
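A sketch matching that description (later renamed `parse_int` and moved to the text module, per the first entry above):

```python
def safe_int(value, default=0):
    """Like int(), but return 'default' instead of raising."""
    try:
        return int(value)
    except (ValueError, TypeError):
        return default


print(safe_int("42"), safe_int("abc"), safe_int(None, -1))  # → 42 0 -1
```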
Mike Fährmann
9b21d3f13c add '--filter' command-line option
This allows for image filtering via Python expressions by the same
metadata that is also used to build filenames (--list-keywords).

The usually shunned eval() function is used to evaluate
filter-expressions, but it seemed quite appropriate in this case and
shouldn't introduce any new security issues, as any attacker that could do
> gallery-dl --filter "delete-everything()" ...
could as well do
> python -c "delete-everything()"
2017-09-08 17:52:00 +02:00
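A sketch of the mechanism, assuming the expression is compiled once and evaluated with each file's metadata dict as local names. `build_filter` is a hypothetical name:

```python
def build_filter(expression):
    """Compile a filter expression once and return a predicate (sketch)."""
    code = compile(expression, "<filter>", "eval")

    def predicate(kwdict):
        # metadata keys (as shown by --list-keywords) act as variables
        return bool(eval(code, {}, kwdict))

    return predicate


images = [
    {"id": 1, "width": 1920, "extension": "jpg"},
    {"id": 2, "width": 640, "extension": "png"},
    {"id": 3, "width": 2560, "extension": "png"},
]
keep = build_filter("width >= 1000 and extension == 'png'")
print([img["id"] for img in images if keep(img)])  # → [3]
```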
Mike Fährmann
268cfa3cfe filter duplicate URLs (#36)
Duplicate URLs might occur if, for example, an artist adds another
image to his gallery while an extractor is running and images are being
downloaded on sites like pixiv/nijie/hentaifoundry.
The next image on the next page will have already been downloaded and
will cause a premature end if '--abort-on-skip' is being used.
2017-09-06 17:08:50 +02:00
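A sketch of order-preserving duplicate filtering with a set of already-seen URLs. This is one straightforward approach, not necessarily the one used here:

```python
def unique(iterable):
    """Yield items in their original order, skipping repeats."""
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item


# a new upload shifts the pagination, so "b" shows up again on page 2
page1 = ["a", "b"]
page2 = ["b", "c"]
print(list(unique(page1 + page2)))  # → ['a', 'b', 'c']
```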
Mike Fährmann
9bf9d64ad8 update unittests for util.py 2017-08-13 14:31:22 +02:00
Mike Fährmann
e3bfb8325a fix circular dependency
- util.py imported config.py and vice versa
- Python < 3.5 doesn't like this
2017-08-12 21:32:24 +02:00
Mike Fährmann
004456d5d5 properly update the config-dictionary
When using 2 or more config files, the values of the second would
improperly overwrite nested dictionaries of the first one.
The new method properly combines these nested dictionaries as well.
2017-08-12 20:07:27 +02:00
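A sketch of a recursive merge. `dict.update()` replaces a nested dictionary wholesale, while this recurses whenever both sides hold a dict. The function name is illustrative:

```python
def combine_dict(a, b):
    """Recursively merge dict 'b' into dict 'a' in place and return 'a'."""
    for key, value in b.items():
        if key in a and isinstance(a[key], dict) and isinstance(value, dict):
            combine_dict(a[key], value)  # merge nested dictionaries
        else:
            a[key] = value  # plain values from 'b' win
    return a


first = {"extractor": {"pixiv": {"username": "a"}, "base-directory": "/x"}}
second = {"extractor": {"pixiv": {"password": "p"}}}
print(combine_dict(first, second))
# → {'extractor': {'pixiv': {'username': 'a', 'password': 'p'}, 'base-directory': '/x'}}
```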
Mike Fährmann
ae2d61e5b3 handle format string exceptions separately 2017-08-11 21:48:37 +02:00
Mike Fährmann
d74a635e41 [util] update 'default' values and improve test coverage
for 'code_to_language()' and 'language_to_code()'
2017-08-08 19:22:04 +02:00
rachmadani haryono
dcd573806e chg: dev: fix error (#32)
* fix: dev: error

* fix: dev: AttributeError when getting artist

* fix: dev: typo on luscious parser
2017-08-04 15:01:10 +02:00
Mike Fährmann
0610ae5000 skip login if cookies are present 2017-07-17 10:33:36 +02:00
Mike Fährmann
2993206c4b smaller fixes and "security" measures
- move the OAuthSession class into util.py
- block special extractors for reddit and recursive
- ignore 'only matching' tests for testresults script
2017-06-16 21:01:40 +02:00
Mike Fährmann
72f1c6f87a [flickr] add support for flic.kr/p/... URLs
Example:
    https://flic.kr/p/FPVo9U
2017-06-02 09:01:35 +02:00
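flic.kr short links encode the numeric photo ID in base 58. A sketch assuming the commonly documented Flickr alphabet, which omits `0`, `O`, `I` and `l` and puts lowercase before uppercase:

```python
ALPHABET = "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"


def base58_decode(short_id, alphabet=ALPHABET):
    """Turn a flic.kr/p/<short_id> path segment into a photo ID number."""
    num = 0
    for char in short_id:
        num = num * len(alphabet) + alphabet.index(char)
    return num


def base58_encode(num, alphabet=ALPHABET):
    base = len(alphabet)
    chars = []
    while True:
        num, rem = divmod(num, base)
        chars.append(alphabet[rem])
        if not num:
            break
    return "".join(reversed(chars))


photo_id = base58_decode("FPVo9U")
print(photo_id, base58_encode(photo_id))
```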
Mike Fährmann
107d29ad8a improve handling of text:... URLs
- don't require // after the colon
- open output files in text mode
2017-05-12 14:10:25 +02:00
Mike Fährmann
ef90a2de2f implement the "exit" option for the "skip" config-key 2017-05-05 15:49:58 +02:00