7353 Commits

Author SHA1 Message Date
Mike Fährmann
e1e0668ca8 add option to set default replacement field value
Missing or undefined keywords will now be replaced with the value
set for 'keywords-default'. The default is Python's 'None', which
is equivalent to setting this option to JSON's 'null'.
2018-02-23 00:59:20 +01:00
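The behaviour described in this commit can be illustrated with a small sketch. It is not the project's actual implementation: the class name `DefaultFormatter` and its `default` parameter are hypothetical, and the sketch only assumes that replacement fields follow Python's `str.format()` syntax.

```python
import string


class DefaultFormatter(string.Formatter):
    """Format strings while tolerating missing fields (hypothetical sketch)."""

    def __init__(self, default=None):
        super().__init__()
        # Value substituted for missing fields; None mirrors JSON's null
        self.default = default

    def get_field(self, field_name, args, kwargs):
        try:
            return super().get_field(field_name, args, kwargs)
        except (KeyError, AttributeError, IndexError, TypeError):
            # The field (or a nested key/attribute/index) does not exist
            return self.default, field_name


fmt = DefaultFormatter()
print(fmt.format("{id}.{extension}", id=123))
```

With the default of `None`, a missing field renders as the text `None`; passing another value to the constructor changes the substitution.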
Mike Fährmann
ac3da8115e [util] don't add text: URLs to list of downloaded URLs 2018-02-20 18:14:27 +01:00
Mike Fährmann
8704d850bf add explicit proxy support (#76)
- '--proxy' as command-line argument
- 'extractor.*.proxy' as config option
2018-02-19 18:45:06 +01:00
Mike Fährmann
89440382ad [tumblr] use separate API key for unit tests 2018-02-19 16:54:37 +01:00
Mike Fährmann
367b963d37 [pixiv] fix ugoira extraction ... again (#78)
Some animations are not available for mobile devices, so we
pretend to be a desktop browser when requesting the ugoira page.
2018-02-19 16:50:12 +01:00
Mike Fährmann
b79f1f2ca7 [pixiv] fix ugoira extraction (closes #78) 2018-02-19 08:51:09 +01:00
Mike Fährmann
731ffd4986 improve text.filename_from_url() performance
- urlsplit() is faster than urlparse()
- rpartition() is faster than rindex() + slicing
- new version is 2.3 times as fast
2018-02-18 16:50:07 +01:00
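A minimal sketch of the two techniques named in this commit, `urlsplit()` and `rpartition()`. The function body is an assumption written for illustration and may differ from the real `text.filename_from_url()`.

```python
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last path component of a URL (illustrative sketch)."""
    # urlsplit() does not separate ';params' from the path,
    # which makes it cheaper than urlparse()
    path = urlsplit(url).path
    # rpartition() returns (head, separator, tail) in a single call,
    # so no index lookup plus slicing is needed
    return path.rpartition("/")[2]


print(filename_from_url("https://example.org/path/to/file.jpg?x=1"))
```

The query string is dropped because only the `path` component of the split result is used.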
Mike Fährmann
d122203be1 [mangastream] fix extraction 2018-02-17 22:40:16 +01:00
Mike Fährmann
8809b32aed release version 1.2.0 2018-02-16 22:29:57 +01:00
Mike Fährmann
5864afc0d3 update CHANGELOG 2018-02-16 22:27:40 +01:00
Mike Fährmann
b50bdbf3d7 change config specifiers in input file format
Instead of a dictionary/object, input file options are now specified
by a 'key=value' pair starting with '-' for options only applying to
the next URL or '-G' for Global options applying to all following URLs.

See the docstring of parse_inputfile() for details.

Example option specifiers:

- filename = "{id}.{extension}"
- extractor.pixiv.user.directory = ["Pixiv Users", "{user[id]}"]
-spaces="are_optional"
-G keywords = {"global": "option"}
2018-02-16 03:10:41 +01:00
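The option specifiers above could be parsed roughly as follows. This is a hedged sketch, not the code of parse_inputfile(): the function name `parse_option_line`, the returned tuple layout, and the assumption that values are JSON are all made up for illustration.

```python
import json


def parse_option_line(line):
    """Parse one '-key=value' or '-G key=value' line (hypothetical sketch).

    Returns (scope, key_path, value) or None if the line is not an option.
    """
    line = line.strip()
    if not line.startswith("-"):
        return None  # assumed to be a URL, comment, or blank line
    if line.startswith("-G"):
        scope, rest = "global", line[2:]
    else:
        scope, rest = "local", line[1:]
    # Split at the first '=' only; the value itself may contain '='
    key, sep, value = rest.partition("=")
    if not sep:
        raise ValueError("missing '=' in option line: " + line)
    # Assumption: values are written as JSON, as in the examples above
    return scope, key.strip().split("."), json.loads(value.strip())


print(parse_option_line('-G keywords = {"global": "option"}'))
```

Returning the key as a list of path components leaves it to the caller to decide how nested settings are merged.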
Mike Fährmann
f970a8f13c fix adding keys to download archive when using skip=false 2018-02-13 23:45:30 +01:00
Mike Fährmann
179bcdd349 adjust archive-ids 2018-02-13 04:50:45 +01:00
Mike Fährmann
be3ea4425d test archive-id creation and uniqueness 2018-02-12 23:02:09 +01:00
Mike Fährmann
3cec533c28 Merge branch 'archive' 2018-02-12 18:07:58 +01:00
Mike Fährmann
20af86b2ea add more extractor tests
for mangastream, reddit and imgur
2018-02-12 17:07:18 +01:00
Mike Fährmann
b73b8b4f50 add OAuth unittests 2018-02-12 17:07:07 +01:00
Mike Fährmann
4d2fadfb6f restore skip actions with download archive 2018-02-12 16:56:45 +01:00
Mike Fährmann
65773263fc [util] implement OAuthSession.urlencode() (closes #75)
- Python's own urllib.parse.urlencode() has no quote_via argument in
  Python 3.3 and 3.4, which is necessary to follow OAuth 1.0 quoting
  rules.
2018-02-10 21:56:13 +01:00
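One way to get OAuth 1.0 style quoting without the `quote_via` argument is to call `urllib.parse.quote()` directly. The sketch below is an assumption about the general approach, not the body of `OAuthSession.urlencode()`, and the helper names are hypothetical.

```python
from urllib.parse import quote


def oauth_quote(value):
    """Percent-encode everything except unreserved characters."""
    # Letters, digits, '-', '.', '_' are never quoted; '~' is added
    # explicitly, and '/' is quoted because it is not in 'safe'
    return quote(str(value), safe="~")


def oauth_urlencode(params):
    """Encode an iterable of (key, value) pairs (hypothetical helper)."""
    return "&".join(
        oauth_quote(key) + "=" + oauth_quote(value)
        for key, value in params
    )


print(oauth_urlencode([("q", "a b"), ("path", "x/y~z")]))
```

Unlike the default behaviour of `urllib.parse.urlencode()`, spaces become `%20` rather than `+`.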
Mike Fährmann
7e0207bcf4 [imgur] strip trailing '?1' from 'ext' 2018-02-10 21:33:40 +01:00
Mike Fährmann
cf147dfee9 [hentai2read] fix manga extraction
- site changed its HTML structure
2018-02-09 22:24:34 +01:00
Mike Fährmann
f5f2d29f56 [nijie] fix dojin extraction
- correctly extract artist_id
- set extension to "jpg" if it was empty and let filetype checks do
  the rest
2018-02-09 22:06:26 +01:00
Mike Fährmann
7f7c16ae37 add option to specify additional key-value pairs 2018-02-08 23:10:58 +01:00
Mike Fährmann
d38bf2f54c [tumblr] recognize /image/... URLs
xyz.tumblr.com/image/123 refers to the same images
as xyz.tumblr.com/post/123.
2018-02-08 23:08:14 +01:00
Mike Fährmann
057668e17e extend input-file format with per-URL config and comments
- see docstring of parse_inputfile() for details
- TODO: unittests, recursion (currently setting for example
  {"extractor": {"key": "value"}} will override the whole "extractor"
  branch instead of merging {"key": "value"} into the already existing
  dictionary)
2018-02-07 21:47:27 +01:00
Mike Fährmann
5b3c34aa96 use generic chapter-extractor in more modules 2018-02-07 12:36:39 +01:00
Mike Fährmann
347baf7ac5 improve util.parse_range() performance
It is never going to actually matter, but using partition() instead
of split() is twice as fast.
2018-02-05 22:28:11 +01:00
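To show the `partition()` idiom this commit refers to, here is a sketch of a range parser. The accepted syntax (comma-separated `N` or `N-M` groups) and the return type are assumptions and may not match the real `util.parse_range()`.

```python
def parse_range(rangespec):
    """Parse a string like '1-3,5' into (first, last) tuples (sketch)."""
    ranges = []
    for group in rangespec.split(","):
        if not group.strip():
            continue  # skip empty groups such as a trailing comma
        # partition() splits once and always returns three parts,
        # so no list is built and no length check is needed
        first, sep, last = group.partition("-")
        if sep:
            ranges.append((int(first), int(last)))
        else:
            value = int(first)
            ranges.append((value, value))
    return ranges


print(parse_range("1-3,5"))
```

Open-ended groups such as `2-` are not handled here and raise `ValueError`.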
Mike Fährmann
7b5ba69951 [hentaihere] ensure consistent extraction results
sometimes there is a random space before the next <a>
2018-02-05 15:26:25 +01:00
Mike Fährmann
377b78b3c9 [hentai2read] fix manga name extraction 2018-02-04 22:12:24 +01:00
Mike Fährmann
54c36a8a34 [subapics] add chapter- and manga-extractor (#70) 2018-02-04 22:02:10 +01:00
Mike Fährmann
2dd3aeeeae [komikcast] add chapter- and manga-extractor (#70) 2018-02-04 22:02:10 +01:00
Mike Fährmann
7a412f5c32 implement generic manga-chapter extractor 2018-02-04 22:02:04 +01:00
Mike Fährmann
aa38eab2be allow not-defined fields in format strings
... and replace them with "None", for now
2018-02-03 22:28:41 +01:00
Mike Fährmann
6a07e38366 implement extractor.add() and .add_module()
... as a public and non-hacky way to add (external) extractors to
gallery-dl's pool and make them available for extractor.find()
2018-02-02 00:01:41 +01:00
Mike Fährmann
c0dd922c13 add '--download-archive' cmdline option
… as well as a config file equivalent
2018-02-01 22:00:44 +01:00
Mike Fährmann
8c3b713362 rework DownloadJob.handle_url(); include archive functionality
todo:
"abort" and "exit" skip modes if download is skipped because of archive
2018-02-01 20:49:41 +01:00
Mike Fährmann
34873dbd90 set 'archive_fmt' values
These are going to be used to create a unique id for each image.
2018-02-01 15:30:49 +01:00
Mike Fährmann
a34cebc253 [luscious] jump to first image if cover does not link to it 2018-01-30 22:39:01 +01:00
Mike Fährmann
84a52a9256 add DownloadArchive class 2018-01-30 15:23:23 +01:00
Mike Fährmann
915807dd77 log HTTP errors as warnings 2018-01-29 21:55:46 +01:00
Mike Fährmann
db7f04dd97 emit log messages on download failure
and when retrying with fallback URLs
2018-01-28 18:44:10 +01:00
Mike Fährmann
d951f13e37 add config option for unsupported-URL file
for consistency's sake
2018-01-28 18:42:10 +01:00
Mike Fährmann
619387cbb1 update extractor unittest results 2018-01-28 18:29:05 +01:00
Mike Fährmann
364e335440 smaller adjustments and improvements
- requests and urllib3 version on 1 line
- close input file after reading from it
- use expand_path for unsupported-urls file
- remove unnecessary logging from options.py
2018-01-27 01:05:17 +01:00
Mike Fährmann
c9a9664a65 change --write-log behaviour
- log files now get truncated when opening them
  (mode "w" instead of "a")
- log verbosity to file depends on -q/-v
  (same as logging to stderr)
2018-01-27 00:51:40 +01:00
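The two changes in this commit map onto standard `logging` features. The following is a sketch under assumptions: the function name `add_logfile` and the message format are hypothetical, and only `logging.FileHandler` and its `mode` argument come from the standard library.

```python
import logging


def add_logfile(logger, path, level):
    """Attach a file handler that truncates 'path' on open (sketch)."""
    # mode="w" truncates an existing file; mode="a" would append to it
    handler = logging.FileHandler(path, mode="w", encoding="utf-8")
    # Use the same verbosity level that -q/-v selected for stderr
    handler.setLevel(level)
    handler.setFormatter(
        logging.Formatter("[%(name)s][%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    return handler
```

A caller would pass the level derived from its verbosity flags, for example `logging.DEBUG` for verbose output or `logging.ERROR` for quiet output.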
Mike Fährmann
97f4f15ec0 add option to write logging output to a file
- '--write-log FILE' as cmdline argument
- 'output.logfile' as config file option
2018-01-26 18:51:51 +01:00
Mike Fährmann
f94e3706a8 use logging module for error messages during downloads 2018-01-26 18:11:13 +01:00
Mike Fährmann
db91cf871c document message identifiers 2018-01-23 21:38:30 +01:00
Mike Fährmann
0dd48d644f update test results
nothing broke, but things got updated or changed
2018-01-23 21:38:29 +01:00
Mike Fährmann
1e93955170 [batoto] remove module
Site officially shut down on 2018.01.18
2018-01-23 21:37:32 +01:00