
1844 Commits

Author SHA1 Message Date
Leonardo Taccari
afce1ee1eb Avoid possible sensitive information disclosure via cache.file
Previously cache.file could be created world readable leading to
possible sensitive information disclosure on multi-user systems.
Restrict permissions only to the owner by creating an empty file.

Please note that cache.file created before this commit may need a
`chmod 600' or similar!
2019-07-31 15:05:26 +02:00
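
The permission fix described in afce1ee1eb could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name `create_private_file` is hypothetical, and the explicit `os.chmod` call is an assumption added to cover files that already exist with looser permissions (the case the commit message warns about).

```python
import os


def create_private_file(path):
    """Create an empty file that only its owner can read and write.

    Hypothetical helper for illustration; not part of the original commit.
    """
    # O_CREAT creates the file if it is missing; mode 0o600 is rw for owner only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    os.close(fd)
    # The mode given to os.open() is ignored for pre-existing files,
    # so tighten the permissions explicitly as well (assumption).
    os.chmod(path, 0o600)
    return path
```

Creating the file empty and with restricted permissions up front means any library that later opens it for writing inherits the restricted mode instead of the process's default.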
Mike Fährmann
2153206093 [imgbb] add album extractor (#361) 2019-07-30 23:11:19 +02:00
Mike Fährmann
beb4fab2e6 [exhentai] improve limit and error handling (#360)
- check image limit before opening the first gallery or image page
- prevent any further exhentai extractors from running after the image
  limit has been reached
2019-07-30 22:58:35 +02:00
Mark Henrick
923e1bb714 [docs] Fix inconsistency about which sites have optional authentication (#359)
* [docs] Fix inconsistency about which sites have optional authentication

* update authentication docs
2019-07-29 18:22:31 +02:00
Mike Fährmann
81b35ed3cb [exhentai] catch more error states (#356, #360)
- warn on MPV-enabled galleries
- catch parsing errors for gallery pages and image info
- write page content to debug output
2019-07-29 16:54:31 +02:00
Mike Fährmann
a90280f4e7 [postprocessor:zip] add 'mode' option (#355) 2019-07-29 16:51:26 +02:00
Mike Fährmann
6ce22f606b [exhentai] update login procedure and tests
Logging in now follows the natural login flow that also happens in a
browser more closely and collects more cookies than just ipb_member_id
and ipb_pass_hash.

Test URLs have been updated and now point to the e-hentai.org domain.
2019-07-28 16:51:05 +02:00
Mike Fährmann
dc73d02d87 [exhentai] always use e-hentai.org as domain + set nw cookie 2019-07-28 10:54:17 +02:00
Mike Fährmann
40637556fa [ngomik] fix extraction 2019-07-28 10:53:46 +02:00
Mike Fährmann
3969f9cbbd [behance] fix collection extraction 2019-07-27 14:26:40 +02:00
Mike Fährmann
20f7b07312 ensure postproc finalize() is called during C-c or crash (#355) 2019-07-27 11:14:52 +02:00
Mike Fährmann
17a3426845 [gelbooru] enable all content when not using API 2019-07-27 11:13:38 +02:00
Mike Fährmann
279db2c5b2 [vsco] add collection & image extractor + video support (#331) 2019-07-26 19:06:15 +02:00
Mike Fährmann
547ea71463 [downloader.ytdl] add 'forward-cookies' option (#352)
The "long" name is necessary because just calling it 'cookies' would
clash with how the lookup for '--cookies' is implemented.
2019-07-24 21:19:11 +02:00
Mike Fährmann
d9d44ad953 [tsumino] update test results 2019-07-24 21:17:23 +02:00
Mike Fährmann
b1bea8aaeb add 'restrict-filenames' option (#348) 2019-07-23 17:41:24 +02:00
Mike Fährmann
60cf40380a [vsco] add user extractor (#331) 2019-07-23 16:23:11 +02:00
Mike Fährmann
3fe5ccdfa6 [adultempire] add gallery extractor (closes #340) 2019-07-21 22:29:57 +02:00
Mike Fährmann
b3851e01d9 release version 1.9.0 2019-07-19 21:55:25 +02:00
Mike Fährmann
5d968412ca [deviantart] case-insensitive folder name matching (fixes #343) 2019-07-19 18:05:31 +02:00
Mike Fährmann
a3c736fedc [500px] fix extraction
Maximum available image dimensions have been reduced from 5000px to
4096px on the longest edge.
A few (unimportant) metadata fields are no longer available or have
been changed to 'null'.
2019-07-19 17:23:03 +02:00
Mike Fährmann
1133b7fcbd [smugmug] update unit tests
The account used for tests before has been deleted.
2019-07-19 17:16:24 +02:00
Mike Fährmann
21991acc49 add 'ciphers' option; update default User-Agent 2019-07-19 17:14:40 +02:00
Mike Fährmann
84f4d3bc0b replace urllib3's default cipher list with Firefox's (#342)
Avoids Cloudflare CAPTCHAs on both Linux and Windows without
pyOpenSSL installed.
2019-07-18 19:42:13 +02:00
Mike Fährmann
feb98cf196 [twitter] improve 'content' formatting; add option (#338)
- include emoticons
- leave newlines intact
- remove pic.twitter.com/ links at the end
2019-07-17 16:02:51 +02:00
Mike Fährmann
1740086d8a add 'repl' and 'sep' arguments to text.replace_html() 2019-07-17 14:48:24 +02:00
Mike Fährmann
8d1ae9b715 [tumblr] enable date-min/-max/-format options (#337) 2019-07-17 14:36:41 +02:00
Mike Fährmann
09f37fde39 [reddit] move date-min/-max handling into Extractor class 2019-07-16 22:54:39 +02:00
Mike Fährmann
fb875d1ab8 add warning about NSFW sites in supportedsites.rst (#335) 2019-07-15 21:44:34 +02:00
Mike Fährmann
7b77ecc35a fix paths for files without extension (#220) 2019-07-15 16:39:03 +02:00
Mike Fährmann
c41ff9441e improve find() for downloaders and postprocessors 2019-07-15 16:33:03 +02:00
Mike Fährmann
0151e250f5 [twitter] extract 'content' metadata (closes #333) 2019-07-15 16:25:22 +02:00
Mike Fährmann
16c582aaf9 implement 'mtime' post-processor (#332)
This can set a file's modification time according to a UNIX timestamp
or a datetime object from its metadata.
2019-07-14 22:39:17 +02:00
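
What 16c582aaf9 describes, setting a modification time from either a UNIX timestamp or a datetime object, might look like the sketch below. The function name `set_mtime` is hypothetical, and treating naive datetime objects as UTC is an assumption made here, not something the commit message states.

```python
import os
from datetime import datetime, timezone


def set_mtime(path, value):
    """Set a file's modification time from a UNIX timestamp or a datetime.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: naive datetimes are interpreted as UTC.
            value = value.replace(tzinfo=timezone.utc)
        timestamp = value.timestamp()
    else:
        timestamp = float(value)
    # os.utime() expects an (access time, modification time) pair.
    os.utime(path, (timestamp, timestamp))
    return timestamp
```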
Mike Fährmann
62097284fe add 'download' option (#220) 2019-07-14 18:48:18 +02:00
Mike Fährmann
fe7805de7c improve attribute access in DownloadJob.handle_url()
Storing a value in a local variable and accessing it that way is faster
than going through 'self' if it is accessed more than once.
2019-07-13 21:42:07 +02:00
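
The idea behind fe7805de7c can be illustrated with a small, self-contained example. The class `Job` and its methods are hypothetical and unrelated to the real DownloadJob code; they only show the pattern of binding an attribute to a local name once and reusing it.

```python
class Job:
    """Hypothetical class used only to illustrate the lookup pattern."""

    def __init__(self):
        self.seen = []

    def handle_via_self(self, urls):
        # Each iteration looks up 'self.seen' and then '.append'.
        for url in urls:
            self.seen.append(url)

    def handle_via_local(self, urls):
        # The attribute chain is resolved once and kept in a local variable.
        append = self.seen.append
        for url in urls:
            append(url)
```

Both methods produce the same result; the second simply avoids repeating the attribute lookups inside the loop. Whether the difference matters in practice can be checked with `timeit`.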
Mike Fährmann
56c7a66a4a detect Cloudflare CAPTCHAs and update cipher list 2019-07-10 15:18:20 +02:00
Mike Fährmann
a7b42b37a2 [35photo] fix extraction 2019-07-09 20:33:57 +02:00
Mike Fährmann
04b8d0894a [newgrounds] improve metadata extraction 2019-07-08 17:53:55 +02:00
Mike Fährmann
12da6bd0c9 [simplyhentai] fix/improve extraction 2019-07-06 20:25:53 +02:00
Mike Fährmann
fdec59f8e2 replace extractor.request() 'expect' argument
with
- 'fatal': allow 4xx status codes
- 'notfound': raise NotFoundError on 404
2019-07-05 00:42:16 +02:00
Mike Fährmann
2ff73873f0 [erolord] add gallery extractor (closes #326) 2019-07-04 20:28:04 +02:00
Mike Fährmann
b4da8c5a97 [sexcom] add extractor for related pins (#325) 2019-07-03 21:04:23 +02:00
Mike Fährmann
69997e92db [sexcom] skip unavailable pins (#325) 2019-07-02 22:05:54 +02:00
Mike Fährmann
8966930c5c [downloader:http] try to import SSL exception class from OpenSSL (#324)
2019-07-01 20:10:26 +02:00
Mike Fährmann
bc6b0cfddc [shopify] skip consecutive duplicate products
Not filtering duplicate URLs anymore caused the archive ID uniqueness
test to fail.
2019-07-01 20:04:57 +02:00
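
Skipping only consecutive duplicates, as bc6b0cfddc describes, differs from full deduplication: an item that reappears later is still kept. A minimal sketch, assuming the products can be compared by their URL strings (the function name is hypothetical):

```python
from itertools import groupby


def skip_consecutive_duplicates(urls):
    """Yield each URL once per run of identical neighbours.

    Hypothetical helper; not the extractor's actual code.
    """
    # groupby() groups adjacent equal items, so one key per run is emitted.
    for url, _run in groupby(urls):
        yield url
```

For example, the input `["a", "a", "b", "a"]` yields `a`, `b`, `a`: the adjacent pair is collapsed, while the later `a` is preserved.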
Mike Fährmann
b89f0d8d3c update extractor result tests 2019-07-01 20:02:47 +02:00
Mike Fährmann
69205df68d allow '-1' for infinite retries (#300) 2019-06-30 23:10:47 +02:00
Mike Fährmann
f7b5c4c3e7 use values of 'retries' options correctly
The RE-tries option now specifies exactly that: the maximum number of
times a failed HTTP request is re-tried. For example, a value of 2 will
now correctly stop after 3 attempts: the initial one + 2 re-tries.

The maximum wait-time now also caps at 30min and increases exponentially
for both extractor.request() and downloader.http.download().
2019-06-30 23:10:18 +02:00
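
The retry semantics from f7b5c4c3e7, together with the '-1' value from 69205df68d, can be sketched as below. This is an illustration under stated assumptions, not the project's implementation: the function name, the `send` callable, the use of `OSError` as the failure signal, and the 1-second starting delay that doubles each time are all hypothetical. Only the attempt count, the exponential growth, and the 30-minute cap come from the commit messages.

```python
import time


def request_with_retries(send, retries=4, sleep=time.sleep, max_wait=30 * 60):
    """Call send() until it succeeds or 'retries' re-tries are used up.

    Hypothetical helper. A negative 'retries' value means retry forever.
    """
    tries = 0
    while True:
        try:
            return send()
        except OSError:
            # Give up once the allowed number of re-tries has been spent.
            if 0 <= retries <= tries:
                raise
            tries += 1
            # Assumed schedule: 1s, 2s, 4s, ... capped at max_wait seconds.
            sleep(min(2 ** (tries - 1), max_wait))
```

With `retries=2`, a permanently failing `send` is called three times in total: the initial attempt plus two re-tries.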
Mike Fährmann
6393b47db2 add '-A/--abort'; deprecate '--abort-on-skip' 2019-06-30 14:28:28 +02:00
Mike Fährmann
f2000a69aa implement 'image-unique' and 'chapter-unique' options (#303)
The default value for both is 'false', i.e. duplicate URLs are NOT
ignored.

The previous behavior was to always ignore duplicate URLs to make
'--abort-on-skip' work properly when new images were added to the
beginning of a collection while gallery-dl is running.
2019-06-29 22:50:17 +02:00
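
The behavior controlled by the 'image-unique' and 'chapter-unique' options in f2000a69aa could be sketched as an optional filter over the URL stream. The function name and the `unique` parameter are hypothetical stand-ins for however the options are actually wired in.

```python
def filter_urls(urls, unique=False):
    """Yield URLs, dropping repeats only when 'unique' is enabled.

    Hypothetical helper; the default mirrors the documented 'false'.
    """
    if not unique:
        # Default: duplicate URLs are NOT ignored.
        yield from urls
        return
    seen = set()
    for url in urls:
        if url not in seen:
            seen.add(url)
            yield url
```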