165 Commits

Author SHA1 Message Date
Mike Fährmann
64cf26eaf4 allow specifying sleep-* options as string
either as single value or as range: "3.5", "2.1 - 5.0"
2021-12-18 23:28:56 +01:00
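As a rough illustration of the value forms described above (a single number, a string such as "3.5" or "2.1 - 5.0", or the '[min, max]' list form from commit c9e6693530), parsing could look like the following sketch. The helper name 'parse_duration' is hypothetical and not taken from the project; the sketch also assumes non-negative values, since it splits strings on '-'.

```python
import random


def parse_duration(value):
    """Return a zero-argument callable that yields a sleep time in seconds.

    Hypothetical helper; accepts a number, a string like "3.5" or
    "2.1 - 5.0", or a two-element sequence like [5.0, 10.0].
    """
    if isinstance(value, str):
        # "2.1 - 5.0" -> [2.1, 5.0]; float() tolerates surrounding spaces
        parts = [float(part) for part in value.split("-")]
    elif isinstance(value, (int, float)):
        parts = [float(value)]
    else:
        parts = [float(part) for part in value]

    if len(parts) == 1:
        fixed = parts[0]
        return lambda: fixed

    lower, upper = parts
    # pick a new random duration from the range on every call
    return lambda: random.uniform(lower, upper)


fixed_sleep = parse_duration("3.5")
ranged_sleep = parse_duration("2.1 - 5.0")
list_sleep = parse_duration([5.0, 10.0])
```

Returning a callable rather than a number means a range produces a fresh random duration for each request.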
Mike Fährmann
010d65dcec extend blacklist/whitelist syntax (#2025)
Each entry in such a list can now also include a subcategory
'<category>:<subcategory>'
and it is possible to use '*' or an empty string as placeholder
'*:<subcategory>', ':<subcategory>', '<category>:*'

For example
  "blacklist": "imgur,*:tag,gfycat:user" or
  "blacklist": ["imgur", "*:tag", "gfycat:user"]
will filter all 'imgur' extractors, all extractors with a 'tag'
subcategory (e.g. https://danbooru.donmai.us/posts?tags=bonocho),
and all 'gfycat' user extractors.
2021-11-23 20:31:43 +01:00
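The matching rules described in this commit message can be sketched as follows. This is only an illustration of the '<category>:<subcategory>' syntax with '*' or an empty string as placeholder; the function names 'parse_filter' and 'matches' are hypothetical and do not reflect how the project implements it.

```python
def parse_filter(entries):
    """Turn a blacklist/whitelist value into (category, subcategory) pairs.

    Accepts either a comma-separated string or a list of strings.
    An empty or missing part is treated like the '*' placeholder.
    """
    if isinstance(entries, str):
        entries = entries.split(",")
    pairs = set()
    for entry in entries:
        category, _, subcategory = entry.strip().partition(":")
        pairs.add((category or "*", subcategory or "*"))
    return pairs


def matches(pairs, category, subcategory):
    """Return True if any pair covers the given extractor."""
    return any(
        cat in ("*", category) and sub in ("*", subcategory)
        for cat, sub in pairs
    )


pairs = parse_filter("imgur,*:tag,gfycat:user")
```

With these pairs, every 'imgur' extractor, every 'tag' extractor, and the 'gfycat' user extractor match, while for example a 'gfycat' image extractor does not.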
Mike Fährmann
c22ff97743 remove 'unit' argument from 'util.format_value()' 2021-09-28 23:07:55 +02:00
Mike Fährmann
cad85640de move 'util.PathFormat' into its own 'path' module
to prevent circular imports between 'formatter' and 'util'
2021-09-27 21:29:37 +02:00
Mike Fährmann
74145467dd move 'util.Formatter' into its own 'formatter' module 2021-09-27 02:37:04 +02:00
Mike Fährmann
e69ee41f25 implement 'page-reverse' option (#1854) 2021-09-23 18:02:19 +02:00
Mike Fährmann
c9e6693530 allow specifying a minimum/maximum for 'sleep-*' options (#1835)
for example '"sleep-request": [5.0, 10.0]' to wait between 5 and 10
seconds between each HTTP request
2021-09-14 17:40:05 +02:00
Mike Fährmann
292fffc83c add 'j' format string conversion
to convert to a JSON formatted string
2021-08-28 01:19:36 +02:00
Mike Fährmann
d3eab417ed implement a 'path-strip' option 2021-08-24 23:23:12 +02:00
Mike Fährmann
2792ed6e4b implement 'util.format_value()' 2021-07-26 02:11:22 +02:00
Mike Fährmann
13d4045a8a add 'archive-prefix' option (#1711) 2021-07-20 20:21:33 +02:00
Mike Fährmann
9e42cd58ea replace ChainPredicate class with 'functools.partial' 2021-07-20 20:21:32 +02:00
Mike Fährmann
1b2f9050fb rename all instances of 'kwds' to 'kwdict' 2021-07-20 20:21:19 +02:00
Mike Fährmann
0179581340 add 'T' format string conversion (#1646)
to convert 'date'/datetime to timestamp
2021-06-25 22:35:45 +02:00
Mike Fährmann
befe635022 cache parsed Formatter functions 2021-06-22 19:46:04 +02:00
Mike Fährmann
79b7ee2712 use 'functools.partial' in '_build_cleanfunc' when possible
makes calls to the returned function a slight bit faster (~10%)
2021-06-20 23:34:41 +02:00
Mike Fährmann
ceaf7fd989 optimize 'base-directory' initialization and usage
apply 'clean_path()' only once
2021-06-20 21:35:43 +02:00
Mike Fährmann
2ca011dfa8 add 'kwdict' argument to PathFormat.build_filename() 2021-06-20 20:26:38 +02:00
Mike Fährmann
fd00d47116 implement conditional directories (#1394)
They work the same way as conditional filenames (84d2e640), e.g.

"directory": {
    "score >= 20": ["high score"],
    "score >= 5" : ["mid score"],
    ""           : ["{category}", "default"]
}
2021-06-20 20:09:35 +02:00
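One way to picture how such a conditional "directory" object might be evaluated is the sketch below. It assumes that conditions are Python expressions checked in order against the file's metadata, that the empty-string key is the fallback, and that evaluation uses plain 'compile()'/'eval()'; the helper name 'build_directory_selector' is hypothetical.

```python
def build_directory_selector(conditions):
    """Return a function mapping a metadata dict to directory segments."""
    compiled = []
    default = None
    for expression, segments in conditions.items():
        if expression:
            code = compile(expression, "<condition>", "eval")
            compiled.append((code, segments))
        else:
            # the empty-string key acts as the fallback entry
            default = segments

    def select(kwdict):
        chosen = default
        for code, segments in compiled:
            try:
                # metadata fields are visible as local names
                if eval(code, {"__builtins__": {}}, kwdict):
                    chosen = segments
                    break
            except Exception:
                continue  # e.g. a field used in the condition is missing
        if chosen is None:
            return None
        return [segment.format_map(kwdict) for segment in chosen]

    return select


select = build_directory_selector({
    "score >= 20": ["high score"],
    "score >= 5": ["mid score"],
    "": ["{category}", "default"],
})
```

Because 'eval()' runs arbitrary expressions, a scheme like this only makes sense for conditions that come from the user's own configuration.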
Mike Fährmann
def0148582 restructure code in PathFormat constructor 2021-06-08 18:05:07 +02:00
Mike Fährmann
84d2e64024 combine conditional filenames into filename option (#1394) 2021-06-08 18:00:06 +02:00
Mike Fährmann
4cf40434d7 initial support for conditional filenames (#1394) 2021-06-04 16:45:32 +02:00
Mike Fährmann
0abad8bc12 implement 'compile_expression()' 2021-06-03 22:34:58 +02:00
Mike Fährmann
8fd8126117 fix ISO 639-1 code for Japanese
"jp" -> "ja"
2021-05-22 16:07:04 +02:00
Mike Fährmann
c5ca7905ce add 'noop()' and 'identity()' functions 2021-05-04 19:27:17 +02:00
Mike Fährmann
bff71cde80 implement 'util.unique_sequence()' 2021-03-02 23:11:08 +01:00
Mike Fährmann
92071d02f4 fix crash when 'base-directory' is an empty string (#1339) 2021-02-24 14:49:17 +01:00
Mike Fährmann
970fc2b2b5 allow setting 'filename' & '(base-)directory' to default
by setting them to 'None'/'null'
2021-02-24 02:24:22 +01:00
Mike Fährmann
91308140ec make 'generate_token()' compatible with Python 3.4 2021-01-14 03:48:10 +01:00
Mike Fährmann
780b6adb91 rename 'generate_csrf_token()' to just 'generate_token()'
and add a 'size' argument
2021-01-11 22:12:40 +01:00
Mike Fährmann
79501a356f fix crash when 'path-restrict' is an object/dict
This basically reverts commit 5818c928

(#1234)
2021-01-10 00:13:48 +01:00
Mike Fährmann
5d4494b15f add "ascii" as a special 'path-restrict' value 2021-01-09 02:41:20 +01:00
Mike Fährmann
5818c928c4 refactor 'path-restrict' parsing 2021-01-09 02:33:42 +01:00
Mike Fährmann
aac00a2024 add 'd' conversion for format strings
to convert a timestamp to a formattable 'datetime' object.

For example '{created_at!d:%Y-%m-%d}'
transforms the timestamp in 'created_at' into a 'datetime' object
and then formats its content using '%Y-%m-%d' as template.

1262304000 -> datetime(2010, 1, 1) -> "2010-01-01"
2021-01-09 01:58:44 +01:00
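The transformation chain in this message can be reproduced with the standard library. The sketch assumes the timestamp is interpreted as UTC and converted to a naive 'datetime'; the function name 'timestamp_to_datetime' is hypothetical.

```python
from datetime import datetime, timezone


def timestamp_to_datetime(value):
    """Convert a Unix timestamp to a naive datetime (assumed to be UTC)."""
    aware = datetime.fromtimestamp(int(value), timezone.utc)
    return aware.replace(tzinfo=None)


created_at = 1262304000
as_datetime = timestamp_to_datetime(created_at)

# the part after ':' in '{created_at!d:%Y-%m-%d}' is an ordinary
# datetime format spec
formatted = "{:%Y-%m-%d}".format(as_datetime)
```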
Mike Fährmann
511d8d3fa3 increase SQLite connection timeouts (#1173) 2020-12-19 20:15:07 +01:00
Mike Fährmann
9b1bd09454 change 'extension-map' default
Replace all JPEG filename extensions with 'jpg'.
2020-11-14 22:40:31 +01:00
Mike Fährmann
e3480bc8de implement 'extension-map' option (#318) 2020-11-02 15:27:07 +01:00
Mike Fährmann
c3f01dc4e6 implement 'util.unique()' 2020-10-29 23:33:41 +01:00
Mike Fährmann
de4a1e45c9 improve 'generate_csrf_token()'
no need to use hashlib.md5()
2020-10-24 02:56:40 +02:00
Mike Fährmann
ec61696316 add 't' format string conversion (closes #1065)
to Trim whitespace from the beginning and end of strings.
Example: '{field!t}' becomes 'foo' for 'field' == "  \nfoo\t\r"
2020-10-16 00:37:22 +02:00
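A custom conversion character like 't' can be illustrated with the standard library's 'string.Formatter', whose 'convert_field()' method is meant to be overridden. This is a sketch of the idea only, not the project's formatter; the class name 'TrimFormatter' is hypothetical.

```python
import string


class TrimFormatter(string.Formatter):
    """Formatter that understands an extra 't' (trim) conversion."""

    def convert_field(self, value, conversion):
        if conversion == "t":
            # strip leading and trailing whitespace, including \n, \t, \r
            return str(value).strip()
        # fall back to the built-in conversions: r, s, a
        return super().convert_field(value, conversion)


result = TrimFormatter().format("{field!t}", field="  \nfoo\t\r")
```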
Mike Fährmann
1b1cf01d0d add a general 'generate_csrf_token()' function 2020-10-15 15:14:18 +02:00
Mike Fährmann
d5fa716d89 fix crash when using 'skip=false' and archive (fixes #1023)
Separating the archive check from pathfmt.exists() in b5243297
had some unintended side effects.

It is also not possible to monkey-patch a dunder method like
__contains__ because of the special method lookup that gets
performed for them.
2020-09-23 19:07:40 +02:00
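The remark about dunder methods refers to Python's special method lookup: operators such as 'in' look up '__contains__' on the object's type, not on the instance. The class 'Archive' below is a hypothetical stand-in used only to demonstrate that behavior.

```python
class Archive:
    """Hypothetical stand-in for an archive object."""

    def __contains__(self, key):
        return False


archive = Archive()

# Assigning to the instance does not affect the 'in' operator,
# because implicit lookups bypass the instance dictionary.
archive.__contains__ = lambda key: True
instance_patched = "entry" in archive

# Assigning to the class does take effect.
Archive.__contains__ = lambda self, key: True
class_patched = "entry" in archive
```

Here 'instance_patched' stays False while 'class_patched' is True, which is why replacing '__contains__' on a single object cannot change how 'in' behaves for it.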
Mike Fährmann
65744a7a31 use alternative for all falsey values in format strings
… and not just None (#525)

It would be better to consistently use None for all non-existent
fields and/or fields without a valid value, but this is a good
enough workaround for now.
2020-09-19 22:02:47 +02:00
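The difference between falling back only on None and falling back on every falsey value can be shown with a small sketch. The helper 'first_usable' and its signature are hypothetical; it only models the selection rule described in this message, not the format string syntax.

```python
def first_usable(kwdict, names, default=None):
    """Return the first field value that is truthy.

    Empty strings, 0, empty lists and None are all skipped,
    not just None.
    """
    for name in names:
        value = kwdict.get(name)
        if value:
            return value
    return default


metadata = {"title": "", "description": None, "id": 42}
chosen = first_usable(metadata, ["title", "description", "id"])
```

With a None-only check, the empty 'title' would have been selected; with the falsey check, 'chosen' is 42.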
Mike Fährmann
b5243297ff write skipped files to archive (closes #550) 2020-09-03 18:37:38 +02:00
Mike Fährmann
4d8b3e4f70 defer directory creation (fixes #722)
Only call os.makedirs() before a file is getting downloaded,
and not immediately for every Directory message.
2020-07-04 22:15:23 +02:00
Mike Fährmann
1ae1df0d27 update '--write-pages' (#737)
- fix infinite recursion for responses with multiple entries in
  'history'
- hide values of Set-Cookie headers
- only write the response content by default
  (use '-o write-pages=all' to also include HTTP headers)
2020-06-18 15:07:30 +02:00
Mike Fährmann
1fcf938f9c implement a general 'delete_items()' function 2020-06-06 23:49:49 +02:00
Mike Fährmann
ddc253cf9a implement a 'path-replace' option (#662, #755) 2020-05-25 22:21:58 +02:00
Mike Fährmann
15c3d29062 move dump_response() into a separate function (#737) 2020-05-25 22:21:58 +02:00
Mike Fährmann
bc53302ad6 extend 'path-restrict' option
Allow its value to be a JSON object / Python dict that specifies
a mapping from invalid/unwanted input characters to specific
output characters.

For example {"/": "-", "*": "+"} will transform
"foo / ***bar***" into "foo - +++bar+++"

(closes #662, #755)
2020-05-25 22:21:56 +02:00
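The character mapping from this message can be reproduced with 'str.maketrans()' and 'str.translate()'. This is a sketch under the assumption that every key is a single character; the function name 'build_cleanfunc' is borrowed from a later commit message in this log, but its body here is illustrative only.

```python
def build_cleanfunc(mapping):
    """Return a function that replaces characters according to 'mapping'.

    Keys must be single characters; values are their replacement strings.
    """
    table = str.maketrans(mapping)
    return lambda text: text.translate(table)


clean = build_cleanfunc({"/": "-", "*": "+"})
cleaned = clean("foo / ***bar***")
```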