Commit Graph

65 Commits

Author SHA1 Message Date
Mike Fährmann
d651d45239 implement specifying ranges in slice notation (#918, #2865)
e.g.
- '1:101'   or ':101' or ':101:'  for files 1 to 100
- '1::2'    or '::2'              for every second file
- '1:101:5' or ':101:5'           for files 1, 6, 11, ..., 91, 96

(the second argument specifies the first index NOT included)
2022-12-27 18:21:12 +01:00
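The slice notation described in d651d45239 maps naturally onto Python's built-in range. The following is a minimal sketch, not the project's implementation: the helper name parse_slice_range is hypothetical, and using sys.maxsize for an omitted upper bound is an assumption.

```python
import sys


def parse_slice_range(spec):
    """Turn a 'start:stop:step' string into a range of 1-based file numbers."""
    parts = spec.split(":")
    if not 2 <= len(parts) <= 3:
        raise ValueError("expected 'start:stop' or 'start:stop:step'")

    # Empty fields fall back to defaults; file numbering starts at 1.
    start = int(parts[0]) if parts[0] else 1
    # An omitted stop means "no upper bound" (assumption: sys.maxsize).
    stop = int(parts[1]) if parts[1] else sys.maxsize
    step = int(parts[2]) if len(parts) == 3 and parts[2] else 1

    # As in the commit message, 'stop' is the first index NOT included.
    return range(start, stop, step)


every_fifth = parse_slice_range("1:101:5")
print(list(every_fifth)[:3], every_fifth[-1])  # [1, 6, 11] 96
```

Returning a range keeps membership tests cheap even when no upper bound is given, since nothing is materialized.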
Mike Fährmann
3616adfc75 implement '--range' with Python ranges 2022-12-26 18:32:34 +01:00
Mike Fährmann
1800bd7d14 allow '*-filter' options to be a list of expressions 2022-12-23 22:20:21 +01:00
Mike Fährmann
43c211f1a7 extend and rename util.CustomNone 2022-12-06 22:08:51 +01:00
Mike Fährmann
c0051d7d4c fix test 2022-08-01 21:40:35 +02:00
Mike Fährmann
dd3a6a9fd1 make 'enumerate_reversed()' work with generators (#2795) 2022-08-01 14:08:44 +02:00
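One way to make a reversed enumeration accept generators, as dd3a6a9fd1 describes, is to fall back to materializing the input when it has no length. This is an illustrative sketch under that assumption; the signature shown here is not taken from the source.

```python
def enumerate_reversed(iterable, start=0):
    """Enumerate 'iterable' and return its (index, element) pairs last to first."""
    try:
        length = len(iterable)
        # Sequences can be walked backwards without copying them.
        return zip(range(start + length - 1, start - 1, -1), reversed(iterable))
    except TypeError:
        # Generators support neither len() nor reversed(); collect them first.
        pairs = list(enumerate(iterable, start))
        pairs.reverse()
        return iter(pairs)


print(list(enumerate_reversed(c for c in "abc")))  # [(2, 'c'), (1, 'b'), (0, 'a')]
```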
Mike Fährmann
c4b9f7bab8 update functions working with cookies.txt files
- rename
  - load_cookiestxt -> cookiestxt_load
  - save_cookiestxt -> cookiestxt_store
- in cookiestxt_load, add cookies directly to a cookie jar
  instead of storing them in a list first
- other unnoticeable performance increases
2022-05-06 13:21:29 +02:00
Mike Fährmann
ca3a364db7 fix build_duration_func() (#2533)
for extractors with request_interval_min > 0
2022-04-27 20:28:14 +02:00
Mike Fährmann
7fe54bab2a attempt to fix some issues with 'contains()' (#2446)
add a third argument that gets used
when the values to search are given as a string
2022-04-08 14:40:26 +02:00
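The 'contains()' helper from the surrounding commits, including the third argument added in 7fe54bab2a, might look roughly like the sketch below. The parameter names and the default separator of a single space are assumptions made for illustration.

```python
def contains(values, elements, separator=" "):
    """Return True if at least one of 'elements' occurs in 'values'."""
    # A string of values is split into a list first, using 'separator'
    # (assumed default: a single space).
    if isinstance(values, str):
        values = values.split(separator)
    # A single element is wrapped so both cases share one code path.
    if not isinstance(elements, (list, tuple)):
        elements = (elements,)
    return any(element in values for element in elements)


print(contains("cat dog bird", ["fish", "dog"]))  # True
print(contains("cat,dog,bird", "do", ","))        # False
```

Splitting the string first means "do" does not match "dog", which a plain substring test would report as a hit.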
Mike Fährmann
d78a2c7163 re.escape() arguments for 'contains()' (#2446) 2022-04-07 15:35:54 +02:00
Mike Fährmann
413b77757b implement 'contains()' (#2446)
and add it to globals() in compiled expressions for --filter etc
2022-03-30 16:18:33 +02:00
Mike Fährmann
29db716a63 implement 'datetime_to_timestamp()'
and rename 'to_timestamp()'
to the more descriptive 'datetime_to_timestamp_string()'
2022-03-23 22:36:01 +01:00
Mike Fährmann
8295bc6d97 fix loading/storing cookies without domain 2022-03-19 15:14:55 +01:00
Mike Fährmann
64cf26eaf4 allow specifying sleep-* options as string
either as single value or as range: "3.5", "2.1 - 5.0"
2021-12-18 23:28:56 +01:00
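Accepting both "3.5" and "2.1 - 5.0" for the sleep-* options, as in 64cf26eaf4, comes down to splitting on an optional dash. The helpers below are hypothetical, and picking a uniformly random value within a range is an assumption about how such a range would be used.

```python
import random


def parse_duration(value):
    """Return (lower, upper) bounds in seconds for a number or 'a - b' string."""
    if isinstance(value, str):
        lower, sep, upper = value.partition("-")
        lower = float(lower)  # float() ignores surrounding whitespace
        upper = float(upper) if sep else lower
    else:
        lower = upper = float(value)
    return lower, upper


def pick_duration(value):
    """Return the fixed value, or a random one within the given range."""
    lower, upper = parse_duration(value)
    return lower if lower == upper else random.uniform(lower, upper)


print(parse_duration("3.5"))        # (3.5, 3.5)
print(parse_duration("2.1 - 5.0"))  # (2.1, 5.0)
```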
Mike Fährmann
010d65dcec extend blacklist/whitelist syntax (#2025)
Each entry in such a list can now also include a subcategory
'<category>:<subcategory>'
and it is possible to use '*' or an empty string as placeholder
'*:<subcategory>', ':<subcategory>', '<category>:*'

For example
  "blacklist": "imgur,*:tag,gfycat:user" or
  "blacklist": ["imgur", "*:tag", "gfycat:user"]
will filter all 'imgur' extractors, all extractors with a 'tag'
subcategory (e.g. https://danbooru.donmai.us/posts?tags=bonocho),
and all 'gfycat' user extractors.
2021-11-23 20:31:43 +01:00
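The extended blacklist/whitelist syntax from 010d65dcec can be modeled as a list of (category, subcategory) pairs with wildcards. The function names below are hypothetical and the matching logic is a sketch of the described behavior, not the project's code.

```python
def parse_entries(entries):
    """Normalize a blacklist/whitelist value into (category, subcategory) pairs."""
    if isinstance(entries, str):
        entries = entries.split(",")
    pairs = []
    for entry in entries:
        category, _, subcategory = entry.partition(":")
        # '*' and the empty string both act as "match anything".
        pairs.append((category or "*", subcategory or "*"))
    return pairs


def is_listed(category, subcategory, entries):
    """Check whether an extractor matches any entry of the list."""
    for cat, sub in parse_entries(entries):
        if cat in ("*", category) and sub in ("*", subcategory):
            return True
    return False


blacklist = "imgur,*:tag,gfycat:user"
print(is_listed("danbooru", "tag", blacklist))  # True
print(is_listed("gfycat", "image", blacklist))  # False
```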
Mike Fährmann
c22ff97743 remove 'unit' argument from 'util.format_value()' 2021-09-28 23:07:55 +02:00
Mike Fährmann
74145467dd move 'util.Formatter' into its own 'formatter' module 2021-09-27 02:37:04 +02:00
Mike Fährmann
292fffc83c add 'j' format string conversion
to convert to a JSON formatted string
2021-08-28 01:19:36 +02:00
Mike Fährmann
2792ed6e4b implement 'util.format_value()' 2021-07-26 02:11:22 +02:00
Mike Fährmann
9e42cd58ea replace ChainPredicate class with 'functools.partial' 2021-07-20 20:21:32 +02:00
Mike Fährmann
0179581340 add 'T' format string conversion (#1646)
to convert 'date'/datetime to timestamp
2021-06-25 22:35:45 +02:00
Mike Fährmann
0abad8bc12 implement 'compile_expression()' 2021-06-03 22:34:58 +02:00
Mike Fährmann
8fd8126117 fix ISO 639-1 code for Japanese
"jp" -> "ja"
2021-05-22 16:07:04 +02:00
Mike Fährmann
c5ca7905ce add 'noop()' and 'identity()' functions 2021-05-04 19:27:17 +02:00
Mike Fährmann
bff71cde80 implement 'util.unique_sequence()' 2021-03-02 23:11:08 +01:00
Mike Fährmann
91308140ec make 'generate_token()' compatible with Python 3.4 2021-01-14 03:48:10 +01:00
Mike Fährmann
780b6adb91 rename 'generate_csrf_token()' to just 'generate_token()'
and add a 'size' argument
2021-01-11 22:12:40 +01:00
Mike Fährmann
aac00a2024 add 'd' conversion for format strings
to convert a timestamp to a formattable 'datetime' object.

For example '{created_at!d:%Y-%m-%d}'
transforms the timestamp in 'created_at' into a 'datetime' object
and then formats its content using '%Y-%m-%d' as template.

1262304000 -> datetime(2010, 1, 1) -> "2010-01-01"
2021-01-09 01:58:44 +01:00
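The two steps behind '{created_at!d:%Y-%m-%d}' in aac00a2024 can be reproduced with the standard library. This sketch assumes the timestamp is interpreted as UTC; the variable names are illustrative.

```python
from datetime import datetime, timezone

created_at = 1262304000

# Step 1 ('!d'): convert the timestamp to a datetime (assumption: UTC).
as_datetime = datetime.fromtimestamp(created_at, tz=timezone.utc)

# Step 2 (':%Y-%m-%d'): datetime objects accept strftime codes as format spec.
formatted = "{:%Y-%m-%d}".format(as_datetime)

print(formatted)  # 2010-01-01
```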
Mike Fährmann
c3f01dc4e6 implement 'util.unique()' 2020-10-29 23:33:41 +01:00
Mike Fährmann
ec61696316 add 't' format string conversion (closes #1065)
to Trim whitespace from the beginning and end of strings.
Example: '{field!t}' becomes 'foo' for 'field' == "  \nfoo\t\r"
2020-10-16 00:37:22 +02:00
Mike Fährmann
65744a7a31 use alternative for all falsey values in format strings
… and not just None (#525)

It would be better to consistently use None for all non-existent
fields and/or fields without a valid value, but this is a good
enough workaround for now.
2020-09-19 22:02:47 +02:00
Mike Fährmann
5df8f2959b insert local directory into PYTHONPATH when running tests 2020-05-02 01:15:50 +02:00
Mike Fährmann
90e4c645ba [formatter] allow multiple "special" format specifiers (#595)
It is now, for example, possible to specify multiple replacement
operations per format replacement field: {name:Ra/b/Rc/d/}
2020-02-16 21:47:08 +01:00
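Chaining several replacement operations, as in '{name:Ra/b/Rc/d/}' from 90e4c645ba, amounts to applying each R<old>/<new>/ pair in order. The regular expression and helper name below are assumptions made for illustration and only cover the R operation.

```python
import re


def apply_replacements(value, spec):
    """Apply every 'R<old>/<new>/' operation in 'spec' from left to right."""
    for old, new in re.findall(r"R([^/]*)/([^/]*)/", spec):
        value = value.replace(old, new)
    return value


print(apply_replacements("a-c", "Ra/b/Rc/d/"))  # b-d
```

Because the operations run in sequence, a later replacement sees the output of an earlier one.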
Mike Fährmann
219c4cc78c [formatter] allow for numeric list and string indices 2020-02-15 22:46:22 +01:00
Mike Fährmann
7d1da614d9 [formatter] implement field name alternatives (#525)
The format string '{a|b|c}' will now try to use the value from 'a' and
fall back to 'b' and 'c' if accessing a field raises an exception or
if its value is None.
2020-02-15 17:58:21 +01:00
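The fallback behavior of '{a|b|c}' described in 7d1da614d9 can be sketched as a loop over the candidate field names. The helper name is hypothetical, and only KeyError is handled here for brevity; the later entry 65744a7a31 in this log extends the fallback to all falsey values.

```python
def first_available(data, names):
    """Return the value of the first field that exists and is not None."""
    for name in names.split("|"):
        try:
            value = data[name]
        except KeyError:
            # Missing field: try the next alternative.
            continue
        if value is not None:
            return value
    return None


print(first_available({"a": None, "c": "fallback"}, "a|b|c"))  # fallback
```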
Mike Fährmann
2a9be48511 improve util.load/save_cookiestxt() and add tests
- take a file object as argument instead of a filename
- accept whitespace before comments ("   # comment")
- map expiration "0" to None and not the number 0
2020-01-25 23:02:15 +01:00
Mike Fährmann
3fc1e12949 [postprocessor:metadata] filter private entries
i.e. keys starting with an underscore
2019-11-21 16:58:44 +01:00
Mike Fährmann
d5e3910270 adjust 'util.raises()' 2019-10-28 15:06:17 +01:00
Mike Fährmann
95b1e4c3c0 implement R<old>/<new>/ format option (#318) 2019-06-23 22:45:44 +02:00
Mike Fährmann
a5b060765d improve code in tests
- use 'assertRaises' as context manager
- remove calls to .keys()
2019-05-13 11:48:20 +02:00
Mike Fährmann
a881537b91 more util.py tests 2019-03-06 21:09:37 +01:00
Mike Fährmann
148b8f15d0 update tests for util.py 2019-02-14 11:15:19 +01:00
Mike Fährmann
79c01ec7ae implement J<separator>/ format option
J joins list elements by calling <separator>.join(list):

Example:
{f:J - /} -> "a - b - c" (if "f" is ["a", "b", "c"])
2019-01-17 17:01:58 +01:00
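The J<separator>/ option from 79c01ec7ae is a thin wrapper around str.join. The helper below is a hypothetical sketch of how such a format spec could be interpreted.

```python
def join_field(value, spec):
    """Apply a 'J<separator>/' spec to a list of strings."""
    if not (spec.startswith("J") and spec.endswith("/")):
        raise ValueError("expected a spec of the form 'J<separator>/'")
    # Everything between the leading 'J' and the trailing '/' is the separator.
    separator = spec[1:-1]
    return separator.join(value)


print(join_field(["a", "b", "c"], "J - /"))  # a - b - c
```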
Mike Fährmann
c5d4f558c9 allow missing field access keys in format strings (#136) 2018-12-22 13:54:14 +01:00
Mike Fährmann
0514d6a0ae make --filter and --range config-file options
The functionality of --(chapter-)filter and --(chapter-)range are now
also exposed as the following config-file options:

- extractor.*.image-filter
- extractor.*.image-range
- extractor.*.chapter-filter
- extractor.*.chapter-range

TODO: update configuration.rst
2018-10-07 21:39:56 +02:00
Mike Fährmann
590c0b3ad5 re-implement and improve filename formatter
A format string now gets parsed only once instead of re-parsing it each
time it is applied to a set of data.

The initial parsing causes directory path creation to be about 2x
slower than before, since each format string there is used only once,
but building a filename, the more common operation, is at least 2x
faster. The "directory slowness" cancels at about 5 filenames and
everything above that is significantly faster.
2018-08-25 10:45:14 +02:00
Mike Fährmann
e0dd8dff5f implement L<maxlen>/<replacement>/ format option
The L option allows for the contents of a format field to be replaced
with <replacement> if its length is greater than <maxlen>.

Example:
{f:L5/too long/} -> "foo"      (if "f" is "foo")
                 -> "too long" (if "f" is "foobar")

(#92) (#94)
2018-07-29 13:52:07 +02:00
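The L<maxlen>/<replacement>/ option from e0dd8dff5f can be sketched as a length check followed by a substitution. The helper name and the parsing approach are assumptions made for illustration.

```python
def limit_length(value, spec):
    """Apply an 'L<maxlen>/<replacement>/' spec to a string value."""
    maxlen, _, rest = spec[1:].partition("/")
    # Drop the trailing '/' that terminates the replacement text.
    replacement = rest[:-1] if rest.endswith("/") else rest
    return replacement if len(value) > int(maxlen) else value


print(limit_length("foo", "L5/too long/"))     # foo
print(limit_length("foobar", "L5/too long/"))  # too long
```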
Mike Fährmann
8fe9056b16 implement string slicing for format strings
It is now possible to slice string (or list) values of format string
replacement fields with the same syntax as in regular Python code.

"{digits}"       -> "0123456789"
"{digits[2:-2]}" -> "234567"
"{digits[:5]}"   -> "01234"

The optional third parameter (step) has been left out to simplify things.
2018-07-14 09:53:15 +02:00
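The slicing examples in 8fe9056b16 behave exactly like Python's own start:stop slices. The following hypothetical helper shows one way to turn the bracketed part of a field name into a slice, without a step, matching the commit's note.

```python
def slice_field(value, spec):
    """Apply a '[start:stop]' spec to a string or list value."""
    start, _, stop = spec.strip("[]").partition(":")
    # Empty bounds become None, just like omitted bounds in Python slices.
    return value[slice(int(start) if start else None,
                       int(stop) if stop else None)]


digits = "0123456789"
print(slice_field(digits, "[2:-2]"))  # 234567
print(slice_field(digits, "[:5]"))    # 01234
```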
Mike Fährmann
cc36f88586 rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
51ea699083 add 'abort()' as function to filter expressions
calling 'abort()' in a filter aborts the current extractor run
in a cleaner way than using something like 1/0, which
causes an error message to be printed
2018-04-12 17:07:12 +02:00
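Exposing 'abort()' to filter expressions, as in 51ea699083, can be done by raising a dedicated exception that the caller catches. The exception class and the run_filter helper below are hypothetical; the sketch only illustrates why this is cleaner than forcing an error with something like 1/0.

```python
class StopExtraction(Exception):
    """Hypothetical signal that the current extractor run should end."""


def abort():
    raise StopExtraction()


def run_filter(expression, items):
    """Yield items for which 'expression' is truthy until 'abort()' is called."""
    code = compile(expression, "<filter>", "eval")
    for item in items:
        try:
            # 'abort' is visible as a global; the item's keys act as locals.
            if eval(code, {"abort": abort}, item):
                yield item
        except StopExtraction:
            # Stop quietly instead of surfacing an error message.
            return


items = [{"num": n} for n in range(1, 10)]
kept = list(run_filter("num <= 3 or abort()", items))
print([item["num"] for item in kept])  # [1, 2, 3]
```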