Commit Graph

520 Commits

Author SHA1 Message Date
Mike Fährmann
7fe54bab2a attempt to fix some issues with 'contains()' (#2446)
add a third argument that gets used
when the values to search are given as a string
2022-04-08 14:40:26 +02:00
Mike Fährmann
d78a2c7163 re.escape() arguments for 'contains()' (#2446) 2022-04-07 15:35:54 +02:00
Mike Fährmann
413b77757b implement 'contains()' (#2446)
and add it to globals() in compiled expressions for --filter etc
2022-03-30 16:18:33 +02:00
Mike Fährmann
e7b30866d0 [postprocessor:mtime] fix timestamps from datetime objects (#2307)
'datetime.timestamp()', which got used to convert datetime objects to
POSIX timestamps, assumes naive datetimes represent LOCAL time, while
datetimes in 'date' metadata fields represent UTC time.

Ref: https://docs.python.org/3/library/datetime.html#datetime.datetime.timestamp
> Naive datetime instances are assumed to represent local time
> you can obtain the POSIX timestamp by … calculating the timestamp directly
2022-03-23 23:05:14 +01:00
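The difference described in this commit can be sketched with the standard library alone. The helper name 'utc_naive_to_timestamp' is hypothetical and is not taken from the project; it only illustrates "calculating the timestamp directly" for a naive datetime that is known to represent UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1)  # naive, interpreted as UTC


def utc_naive_to_timestamp(dt):
    """Treat a naive datetime as UTC and return its POSIX timestamp.

    Hypothetical helper for illustration only.
    """
    return (dt - EPOCH) / timedelta(seconds=1)


dt = datetime(2010, 1, 1, 0, 0, 0)  # naive value meant as UTC

# Direct calculation, independent of the local time zone
direct = utc_naive_to_timestamp(dt)

# Equivalent result: attach UTC explicitly before calling timestamp()
via_tz = dt.replace(tzinfo=timezone.utc).timestamp()

# dt.timestamp() on the naive value would instead assume local time
# and differ from 'direct' by the local UTC offset.
```

Both 'direct' and 'via_tz' evaluate to 1262304000.0 regardless of the machine's time zone.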
Mike Fährmann
29db716a63 implement 'datetime_to_timestamp()'
and rename 'to_timestamp()'
to the more descriptive 'datetime_to_timestamp_string()'
2022-03-23 22:36:01 +01:00
Mike Fährmann
8295bc6d97 fix loading/storing cookies without domain 2022-03-19 15:14:55 +01:00
Mike Fährmann
500a479026 fix a third(!) bug in _check_cookies() (#2372)
turns out tests are worthless if you get em wrong ...
2022-03-18 19:52:37 +01:00
Mike Fährmann
cf44aba333 [formatter] allow evaluating f-string literals
by starting a format string with '\fF'.

This was technically already possible with '\fE',
but this makes it a bit more convenient.
2022-03-18 13:31:01 +01:00
Mike Fährmann
94452761ed fix cookies tests 2022-03-11 18:16:00 +01:00
Mike Fährmann
bddcec49f1 implement 'text.root_from_url()'
use domain from input URL for kemono
2022-03-01 03:09:57 +01:00
Mike Fährmann
f5b2b9333f fix another bug in _check_cookies (#2160)
regression introduced in ed317bfc

Added a couple of tests to hopefully catch such bugs
before they land in a release.
2022-02-16 22:58:57 +01:00
Mike Fährmann
563bd0ecf4 [danbooru] inherit from BaseExtractor
- merge danbooru and e621 code
- support booru.allthefallen.moe (closes #2283)
- remove support for old e621 tag search URLs
2022-02-11 21:01:51 +01:00
Mike Fährmann
b5b4f5a168 use 'build_extractor_filter' in test_results.py 2021-12-28 17:25:07 +01:00
Mike Fährmann
64cf26eaf4 allow specifying sleep-* options as string
either as single value or as range: "3.5", "2.1 - 5.0"
2021-12-18 23:28:56 +01:00
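One way to accept both forms is sketched below. The function names 'parse_sleep' and 'pick_duration' are assumptions made for this example and do not come from the project; negative values are not handled, since '-' is used as the range separator.

```python
import random


def parse_sleep(value):
    """Return (lower, upper) as floats.

    Accepts a number, a numeric string such as "3.5",
    or a range string such as "2.1 - 5.0".
    Hypothetical helper for illustration only.
    """
    if isinstance(value, str):
        lower, sep, upper = value.partition("-")
        lower = float(lower)  # float() ignores surrounding whitespace
        upper = float(upper) if sep else lower
    else:
        lower = upper = float(value)
    return lower, upper


def pick_duration(value):
    """Pick a sleep duration within the parsed bounds."""
    lower, upper = parse_sleep(value)
    return random.uniform(lower, upper)
```

With this sketch, parse_sleep("3.5") gives (3.5, 3.5) and parse_sleep("2.1 - 5.0") gives (2.1, 5.0).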
Mike Fährmann
010d65dcec extend blacklist/whitelist syntax (#2025)
Each entry in such a list can now also include a subcategory
'<category>:<subcategory>'
and it is possible to use '*' or an empty string as placeholder
'*:<subcategory>', ':<subcategory>', '<category>:*'

For example
  "blacklist": "imgur,*:tag,gfycat:user" or
  "blacklist": ["imgur", "*:tag", "gfycat:user"]
will filter all 'imgur' extractors, all extractors with a 'tag'
subcategory (e.g. https://danbooru.donmai.us/posts?tags=bonocho),
and all 'gfycat' user extractors.
2021-11-23 20:31:43 +01:00
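The matching rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: 'parse_entries' and 'is_listed' are hypothetical names, and an omitted or empty part is treated the same as '*'.

```python
def parse_entries(entries):
    """Turn a comma-separated string or a list into (category, subcategory) pairs.

    Missing or empty parts become the '*' placeholder.
    Hypothetical helper for illustration only.
    """
    if isinstance(entries, str):
        entries = entries.split(",")
    pairs = []
    for entry in entries:
        category, _, subcategory = entry.strip().partition(":")
        pairs.append((category or "*", subcategory or "*"))
    return pairs


def is_listed(pairs, category, subcategory):
    """Return True if any pair matches the given extractor identity."""
    return any(
        cat in ("*", category) and sub in ("*", subcategory)
        for cat, sub in pairs
    )


pairs = parse_entries("imgur,*:tag,gfycat:user")
```

Here is_listed(pairs, "imgur", "album"), is_listed(pairs, "danbooru", "tag"), and is_listed(pairs, "gfycat", "user") are all true, while is_listed(pairs, "gfycat", "image") is false.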
Mike Fährmann
af6424f398 allow testing metadata in list elements 2021-11-21 22:46:34 +01:00
Mike Fährmann
3842cdcd8f [formatter] implement 'D' format specifier
To be able to parse any string into a 'datetime' object
and format it as necessary.

Example:

{created_at:D%Y-%m-%dT%H:%M:%S%z}
->
"2010-01-01 00:00:00"

{created_at:D%Y-%m-%dT%H:%M:%S%z/%b %d %Y %I:%M %p}
->
"Jan 01 2010 12:00 AM"

with 'created_at' == "2010-01-01T01:00:00+0100"
2021-11-20 23:04:34 +01:00
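The behaviour shown in the example can be approximated with 'datetime.strptime()' and 'strftime()'. The function 'parse_and_format' below is a hypothetical stand-in, not the formatter's actual code. It assumes that the text before the first '/' is the parse format, that the optional text after it is the output format, and that timezone-aware results are converted to naive UTC, which is what the sample output above suggests.

```python
from datetime import datetime, timezone


def parse_and_format(value, spec):
    """Parse 'value' using the format before '/', then format it.

    Hypothetical helper for illustration only. A parse format that
    itself contains '/' is not supported by this sketch.
    """
    parse_fmt, sep, out_fmt = spec.partition("/")
    dt = datetime.strptime(value, parse_fmt)
    if dt.tzinfo is not None:
        # Normalize aware datetimes to naive UTC
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt.strftime(out_fmt) if sep else str(dt)


created_at = "2010-01-01T01:00:00+0100"
plain = parse_and_format(created_at, "%Y-%m-%dT%H:%M:%S%z")
```

With the input above, 'plain' is "2010-01-01 00:00:00". Output that uses '%b' or '%p' depends on the active locale.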
Mike Fährmann
2ab190ce08 add tests for special format strings 2021-11-01 23:26:18 +01:00
Mike Fährmann
46e17c5e61 support accessing the current local datetime in format strings
{_now}, {_now:%Y-%m-%d}, etc
(#1968)
2021-10-30 21:41:09 +02:00
Mike Fährmann
38193dba46 support accessing environment variables in format strings (#1968)
{_env[HOME]} to get the value of $HOME
every other format string feature is supported as well
2021-10-28 19:18:55 +02:00
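The '{_env[HOME]}' syntax mirrors how Python's own 'str.format()' resolves item access in replacement fields. The snippet below only demonstrates that standard behaviour with a stand-in dictionary; how the project actually wires '_env' into its format strings is not shown here.

```python
# Stand-in for os.environ so the example does not depend on the machine
fake_env = {"HOME": "/home/user"}

template = "{_env[HOME]}/downloads"

# str.format_map() resolves '_env', then looks up the key 'HOME' in it
result = template.format_map({"_env": fake_env})
```

'result' is "/home/user/downloads". Passing 'os.environ' in place of 'fake_env' would read the real environment.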
Mike Fährmann
f2d6b3e6b4 run tests without using 'nose'
run_tests.sh -> run_tests.py
2021-10-13 04:07:41 +02:00
Mike Fährmann
12fc646c53 fix filename formatting tests 2021-09-29 23:39:02 +02:00
Mike Fährmann
e0bdacd932 [fappic] add 'image' extractor (closes #1898) 2021-09-28 23:35:29 +02:00
Mike Fährmann
c22ff97743 remove 'unit' argument from 'util.format_value()' 2021-09-28 23:07:55 +02:00
Mike Fährmann
cad85640de move 'util.PathFormat' into its own 'path' module
to prevent circular imports between 'formatter' and 'util'
2021-09-27 21:29:37 +02:00
Mike Fährmann
74145467dd move 'util.Formatter' into its own 'formatter' module 2021-09-27 02:37:04 +02:00
Mike Fährmann
9377543162 [mastodon] add 'following' extractor (#1891) 2021-09-26 00:12:34 +02:00
Mike Fährmann
bd845303ad implement a way to shorten filenames with east-asian characters
(#1377)

Setting 'output.shorten' to "eaw" (East-Asian Width) uses a slower
algorithm that also considers characters with a width > 1.
2021-09-13 21:38:33 +02:00
Mike Fährmann
292fffc83c add 'j' format string conversion
to convert to a JSON formatted string
2021-08-28 01:19:36 +02:00
Mike Fährmann
bb6a130942 automatically set required DDoS-GUARD cookies (#1779)
for kemono.party and seiso.party
2021-08-16 17:40:29 +02:00
Mike Fährmann
2792ed6e4b implement 'util.format_value()' 2021-07-26 02:11:22 +02:00
Mike Fährmann
9e42cd58ea replace ChainPredicate class with 'functools.partial' 2021-07-20 20:21:32 +02:00
Mike Fährmann
36ac2197db [ytdl] add extractor for sites supported by youtube-dl
(#1680, #878)

Can be used by prefixing any URL with 'ytdl:',
or by setting 'extractor.ytdl.enabled' to 'true'.
2021-07-10 20:55:47 +02:00
Mike Fährmann
64240c8d42 [imagevenue] fix extraction
(closes #1677)
2021-07-09 20:13:18 +02:00
Mike Fährmann
0179581340 add 'T' format string conversion (#1646)
to convert 'date'/datetime to timestamp
2021-06-25 22:35:45 +02:00
Mike Fährmann
f74cf52e2b [seisoparty] add 'user' and 'post' extractors (#1635) 2021-06-25 18:40:11 +02:00
Mike Fährmann
759735fb02 [kemonoparty] fix 'username' extraction (fixes #1652)
The site's <title> content changed from

<title>NAME | Kemono</title>

to

<title>
    NAME | Kemono
</title>
2021-06-25 15:35:20 +02:00
Mike Fährmann
07c8adbd8b [mangadex] implement login with username & password (#1535) 2021-06-08 02:12:57 +02:00
Mike Fährmann
4a747a31a3 [postprocessor:metadata] handle dicts in mode:tags (fixes #1598) 2021-06-04 22:37:43 +02:00
Mike Fährmann
3cbbefd4ed support 'filter' option for post processors (#1460) 2021-06-04 18:23:32 +02:00
Mike Fährmann
0abad8bc12 implement 'compile_expression()' 2021-06-03 22:34:58 +02:00
Mike Fährmann
da6806a161 fix job tests for Python 3.4 and 3.5
assert_called() and assert_not_called() got added in Python 3.6
2021-05-22 21:40:52 +02:00
Mike Fährmann
8fd8126117 fix ISO 639-1 code for Japanese
"jp" -> "ja"
2021-05-22 16:07:04 +02:00
Mike Fährmann
af9dba4684 add DataJob tests 2021-05-21 02:59:54 +02:00
Mike Fährmann
adf4d661b3 use '_extractor' info in UrlJobs 2021-05-19 15:52:30 +02:00
Mike Fährmann
1eabfa5c7a [pillowfort] implement login with username & password (#846) 2021-05-19 02:59:16 +02:00
Mike Fährmann
559462789d add some tests for job.py 2021-05-14 19:44:16 +02:00
Mike Fährmann
c5ca7905ce add 'noop()' and 'identity()' functions 2021-05-04 19:27:17 +02:00
Mike Fährmann
bc868e7bb8 consider apparently long extensions as part of the filename
(#1516)
2021-05-02 21:15:50 +02:00
Mike Fährmann
bdfcc9c4b1 update extractor test results 2021-04-18 20:28:15 +02:00