Commit Graph

498 Commits

Author SHA1 Message Date
Mike Fährmann
e0bdacd932 [fappic] add 'image' extractor (closes #1898) 2021-09-28 23:35:29 +02:00
Mike Fährmann
c22ff97743 remove 'unit' argument from 'util.format_value()' 2021-09-28 23:07:55 +02:00
Mike Fährmann
cad85640de move 'util.PathFormat' into its own 'path' module
to prevent circular imports between 'formatter' and 'util'
2021-09-27 21:29:37 +02:00
Mike Fährmann
74145467dd move 'util.Formatter' into its own 'formatter' module 2021-09-27 02:37:04 +02:00
Mike Fährmann
9377543162 [mastodon] add 'following' extractor (#1891) 2021-09-26 00:12:34 +02:00
Mike Fährmann
bd845303ad implement a way to shorten filenames with east-asian characters
(#1377)

Setting 'output.shorten' to "eaw" (East-Asian Width) uses a slower
algorithm that also considers characters with a width > 1.
2021-09-13 21:38:33 +02:00
Mike Fährmann
292fffc83c add 'j' format string conversion
to convert to a JSON formatted string
2021-08-28 01:19:36 +02:00
Mike Fährmann
bb6a130942 automatically set required DDoS-GUARD cookies (#1779)
for kemono.party and seiso.party
2021-08-16 17:40:29 +02:00
Mike Fährmann
2792ed6e4b implement 'util.format_value()' 2021-07-26 02:11:22 +02:00
Mike Fährmann
9e42cd58ea replace ChainPredicate class with 'functools.partial' 2021-07-20 20:21:32 +02:00
Mike Fährmann
36ac2197db [ytdl] add extractor for sites supported by youtube-dl
(#1680, #878)

Can be used by prefixing any URL with 'ytdl:',
or by setting 'extractor,ytdl.enabled' to 'true'.
2021-07-10 20:55:47 +02:00
Mike Fährmann
64240c8d42 [imagevenue] fix extraction
(closes #1677)
2021-07-09 20:13:18 +02:00
Mike Fährmann
0179581340 add 'T' format string conversion (#1646)
to convert 'date'/datetime to timestamp
2021-06-25 22:35:45 +02:00
Mike Fährmann
f74cf52e2b [seisoparty] add 'user' and 'post' extractors (#1635) 2021-06-25 18:40:11 +02:00
Mike Fährmann
759735fb02 [kemonoparty] fix 'username' extraction (fixes #1652)
The site's <title> content changed from

<title>NAME | Kemono</title>

to

<title>
    NAME | Kemono
</title>
2021-06-25 15:35:20 +02:00
Mike Fährmann
07c8adbd8b [mangadex] implement login with username & password (#1535) 2021-06-08 02:12:57 +02:00
Mike Fährmann
4a747a31a3 [postprocessor:metadata] handle dicts in mode;tags (fixes #1598) 2021-06-04 22:37:43 +02:00
Mike Fährmann
3cbbefd4ed support 'filter' option for post processors (#1460) 2021-06-04 18:23:32 +02:00
Mike Fährmann
0abad8bc12 implement 'compile_expression()' 2021-06-03 22:34:58 +02:00
Mike Fährmann
da6806a161 fix job tests for Python 3.4 and 3.5
assert_called() and assert_not_called() got added in Python 3.6
2021-05-22 21:40:52 +02:00
Mike Fährmann
8fd8126117 fix ISO 639-1 code for Japanese
"jp" -> "ja"
2021-05-22 16:07:04 +02:00
Mike Fährmann
af9dba4684 add DataJob tests 2021-05-21 02:59:54 +02:00
Mike Fährmann
adf4d661b3 use '_extractor' info in UrlJobs 2021-05-19 15:52:30 +02:00
Mike Fährmann
1eabfa5c7a [pillowfort] implement login with username & password (#846) 2021-05-19 02:59:16 +02:00
Mike Fährmann
559462789d add some tests for job.py 2021-05-14 19:44:16 +02:00
Mike Fährmann
c5ca7905ce add 'noop()' and 'identity()' functions 2021-05-04 19:27:17 +02:00
Mike Fährmann
bc868e7bb8 consider apparently long extensions as part of the filename
(#1516)
2021-05-02 21:15:50 +02:00
Mike Fährmann
bdfcc9c4b1 update extractor test results 2021-04-18 20:28:15 +02:00
Mike Fährmann
387fe415d5 unescape items in text.split_html() 2021-03-29 02:12:29 +02:00
Mike Fährmann
78fd63b8f0 remove 'text.clean_xml()'
was not used anywhere
2021-03-28 04:05:16 +02:00
Mike Fährmann
8553b218d9 replace calls to 'os.path.splitext()' with 'str.rpartition()'
Makes functions who used it more than twice as fast
and we can get rid of an import as well.
2021-03-28 04:01:27 +02:00
Mike Fährmann
bff71cde80 implement 'util.unique_squence()' 2021-03-02 23:11:08 +01:00
Mike Fährmann
5f1a6ff6fa remove unneeded 'TRAVIS_SKIP' from test_results.py 2021-03-01 01:38:18 +01:00
Mike Fährmann
8821dceb79 use __import__() to dynamically load modules 2021-03-01 01:27:02 +01:00
Mike Fährmann
36bf76fa44 update 'oauth:mastodon:<instance>' code 2021-01-28 02:20:12 +01:00
Mike Fährmann
91308140ec make 'generate_token()' compatible with Python 3.4 2021-01-14 03:48:10 +01:00
Mike Fährmann
780b6adb91 rename 'generate_csrf_token()' to just 'generate_token()'
and add a 'size' argument
2021-01-11 22:12:40 +01:00
Mike Fährmann
0fdaea00a3 [postprocessor:metadata] sanitize filenames 2021-01-10 00:13:20 +01:00
Mike Fährmann
aac00a2024 add 'd' conversion for format strings
to convert a timestamp to a formattable 'datetime' object.

For example '{created_at!d:%Y-%m-%d}'
transforms the timestamp in 'created_at' into a 'datetime' object
and then formats its content using '%Y-%m-%d' as template.

1262304000 -> datetime(2010, 1, 1) -> "2010-01-01"
2021-01-09 01:58:44 +01:00
Mike Fährmann
912eea29bc update extractor test results 2020-12-27 17:41:08 +01:00
Mike Fährmann
1f9121fecb release version 1.16.0 2020-12-12 23:08:25 +01:00
Mike Fährmann
b2c55f0a72 [sankaku] remove login support
The old login method for 'https://chan.sankakucomplex.com/user/login'
and the cookies it produces have no effect on the results from
'beta.sankakucomplex.com'.
2020-12-08 21:05:47 +01:00
Mike Fährmann
547107307e fix 'Metadata' messages in result tests 2020-11-24 13:34:54 +01:00
Mike Fährmann
578dcf805c [mangapanda] don't force https:// 2020-11-21 20:24:37 +01:00
Mike Fährmann
ca59bd691c [postprocessor:metadata] add 'event' and 'filename' options 2020-11-20 22:29:11 +01:00
Mike Fährmann
9fffa9c343 rework post processor callbacks 2020-11-19 02:29:06 +01:00
Mike Fährmann
1e3dd7330e merge SharedConfigMixin functionality into Extractor 2020-11-17 00:34:07 +01:00
Mike Fährmann
e5438b8a29 release version 1.15.3 2020-11-13 15:50:05 +01:00
Mike Fährmann
b9bfa4c675 update extractor test results 2020-11-07 02:03:22 +01:00
Mike Fährmann
c3f01dc4e6 implement 'util.unique()' 2020-10-29 23:33:41 +01:00