480 Commits

Author SHA1 Message Date
Mike Fährmann
0abad8bc12 implement 'compile_expression()' 2021-06-03 22:34:58 +02:00
Mike Fährmann
da6806a161 fix job tests for Python 3.4 and 3.5
assert_called() and assert_not_called() got added in Python 3.6
2021-05-22 21:40:52 +02:00
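A minimal sketch of how a test could stay compatible with Python 3.4 and 3.5 without `assert_called()` and `assert_not_called()`: check `call_count` directly, which `unittest.mock.Mock` has always provided. The helper names below are hypothetical and are not taken from the commit itself.

```python
from unittest.mock import Mock


def assert_called(mock):
    """Hypothetical stand-in for Mock.assert_called() (added in Python 3.6)."""
    if mock.call_count == 0:
        raise AssertionError("expected mock to have been called")


def assert_not_called(mock):
    """Hypothetical stand-in for Mock.assert_not_called()."""
    if mock.call_count != 0:
        raise AssertionError(
            "expected mock not to be called, got %d call(s)" % mock.call_count)


callback = Mock()
assert_not_called(callback)   # nothing has called the mock yet
callback("some", "args")
assert_called(callback)       # one call has been recorded
```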
Mike Fährmann
8fd8126117 fix ISO 639-1 code for Japanese
"jp" -> "ja"
2021-05-22 16:07:04 +02:00
Mike Fährmann
af9dba4684 add DataJob tests 2021-05-21 02:59:54 +02:00
Mike Fährmann
adf4d661b3 use '_extractor' info in UrlJobs 2021-05-19 15:52:30 +02:00
Mike Fährmann
1eabfa5c7a [pillowfort] implement login with username & password (#846) 2021-05-19 02:59:16 +02:00
Mike Fährmann
559462789d add some tests for job.py 2021-05-14 19:44:16 +02:00
Mike Fährmann
c5ca7905ce add 'noop()' and 'identity()' functions 2021-05-04 19:27:17 +02:00
Mike Fährmann
bc868e7bb8 consider apparently long extensions as part of the filename
(#1516)
2021-05-02 21:15:50 +02:00
Mike Fährmann
bdfcc9c4b1 update extractor test results 2021-04-18 20:28:15 +02:00
Mike Fährmann
387fe415d5 unescape items in text.split_html() 2021-03-29 02:12:29 +02:00
Mike Fährmann
78fd63b8f0 remove 'text.clean_xml()'
was not used anywhere
2021-03-28 04:05:16 +02:00
Mike Fährmann
8553b218d9 replace calls to 'os.path.splitext()' with 'str.rpartition()'
Makes the functions that used it more than twice as fast,
and we can get rid of an import as well.
2021-03-28 04:01:27 +02:00
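The replacement described above can be sketched as follows. This is an illustration only, with a hypothetical helper name, and it assumes plain filenames rather than full paths. The two approaches are not drop-in equivalents: `os.path.splitext()` keeps the leading dot in the extension and treats names like `.bashrc` as having no extension, while `str.rpartition()` does neither.

```python
import os.path


def split_extension(filename):
    """Split a filename into (name, extension) with str.rpartition().

    Hypothetical helper; the extension is returned without its dot.
    """
    name, dot, ext = filename.rpartition(".")
    if not dot:
        # No "." present: rpartition() returns ("", "", filename)
        return filename, ""
    return name, ext


print(split_extension("image.tar.gz"))    # ('image.tar', 'gz')
print(os.path.splitext("image.tar.gz"))   # ('image.tar', '.gz')
```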
Mike Fährmann
bff71cde80 implement 'util.unique_sequence()' 2021-03-02 23:11:08 +01:00
Mike Fährmann
5f1a6ff6fa remove unneeded 'TRAVIS_SKIP' from test_results.py 2021-03-01 01:38:18 +01:00
Mike Fährmann
8821dceb79 use __import__() to dynamically load modules 2021-03-01 01:27:02 +01:00
Mike Fährmann
36bf76fa44 update 'oauth:mastodon:<instance>' code 2021-01-28 02:20:12 +01:00
Mike Fährmann
91308140ec make 'generate_token()' compatible with Python 3.4 2021-01-14 03:48:10 +01:00
Mike Fährmann
780b6adb91 rename 'generate_csrf_token()' to just 'generate_token()'
and add a 'size' argument
2021-01-11 22:12:40 +01:00
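A rough sketch of what a token generator with a 'size' argument might look like. The body below is an assumption, not the project's actual implementation; it relies only on `os.urandom()` and `binascii.hexlify()`, both of which are available on Python 3.4 (the `secrets` module only arrived in Python 3.6).

```python
import binascii
import os


def generate_token(size=16):
    """Return a random hex string built from 'size' random bytes.

    Sketch only: the default size and the hex encoding are assumptions.
    """
    return binascii.hexlify(os.urandom(size)).decode()


token = generate_token(8)
print(len(token))  # 16, since every byte becomes two hex digits
```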
Mike Fährmann
0fdaea00a3 [postprocessor:metadata] sanitize filenames 2021-01-10 00:13:20 +01:00
Mike Fährmann
aac00a2024 add 'd' conversion for format strings
to convert a timestamp to a formattable 'datetime' object.

For example '{created_at!d:%Y-%m-%d}'
transforms the timestamp in 'created_at' into a 'datetime' object
and then formats its content using '%Y-%m-%d' as template.

1262304000 -> datetime(2010, 1, 1) -> "2010-01-01"
2021-01-09 01:58:44 +01:00
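The 'd' conversion described above can be approximated with the standard library's `string.Formatter`. This is a sketch under assumptions, not the project's own formatter: the class name is hypothetical, and timestamps are assumed to be interpreted as UTC.

```python
import string
from datetime import datetime, timezone


class TimestampFormatter(string.Formatter):
    """string.Formatter with an extra 'd' conversion (hypothetical sketch)."""

    def convert_field(self, value, conversion):
        if conversion == "d":
            # Treat the value as seconds since the Unix epoch (UTC assumed)
            return datetime.fromtimestamp(int(value), timezone.utc)
        return super().convert_field(value, conversion)


formatter = TimestampFormatter()
result = formatter.format("{created_at!d:%Y-%m-%d}", created_at=1262304000)
print(result)  # 2010-01-01
```

The format spec after the colon is handed to `datetime.__format__()`, which is why `strftime()`-style templates work there.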
Mike Fährmann
912eea29bc update extractor test results 2020-12-27 17:41:08 +01:00
Mike Fährmann
1f9121fecb release version 1.16.0 2020-12-12 23:08:25 +01:00
Mike Fährmann
b2c55f0a72 [sankaku] remove login support
The old login method for 'https://chan.sankakucomplex.com/user/login'
and the cookies it produces have no effect on the results from
'beta.sankakucomplex.com'.
2020-12-08 21:05:47 +01:00
Mike Fährmann
547107307e fix 'Metadata' messages in result tests 2020-11-24 13:34:54 +01:00
Mike Fährmann
578dcf805c [mangapanda] don't force https:// 2020-11-21 20:24:37 +01:00
Mike Fährmann
ca59bd691c [postprocessor:metadata] add 'event' and 'filename' options 2020-11-20 22:29:11 +01:00
Mike Fährmann
9fffa9c343 rework post processor callbacks 2020-11-19 02:29:06 +01:00
Mike Fährmann
1e3dd7330e merge SharedConfigMixin functionality into Extractor 2020-11-17 00:34:07 +01:00
Mike Fährmann
e5438b8a29 release version 1.15.3 2020-11-13 15:50:05 +01:00
Mike Fährmann
b9bfa4c675 update extractor test results 2020-11-07 02:03:22 +01:00
Mike Fährmann
c3f01dc4e6 implement 'util.unique()' 2020-10-29 23:33:41 +01:00
Mike Fährmann
d83b95fd28 [postprocessor:metadata] accept a string-list for 'content-format'
(closes #1080)
2020-10-27 20:09:58 +01:00
Mike Fährmann
350b1afe1c speed up _list_classes() after iterating over all modules once 2020-10-26 22:18:15 +01:00
Mike Fährmann
18213dc5ba release version 1.15.2 2020-10-24 18:57:29 +02:00
Mike Fährmann
ec61696316 add 't' format string conversion (closes #1065)
to trim whitespace from the beginning and end of strings.
Example: '{field!t}' becomes 'foo' for 'field' == "  \nfoo\t\r"
2020-10-16 00:37:22 +02:00
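The 't' conversion can be illustrated with a small `string.Formatter` subclass. This is a sketch with a hypothetical class name, assuming the conversion amounts to `str.strip()` with no arguments; it is not the project's own code.

```python
import string


class TrimFormatter(string.Formatter):
    """string.Formatter with an extra 't' conversion (hypothetical sketch)."""

    def convert_field(self, value, conversion):
        if conversion == "t":
            # Remove leading and trailing whitespace, including \n, \t and \r
            return str(value).strip()
        return super().convert_field(value, conversion)


trimmed = TrimFormatter().format("{field!t}", field="  \nfoo\t\r")
print(repr(trimmed))  # 'foo'
```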
Mike Fährmann
07432d6262 [seiga] fix flake8 and cookie test (#1063) 2020-10-15 15:37:58 +02:00
Mike Fährmann
b8daabc3ca [pinterest] implement login support (closes #1055)
being logged in allows access to secret/protected boards
2020-10-15 15:14:18 +02:00
kurumigi
7e0e872f4f [seiga] Add metadata for single image downloads (#1063)
* [seiga] Support image metadata.

* [seiga] Update test data.

* [seiga] Fix cookie check.

* [test_cookies] [seiga] Fit test_cookies.py to the last commit.
2020-10-15 15:13:27 +02:00
Mike Fährmann
844793847c update extractor test results 2020-10-11 18:15:41 +02:00
Mike Fährmann
c874071f5a [kissmanga] remove module 2020-10-04 22:46:41 +02:00
Mike Fährmann
844502cad5 update extractor test results 2020-10-03 19:24:19 +02:00
Mike Fährmann
7cd383c0f9 update extractor test results 2020-09-20 21:54:39 +02:00
Mike Fährmann
65744a7a31 use alternative for all falsey values in format strings
… and not just None (#525)

It would be better to consistently use None for all non-existent
fields and/or fields without a valid value, but this is a good
enough workaround for now.
2020-09-19 22:02:47 +02:00
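The difference between falling back only on None and falling back on any falsey value can be shown with two small lookup helpers. Both function names and the sample metadata are hypothetical; they only illustrate the behavior change the commit message describes.

```python
def first_not_none(kwdict, keys, default=None):
    """Old behavior (sketch): skip a field only if it is missing or None."""
    for key in keys:
        value = kwdict.get(key)
        if value is not None:
            return value
    return default


def first_truthy(kwdict, keys, default=None):
    """New behavior (sketch): also skip "", 0, [] and other falsey values."""
    for key in keys:
        value = kwdict.get(key)
        if value:
            return value
    return default


# Hypothetical metadata where 'title' exists but is empty
metadata = {"title": "", "filename": "image01"}
print(repr(first_not_none(metadata, ["title", "filename"])))  # ''
print(repr(first_truthy(metadata, ["title", "filename"])))    # 'image01'
```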
Mike Fährmann
f5b7ae01c1 update extractor test results 2020-09-15 18:07:08 +02:00
Mike Fährmann
392d022b04 implement 'config.accumulate()' (#994) 2020-09-14 21:13:08 +02:00
Mike Fährmann
3108e85b89 [worldthree] remove extractors
http://www.slide.world-three.org/ hasn't been accessible for a long time.
2020-09-11 18:12:57 +02:00
Mike Fährmann
3918b69677 remove 'extractor.blacklist' context manager 2020-09-11 13:17:35 +02:00
Mike Fährmann
ac3036ef56 add 'filesize-min' and 'filesize-max' options (closes #780) 2020-09-03 18:21:04 +02:00
Mike Fährmann
fd0685d9b5 [postprocessor:zip] defer zip file creation (fixes #968)
don't try to create zip files on postprocessor construction;
wait until directory creation during file download.
2020-08-31 21:53:18 +02:00