Mike Fährmann
c978fe18d4
[text] add 'extract_urls()' helper
2026-02-07 21:47:17 +01:00
Mike Fährmann
37aa7337dc
[text] reject long filename extensions (#8491)
...
fixes regression introduced in 3252ead7c7
ref bc868e7bb8
2025-11-01 10:35:33 +01:00
Mike Fährmann
c8fc790028
merge branch 'dt': move datetime utils into separate module
...
- use 'datetime.fromisoformat()' when possible (#7671)
- return a datetime-compatible object for invalid datetimes
(instead of a 'str' value)
2025-10-20 09:30:05 +02:00
Mike Fährmann
085616e0a8
[dt] replace 'text.parse_datetime()' & 'text.parse_timestamp()'
2025-10-17 17:43:06 +02:00
Mike Fährmann
17156ab7a2
[text] implement 'nameext_from_name()'
2025-10-15 11:14:49 +02:00
Mike Fährmann
724ae3661b
[text] add 'empty' argument to 'parse_query()' (#8377)
...
enables including query parameters without value
2025-10-09 12:10:23 +02:00
Mike Fährmann
7bb4053396
[text] add 'sanitize_whitespace()'
2025-07-19 20:49:48 +02:00
Mike Fährmann
c08833aed9
[util] move 're' functions to text.py
2025-06-23 20:05:20 +02:00
Mike Fährmann
8f79ec67f4
[text] add 'build_query()'
2025-06-18 20:49:12 +02:00
Mike Fährmann
41191bb60a
'match.group(N)' -> 'match[N]' (#7671)
...
2.5x faster
2025-06-18 13:05:58 +02:00
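A minimal sketch of the subscript form this commit switches to. `Match.__getitem__` has been part of the standard `re` module since Python 3.6; the pattern and the `dimensions()` helper are hypothetical and not taken from the repository.

```python
import re

# Hypothetical pattern, used only to illustrate the two access styles.
PATTERN = re.compile(r"(\d+)x(\d+)")


def dimensions(text):
    """Return (width, height) parsed from 'text', or None if absent."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # match[1] is equivalent to match.group(1)
    return int(match[1]), int(match[2])
```

Both spellings return the same value; the speed difference quoted in the commit is the author's measurement and is not reproduced here.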
Mike Fährmann
6d928f3805
remove some pre-3.8 workarounds (#7671)
2025-06-17 12:56:47 +02:00
Mike Fährmann
e84df260c0
[util] generalize 'build_duration_func'
2025-06-08 20:01:16 +02:00
Mike Fährmann
fe39b7d8c8
[text] slightly improve performance of 'extract' functions
...
by using 'None' instead of '0' as default 'pos' value
this only saves a few nanoseconds per call, but still
2025-05-23 17:53:28 +02:00
Mike Fährmann
f3ed15573a
[text] add 'rextr()'
2025-05-23 17:28:58 +02:00
Mike Fährmann
04464b6cf0
[text] add second argument to 'parse_query_list()' (#7138)
...
return only values whose name is in 'as_list' as a list
2025-03-10 09:36:50 +01:00
Mike Fährmann
db19990a82
[text] allow calling 'extract_iter' with invalid arguments
2025-03-02 10:44:06 +01:00
Mike Fährmann
b03ee3c4c4
[text] implement 'parse_query_list()'
2024-10-01 20:28:30 +02:00
Mike Fährmann
9f49cf16e8
[text] implement 'parse_query()' without using 'urllib.parse.parse_qsl'
...
doesn't support bytes anymore, but is twice as fast
2024-10-01 20:28:11 +02:00
Mike Fährmann
2c7a0c3ca8
add alternatives for deprecated utc datetime functions
2024-09-19 20:47:05 +02:00
Mike Fährmann
5227bb6b1d
[text] catch general Exceptions
2024-04-13 18:51:40 +02:00
Mike Fährmann
76581c13f7
handle URLs without '/' after their TLD (#5252)
2024-02-29 15:05:46 +01:00
Mike Fährmann
05255f5be0
add 'default' argument to 'text.extr()'
2022-11-09 11:00:32 +01:00
Mike Fährmann
eb33e6cf2d
add 'text.extr()'
...
a stripped-down version of text.extract() that
- always returns a string (like 'extract_from')
- only returns a string
- does not deal with 'pos' arguments
- is ~20% faster
2022-11-04 21:37:36 +01:00
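A rough sketch of what a helper with the properties listed above could look like, assuming it returns the text between two marker strings. The name `extr_sketch` and its body are illustrative assumptions, not the repository's code; the `default` parameter mirrors the argument added by the later commit 05255f5be0.

```python
def extr_sketch(txt, begin, end, default=""):
    """Return the text between 'begin' and 'end', or 'default' if not found."""
    try:
        # Start right after the first occurrence of 'begin' ...
        first = txt.index(begin) + len(begin)
        # ... and stop at the next occurrence of 'end'.
        return txt[first:txt.index(end, first)]
    except ValueError:
        # str.index() raises ValueError when a marker is missing.
        return default
```

There is no 'pos' bookkeeping, so the result is always a plain string.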
Mike Fährmann
67bad04dda
[formatter] add 'g' conversion to sluGify a string (#2410)
2022-08-26 17:57:17 +02:00
Mike Fährmann
bddcec49f1
implement 'text.root_from_url()'
...
use domain from input URL for kemono
2022-03-01 03:09:57 +01:00
Mike Fährmann
bc0e853d30
combine KeyError & IndexError to common base class LookupError
2022-02-11 00:42:49 +01:00
Mike Fährmann
bc868e7bb8
consider apparently long extensions as part of the filename
...
(#1516)
2021-05-02 21:15:50 +02:00
Mike Fährmann
387fe415d5
unescape items in text.split_html()
2021-03-29 02:12:29 +02:00
Mike Fährmann
78fd63b8f0
remove 'text.clean_xml()'
...
was not used anywhere
2021-03-28 04:05:16 +02:00
Mike Fährmann
8553b218d9
replace calls to 'os.path.splitext()' with 'str.rpartition()'
...
Makes the functions that used it more than twice as fast,
and we can get rid of an import as well.
2021-03-28 04:01:27 +02:00
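A small sketch contrasting the two approaches named in this commit. Both function names are hypothetical, and the `rpartition` variant is one possible way to write it rather than the repository's version.

```python
import os.path


def ext_via_splitext(name):
    """Extension without the leading dot, using the standard library."""
    return os.path.splitext(name)[1][1:]


def ext_via_rpartition(name):
    """Extension taken as everything after the last '.', if there is one."""
    _, dot, ext = name.rpartition(".")
    # Without a dot, rpartition() puts the whole string in the last slot.
    return ext if dot else ""
```

The two are not fully interchangeable: for a name with only a leading dot such as ".bashrc", `os.path.splitext()` reports no extension, while the `rpartition` variant returns "bashrc".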
Mike Fährmann
a09f42f6b3
improve filename_from_url() performance
...
Manually extracting the part between the last '/' and '?' instead of
relying on the standard library's 'urllib.parse.urlsplit()' makes
the function roughly 4x faster.
urlsplit() : 3.64 secs per 1,000,000 iterations
partition(): 0.87 secs per 1,000,000 iterations
2020-10-23 00:14:06 +02:00
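A sketch of the two strategies compared in this commit, assuming the goal is the last path segment of a URL. The function names are hypothetical and the bodies are illustrative, not copied from the repository.

```python
from urllib.parse import urlsplit


def filename_via_urlsplit(url):
    """Last path segment, as parsed by the standard library."""
    return urlsplit(url).path.rpartition("/")[2]


def filename_via_partition(url):
    """Text between the last '/' and the first '?', extracted by hand."""
    return url.partition("?")[0].rpartition("/")[2]
```

The manual version skips full URL parsing, so it does not strip a '#fragment' the way `urlsplit()` does. Timings depend on the machine and Python version, so the figures in the commit message are not reproduced here.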
Mike Fährmann
37d71f6e09
strip microseconds in text.parse_datetime()
2020-06-17 21:40:16 +02:00
Mike Fährmann
6294e2c540
add 'text.ensure_http_scheme()'
2020-05-19 22:32:53 +02:00
Mike Fährmann
a0f4c295c0
add optional 'utcoffset' argument to 'parse_datetime()'
2020-04-11 02:05:00 +02:00
Mike Fährmann
f6c5edb76b
pre-compile regex pattern for remove_html() and split_html()
2020-03-13 23:31:54 +01:00
Mike Fährmann
b1bea8aaeb
add 'restrict-filenames' option (#348)
2019-07-23 17:41:24 +02:00
Mike Fährmann
1740086d8a
add 'repl' and 'sep' arguments to text.replace_html()
2019-07-17 14:48:24 +02:00
Mike Fährmann
b171befa87
implement 'parse_unicode_escapes()'
2019-06-16 21:47:24 +02:00
Mike Fährmann
2b1999476e
implement 'text.rextract()'
2019-05-28 21:03:41 +02:00
Mike Fährmann
2316e0ed3d
fix strptime workaround from b0e85a4
...
Don't return a modified version of 'date_time' if strptime fails.
2019-05-25 23:22:26 +02:00
Mike Fährmann
b0e85a42e3
apply workaround from 4736912 in parse_datetime() itself
2019-05-09 21:53:17 +02:00
Mike Fährmann
d09864b581
implement text.parse_datetime()
2019-05-08 15:43:59 +02:00
Mike Fährmann
6264a46212
use 'utcfromtimestamp()'
...
'fromtimestamp()' converts its results to the local timezone and causes
problems when running tests on a different machine.
2019-04-21 16:22:53 +02:00
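A sketch of the timezone issue described in this commit. `datetime.utcfromtimestamp()` is deprecated as of Python 3.12, so this example uses the timezone-aware equivalent and drops the tzinfo afterwards; the helper name is hypothetical.

```python
from datetime import datetime, timezone


def naive_utc_from_timestamp(ts):
    """Convert a Unix timestamp to a naive datetime expressed in UTC."""
    # Passing an explicit tz avoids the conversion to the machine's
    # local timezone that plain fromtimestamp(ts) performs.
    return datetime.fromtimestamp(ts, timezone.utc).replace(tzinfo=None)
```

The result is the same on every machine, which is what makes test expectations portable.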
Mike Fährmann
d670de0344
implement 'text.parse_timestamp()'
2019-04-21 15:28:27 +02:00
Mike Fährmann
21a7e395a7
implement convenience wrapper for text.extract functionality
2019-04-19 22:30:11 +02:00
Mike Fährmann
8f249f1d54
improve text.extract_iter() performance
...
by roughly 40% through
- inlining code
- pre-calculating reused values
- entering a try-except block only once
2019-04-18 23:37:17 +02:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
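A minimal sketch of the "now" layout from the example above, assuming a dict with the two keys shown. The `split_name_ext()` helper is hypothetical and only illustrates the resulting shape, not the repository's implementation.

```python
def split_name_ext(url):
    """Return {'filename': ..., 'extension': ...} for the last URL segment."""
    # Drop the query string, then keep what follows the last '/'.
    name = url.partition("?")[0].rpartition("/")[2]
    filename, _, ext = name.rpartition(".")
    if not filename:
        # No usable dot: treat the whole segment as the filename.
        return {"filename": name, "extension": ""}
    return {"filename": filename, "extension": ext}
```

For the example URL this yields 'filename' as the filename and 'ext' as the extension, with no separate 'name' key.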
Mike Fährmann
e1d3e9a926
add 'ext_from_url' to text.py
2019-01-31 12:23:25 +01:00
Mike Fährmann
2d2953a5bf
add 'text.parse_float()' + cleanup in text.py
2019-01-29 16:46:21 +01:00
Mike Fährmann
ae9a37a528
implement text.split_html()
2018-05-27 15:00:41 +02:00