Mike Fährmann
9bf76c1352
replace 'util.re()' with 'text.re()'
...
remove unnecessary 'util' imports
2025-10-20 17:44:58 +02:00
Mike Fährmann
99d5c521d1
use 'encoding="utf-8"' when opening files in text mode (#8376)
2025-10-09 09:54:18 +02:00
Mike Fährmann
41191bb60a
'match.group(N)' -> 'match[N]' (#7671)
...
2.5x faster
2025-06-18 13:05:58 +02:00
Mike Fährmann
e08ec7e083
update copyright notices
2025-06-13 00:03:41 +02:00
Mike Fährmann
b5c88b3d3e
replace standard library 're' uses with 'util.re()'
2025-06-06 13:24:52 +02:00
Mike Fährmann
473ee5ff85
[recursive] add 'https://' to URLs if not present
2024-12-10 17:16:52 +01:00
Mike Fährmann
9f75713e00
[recursive] simplify
2023-09-13 21:47:20 +02:00
Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
3918b69677
remove 'extractor.blacklist' context manager
2020-09-11 13:17:35 +02:00
Mike Fährmann
a1e739b96c
reuse connection adapters from parent extractors
2020-05-12 23:52:01 +02:00
Mike Fährmann
2e516a1e3e
store the full original URL in Extractor.url
2019-02-12 18:46:48 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
529aa21dd9
move FileAdapter definition into recursive.py
2018-09-16 20:59:22 +02:00
Mike Fährmann
f10ffc0839
update extractor blacklist to also allow classes
2018-01-14 18:47:22 +01:00
Mike Fährmann
0dedbe759c
enable '--chapter-filter'
...
The same filter infrastructure that can be applied to image URLs now
also works for manga chapters and other delegated URLs.
TODO: actually provide any metadata (currently only deviantart and
imagefap are supported).
2017-09-12 16:19:00 +02:00
Mike Fährmann
2993206c4b
smaller fixes and "security" measures
...
- move the OAuthSession class into util.py
- block special extractors for reddit and recursive
- ignore 'only matching' tests for testresults script
2017-06-16 21:01:40 +02:00
Mike Fährmann
691c4dd709
support direct image links
2017-05-24 12:51:18 +02:00
Mike Fährmann
e425243b1e
[reddit] some small fixes
...
- filter or complete some URLs
- remove the 'nofollow:' scheme before printing URLs
- (#15)
2017-05-23 11:48:00 +02:00
Mike Fährmann
94e10f249a
code adjustments according to pep8 nr2
2017-02-01 00:53:19 +01:00
Mike Fährmann
0989cd2430
add basic support for file:// URLs
...
this allows you to feed local files into the recursive extractor
2016-12-05 18:27:36 +01:00
Mike Fährmann
d31ccb16f2
rename 'generic' to 'recursive'
2016-10-01 15:54:27 +02:00