Mike Fährmann
53cdfaac37
[common] add reference to 'exception' module to Extractor class
...
- remove 'exception' imports
- replace with 'self.exc'
2026-02-15 10:57:22 +01:00
Mike Fährmann
12f5e24ab5
use sets for ' in { ... }' checks
2026-02-11 22:55:01 +01:00
Mike Fährmann
e006d26c8e
Revert "use f-strings when building 'pattern'"
...
revert d7c97d5a97.
2025-12-20 22:07:37 +01:00
Mike Fährmann
8ec48a039f
[aryion:favorite] ignore already seen folders (#8728)
2025-12-19 11:29:53 +01:00
Mike Fährmann
c3d8602418
[path] implement dynamic length directories (#1350)
...
append directory segments for each item of a list (or general non-string
iterable), which can be returned with the 'I' specifier
2025-12-18 09:53:26 +01:00
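
The behaviour described in this commit message can be sketched as follows. This is only an illustration, not the project's actual code: `build_segments` and its inputs are hypothetical names, and it assumes each directory value is either a string or a general iterable that contributes one segment per item.

```python
def build_segments(values):
    """Flatten directory values into a list of path segments.

    Hypothetical helper: strings contribute one segment, non-string
    iterables contribute one segment per item.
    """
    segments = []
    for value in values:
        if isinstance(value, str):
            # Plain strings become a single segment; skip empty ones.
            if value:
                segments.append(value)
            continue
        try:
            items = iter(value)
        except TypeError:
            # Not iterable (e.g. an int): treat as a single segment.
            segments.append(str(value))
        else:
            # List or other iterable: one segment per item.
            segments.extend(str(item) for item in items)
    return segments


# Illustrative input: the middle value is a list of folder names.
example = build_segments(["aryion", ["Folder", "Sub"], "user"])
```

With this sketch, the number of resulting directory levels depends on the length of the list value.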
Mike Fährmann
982d9908c1
[aryion] fix "AttributeError: '_pagination'" (#8723)
...
fixes regression introduced in 62d6a2206d
2025-12-18 08:36:34 +01:00
Mike Fährmann
62d6a2206d
[aryion] add 'watch' extractor (#8705)
2025-12-16 17:57:41 +01:00
Mike Fährmann
3536816d51
[aryion] fix 'description' metadata
2025-12-16 17:23:59 +01:00
Mike Fährmann
58ccde3645
[aryion:favorite] support 'category' URLs (#8705)
2025-12-16 17:23:31 +01:00
Mike Fährmann
83ca65d918
[aryion:favorite] support folder items (#8705)
2025-12-15 20:36:01 +01:00
Mike Fährmann
968597a302
yield 3-tuples for Message.Directory
...
adapt tuples to the same length and semantics as other messages
2025-12-05 21:39:52 +01:00
Mike Fährmann
aa39770783
[aryion:search] simplify further
...
- skip 'build_query()' step
- add underscores to prefixes
2025-11-19 19:54:53 +01:00
vorsatile
991fe0f2a7
[aryion] add 'search' extractor (#8567)
...
* [aryion] Implement search extractor.
* [aryion] Update capabilities.
* [aryion] Adjust example.
* fix flake8 errors
* update & simplify
- use existing '_pagination_next()'
- remove '_pagination_search()'
- update 'search[…]' metadata
* add tests
---------
Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2025-11-18 20:57:08 +01:00
Mike Fährmann
d7c97d5a97
use f-strings when building 'pattern'
2025-10-20 21:23:11 +02:00
Mike Fährmann
69f7cfdd0c
[dt] replace 'datetime' imports
2025-10-16 11:42:42 +02:00
Mike Fährmann
1fc20d3fdd
[aryion] fix pagination (#8091)
...
ensure there is no "Next >>" link before stopping
2025-08-21 10:59:42 +02:00
Mike Fährmann
a097a373a9
simplify if statements by using walrus operators (#7671)
2025-07-22 20:57:54 +02:00
Mike Fährmann
22ec687d54
[aryion] fix 'favorite' extractor (#7775)
2025-07-04 20:23:27 +02:00
Mike Fährmann
9dbe33b6de
replace old %-formatted and .format(…) strings with f-strings (#7671)
...
mostly using flynt
https://github.com/ikamensh/flynt
2025-06-29 17:50:19 +02:00
Mike Fährmann
41191bb60a
'match.group(N)' -> 'match[N]' (#7671)
...
2.5x faster
2025-06-18 13:05:58 +02:00
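
For reference, the two spellings named in this commit return the same value; indexing a match object has been supported since Python 3.6. A minimal sketch, where the pattern and URL are illustrative examples rather than the extractor's own:

```python
import re

# Illustrative pattern and URL, not copied from the extractor.
pattern = re.compile(r"/g4/view/(\d+)")
match = pattern.search("https://aryion.com/g4/view/12345")

via_group = match.group(1)  # older spelling
via_index = match[1]        # equivalent subscript spelling
```

The speed difference quoted in the commit message is the author's own measurement and is not reproduced here.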
Mike Fährmann
e08ec7e083
update copyright notices
2025-06-13 00:03:41 +02:00
Mike Fährmann
b81fc5c124
replace text.rextract() with rextr()
2025-05-23 18:28:58 +02:00
Mike Fährmann
b76e7de1a7
[dl:http] fix setting 'mtime' per file (#7529)
...
introduce '_http_lastmodified' meta field
2025-05-21 13:50:51 +02:00
Mike Fährmann
156a70bec0
[aryion] update favorite extractor
...
- add test case
- add docs/supportedsites entry
- add custom directory_fmt and archive_fmt
- remove constructor
- appease flake8
2024-07-21 12:34:06 +02:00
walkenjoyer
19e98ef8e9
[aryion] Add favorite extractor (#4511)
2024-07-20 18:49:59 +02:00
Mike Fährmann
57fc6fcf83
replace '24*3600' with '86400'
...
and generalize cache maxage values
2023-12-18 23:57:22 +01:00
Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6
decouple extractor initialization
...
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
Mike Fährmann
d97b8c2fba
consistent cookie-related names
...
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible
2022-11-05 01:14:09 +01:00
Mike Fährmann
b03ca7f10c
[aryion] provide correct 'date' independent of dst
2022-03-24 22:57:18 +01:00
Mike Fährmann
4b3e309b90
[aryion] update/improve pagination (#1849)
...
Manually increment the 'p' query parameter,
instead of relying on a "Next" link which only works up to page 200.
2021-09-16 16:27:25 +02:00
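
The approach described in this commit message could look roughly like the sketch below. It is not the extractor's code: `paginate`, `fetch_page`, and the fake page data are hypothetical, and stopping on the first empty page is an assumption made for the example.

```python
def paginate(fetch_page, start=1):
    """Yield items page by page, incrementing 'p' manually.

    fetch_page is a hypothetical callable that takes a params dict
    and returns the list of items found on that page.
    """
    params = {"p": start}
    while True:
        items = fetch_page(params)
        if not items:
            # Assumed stop condition: an empty page ends the listing.
            return
        yield from items
        # Advance the counter instead of following a "Next" link.
        params["p"] += 1


# Stand-in for a real HTTP request: three pages, then nothing.
FAKE_PAGES = {1: ["a", "b"], 2: ["c"], 3: ["d", "e"]}


def fake_fetch(params):
    return FAKE_PAGES.get(params["p"], [])


results = list(paginate(fake_fetch))
```

Because the page number is tracked by the caller, the loop does not depend on whether the site still renders a "Next" link.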
Mike Fährmann
266ed9b62e
[aryion] add 'tag' extractor (closes #1849)
2021-09-14 23:33:33 +02:00
Mike Fährmann
0f35aca728
[aryion] minor code updates
2021-05-19 23:46:33 +02:00
Mike Fährmann
2eb46452ad
[aryion] update 'needle' to not skip text posts (fixes #1568)
...
on "Latest Updates" pages
"class='thumb scrollthumb' href='/g4/view/" and
"class='thumb' href='/g4/view/" both end with
"thumb' href='/g4/view/"
2021-05-19 23:35:05 +02:00
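
The suffix argument in this commit message can be checked directly. A minimal sketch: the two attribute strings and the needle are quoted from the message, while `post_ids` and the sample HTML fragment are hypothetical and only illustrate the idea.

```python
# Shared suffix quoted in the commit message.
NEEDLE = "thumb' href='/g4/view/"

# The two attribute strings quoted in the commit message.
VARIANTS = [
    "class='thumb scrollthumb' href='/g4/view/",
    "class='thumb' href='/g4/view/",
]


def post_ids(html, needle=NEEDLE):
    """Return the text between each needle occurrence and the next quote."""
    ids = []
    pos = 0
    while True:
        start = html.find(needle, pos)
        if start < 0:
            return ids
        start += len(needle)
        end = html.find("'", start)
        if end < 0:
            return ids
        ids.append(html[start:end])
        pos = end


# Hypothetical page fragment containing one link of each kind.
SAMPLE = (
    "<a class='thumb scrollthumb' href='/g4/view/111'></a>"
    "<a class='thumb' href='/g4/view/222'></a>"
)
found = post_ids(SAMPLE)
```

Searching for the common suffix picks up both kinds of link with a single needle.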
Mike Fährmann
387fe415d5
unescape items in text.split_html()
2021-03-29 02:12:29 +02:00
Magnus Boman
522d0a834c
[aryion] Unescape paths too (#1414)
...
Without this you'll get paths like this:
- Starcross - Ch. 2 &quot;The Ins and Outs of Sarah&quot;
This commit changes it to:
- Starcross - Ch. 2 "The Ins and Outs of Sarah"
2021-03-27 18:25:38 +01:00
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
Mike Fährmann
bc48514d84
[aryion] get post ID via gallery-item (fixes #981, closes #982)
...
this even works when fetching post IDs from '/latest.php?id='
2020-09-06 22:17:23 +02:00
ArtaxIsSleeping
0e941553ec
[aryion] Add username/password support (#960)
...
* Add username/password support to aryion extractor
* Update docs to match
* Fix code style
2020-08-27 22:45:30 +02:00
Mike Fährmann
b2009ea39e
[aryion] update folder mime type list (fixes #945)
2020-08-16 22:30:15 +02:00
Mike Fährmann
f1ddbff0b5
[aryion] add 'recursive' option (fixes #832)
...
This is enabled by default and will recursively go through all
(sub)folders in an artist's gallery.
The old method of using "Latest Updates" lists can be restored by
disabling this option.
2020-06-26 23:36:50 +02:00
Mike Fährmann
db6685eeae
[aryion] support downloading from folders (fixes #694)
2020-04-18 01:25:54 +02:00
Mike Fährmann
cf4cef3d63
[aryion] adjust 'date' to UTC time
2020-04-11 02:08:05 +02:00
Mike Fährmann
6c531be294
[aryion] fix malformed 'last-modified' headers (#390)
2020-04-10 23:08:52 +02:00
Mike Fährmann
dc65f7d8dc
[aryion] use generic download URLs (#390)
...
i.e. /g4/data.php?id=…
- get filename & extension from Content-Disposition header
- handle all downloadable file types (docx, swf, etc)
2020-04-10 22:08:45 +02:00
Mike Fährmann
96b78bcf04
[aryion] include path in default directory format (#390)
2020-04-10 21:58:46 +02:00
Mike Fährmann
6143050980
[aryion] add gallery and post extractors (#390, #673)
2020-04-08 21:52:51 +02:00