30 Commits

Author SHA1 Message Date
Mike Fährmann
9dbe33b6de replace old %-formatted and .format(…) strings with f-strings (#7671)
mostly using flynt
https://github.com/ikamensh/flynt
2025-06-29 17:50:19 +02:00
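As a rough illustration of the kind of rewrite this commit describes, the three formatting styles below produce the same string. The variable names and values are made up for the example and are not taken from the repository.

```python
# Hypothetical values, only for illustration.
name, num = "aryion", 42

# Old style 1: %-formatting
old_percent = "%s/%d" % (name, num)

# Old style 2: str.format()
old_format = "{}/{}".format(name, num)

# New style: f-string
new_fstring = f"{name}/{num}"
```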
Mike Fährmann
41191bb60a 'match.group(N)' -> 'match[N]' (#7671)
2.5x faster
2025-06-18 13:05:58 +02:00
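A small sketch of the equivalence behind the 'match.group(N)' -> 'match[N]' change: since Python 3.6, match objects support indexing, so both spellings return the same group. The pattern and input string are assumptions for the example, and the sketch does not try to reproduce the speed claim.

```python
import re

# Hypothetical pattern and input, only for illustration.
m = re.match(r"(\w+)-(\d+)", "post-12345")
assert m is not None

by_group = m.group(2)  # method call
by_index = m[2]        # index access, same result
```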
Mike Fährmann
e08ec7e083 update copyright notices 2025-06-13 00:03:41 +02:00
Mike Fährmann
b81fc5c124 replace text.rextract() with rextr() 2025-05-23 18:28:58 +02:00
Mike Fährmann
b76e7de1a7 [dl:http] fix setting 'mtime' per file (#7529)
introduce '_http_lastmodified' meta field
2025-05-21 13:50:51 +02:00
Mike Fährmann
156a70bec0 [aryion] update favorite extractor
- add test case
- add docs/supportedsites entry
- add custom directory_fmt and archive_fmt
- remove constructor
- appease flake8
2024-07-21 12:34:06 +02:00
walkenjoyer
19e98ef8e9 [aryion] Add favorite extractor (#4511) 2024-07-20 18:49:59 +02:00
Mike Fährmann
57fc6fcf83 replace '24*3600' with '86400'
and generalize cache maxage values
2023-12-18 23:57:22 +01:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a Job,
whereas before most of it had already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
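The decoupling described above can be sketched roughly as follows. 'SketchExtractor', its attributes, and the 'timeout' option are hypothetical stand-ins and not the project's actual classes; the point is only that the constructor stays cheap and the real setup runs later, after the caller has had a chance to adjust configuration.

```python
class SketchExtractor:
    """Hypothetical extractor with a cheap constructor."""

    def __init__(self, url):
        # Only store arguments; no session or config access here.
        self.url = url
        self.config = {}
        self.session = None
        self._initialized = False

    def initialize(self):
        # The actual init step: read config and build the session.
        if self._initialized:
            return
        self.session = {
            "cookies": {},
            "timeout": self.config.get("timeout", 30),
        }
        self._initialized = True


extr = SketchExtractor("https://example.org/g4/view/1")
extr.config["timeout"] = 5   # adjust config before any of it is read
extr.initialize()
```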
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
b0cb4a1b9c replace 'text.extract()' with 'text.extr()' where possible 2022-11-05 01:14:09 +01:00
Mike Fährmann
b03ca7f10c [aryion] provide correct 'date' independent of dst 2022-03-24 22:57:18 +01:00
Mike Fährmann
4b3e309b90 [aryion] update/improve pagination (#1849)
Manually increment the 'p' query parameter,
instead of relying on a "Next" link which only works up to page 200.
2021-09-16 16:27:25 +02:00
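A minimal sketch of the pagination approach in this commit message: keep a page counter and increment it until a page comes back empty, instead of following a "Next" link. The 'fetch' callable and the fake page data are assumptions for the example; the real extractor's request logic is not shown here.

```python
def paginate(fetch, start=1):
    """Yield items page by page, incrementing the page number manually.

    'fetch' is a hypothetical callable that takes the value of the 'p'
    query parameter and returns a list of items (empty when exhausted).
    """
    p = start
    while True:
        items = fetch(p)
        if not items:
            return
        yield from items
        p += 1


# Fake pages standing in for HTTP responses.
pages = {1: ["a", "b"], 2: ["c"]}
collected = list(paginate(lambda p: pages.get(p, [])))
```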
Mike Fährmann
266ed9b62e [aryion] add 'tag' extractor (closes #1849) 2021-09-14 23:33:33 +02:00
Mike Fährmann
0f35aca728 [aryion] minor code updates 2021-05-19 23:46:33 +02:00
Mike Fährmann
2eb46452ad [aryion] update 'needle' to not skip text posts (fixes #1568)
on "Latest Updates" pages

"class='thumb scrollthumb' href='/g4/view/" and
"class='thumb' href='/g4/view/" both end with
"thumb' href='/g4/view/"
2021-05-19 23:35:05 +02:00
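The observation in this commit message can be checked directly: both attribute strings end with the shorter needle, so searching for the shared suffix finds both kinds of links. The sample HTML and the 'find_ids' helper are assumptions made for this sketch.

```python
NEEDLE = "thumb' href='/g4/view/"

VARIANTS = [
    "class='thumb scrollthumb' href='/g4/view/",
    "class='thumb' href='/g4/view/",
]


def find_ids(page, needle=NEEDLE):
    """Collect the text between each needle and the next single quote."""
    ids = []
    pos = 0
    while True:
        start = page.find(needle, pos)
        if start < 0:
            return ids
        start += len(needle)
        end = page.find("'", start)
        if end < 0:
            return ids
        ids.append(page[start:end])
        pos = end


# Hypothetical page fragment containing both link variants.
sample = (
    "<a class='thumb scrollthumb' href='/g4/view/111'>text</a>"
    "<a class='thumb' href='/g4/view/222'>image</a>"
)
found = find_ids(sample)
```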
Mike Fährmann
387fe415d5 unescape items in text.split_html() 2021-03-29 02:12:29 +02:00
Magnus Boman
522d0a834c [aryion] Unescape paths too (#1414)
Without this you'll get paths like this:
  - Starcross - Ch. 2 &quot;The Ins and Outs of Sarah&quot;

This commit changes it to:
  - Starcross - Ch. 2 "The Ins and Outs of Sarah"
2021-03-27 18:25:38 +01:00
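The effect of unescaping a path can be sketched with the standard library's html.unescape(). Using it here is an assumption for illustration; the commit itself may rely on a project helper instead.

```python
from html import unescape

# Path segment as it appears in the page source, with HTML entities.
raw = "Starcross - Ch. 2 &quot;The Ins and Outs of Sarah&quot;"

# After unescaping, the entities become literal quote characters.
clean = unescape(raw)
```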
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
Mike Fährmann
bc48514d84 [aryion] get post ID via gallery-item (fixes #981, closes #982)
this even works when fetching post IDs from '/latest.php?id='
2020-09-06 22:17:23 +02:00
ArtaxIsSleeping
0e941553ec [aryion] Add username/password support (#960)
* Add username/password support to aryion extractor

* Update docs to match

* Fix code style
2020-08-27 22:45:30 +02:00
Mike Fährmann
b2009ea39e [aryion] update folder mime type list (fixes #945) 2020-08-16 22:30:15 +02:00
Mike Fährmann
f1ddbff0b5 [aryion] add 'recursive' option (fixes #832)
This is enabled by default and will recursively go through all
(sub)folders in an artist's gallery.

The old method of using "Latest Updates" lists can be restored by
disabling this option.
2020-06-26 23:36:50 +02:00
Mike Fährmann
db6685eeae [aryion] support downloading from folders (fixes #694) 2020-04-18 01:25:54 +02:00
Mike Fährmann
cf4cef3d63 [aryion] adjust 'date' to UTC time 2020-04-11 02:08:05 +02:00
Mike Fährmann
6c531be294 [aryion] fix malformed 'last-modified' headers (#390) 2020-04-10 23:08:52 +02:00
Mike Fährmann
dc65f7d8dc [aryion] use generic download URLs (#390)
i.e. /g4/data.php?id=…

- get filename & extension from Content-Disposition header
- handle all downloadable file types (docx, swf, etc)
2020-04-10 22:08:45 +02:00
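Getting a filename and extension from a Content-Disposition header, as the first bullet above describes, might look like the sketch below. It uses the standard library's email.message.Message to parse the header; the helper name and the sample header value are assumptions, and this is not the extractor's actual code.

```python
from email.message import Message


def filename_from_header(value):
    """Return (name, extension) parsed from a Content-Disposition value."""
    msg = Message()
    msg["Content-Disposition"] = value
    filename = msg.get_filename() or ""
    name, _, ext = filename.rpartition(".")
    if not name:
        # No usable dot separator: treat the whole value as the name.
        return filename, ""
    return name, ext.lower()


# Hypothetical header value.
result = filename_from_header('attachment; filename="story.docx"')
```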
Mike Fährmann
96b78bcf04 [aryion] include path in default directory format (#390) 2020-04-10 21:58:46 +02:00
Mike Fährmann
6143050980 [aryion] add gallery and post extractors (#390, #673) 2020-04-08 21:52:51 +02:00