Mike Fährmann
2be54be692
[subscribestar] merge 'user-tag' into regular 'user' extractor (#8737)
2025-12-23 18:58:25 +01:00
Mike Fährmann
7669a1f13a
[subscribestar:user-tag] update 'pattern'
2025-12-22 11:43:30 +01:00
Mike Fährmann
b5a7540619
Merge branch 'op+': use '+' for 2-element string concatenations
2025-12-22 11:34:21 +01:00
Mike Fährmann
00c6821a3f
replace 2-element f-strings with simple '+' concatenations
...
Python's 'ast' module and its 'NodeVisitor' class
were incredibly helpful in identifying these
2025-12-22 11:26:04 +01:00
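The commit above credits Python's 'ast' module and its 'NodeVisitor' class for locating the affected f-strings. A minimal sketch of how such a search might look is shown here; it is not the script that was actually used, and the names `TwoPartFStringFinder` and `find_two_part_fstrings` are hypothetical. It assumes that "2-element" means an f-string whose parsed `JoinedStr` node has exactly two parts (literal text or `{...}` fields).

```python
import ast


class TwoPartFStringFinder(ast.NodeVisitor):
    """Collect line numbers of f-strings that consist of exactly two parts."""

    def __init__(self):
        self.hits = []

    def visit_JoinedStr(self, node):
        # An f-string is parsed into a JoinedStr node; its 'values' list
        # holds the literal segments and the formatted '{...}' fields.
        if len(node.values) == 2:
            self.hits.append(node.lineno)
        # Keep walking so f-strings nested inside fields are checked too.
        self.generic_visit(node)


def find_two_part_fstrings(source):
    """Return the line numbers of all two-part f-strings in 'source'."""
    finder = TwoPartFStringFinder()
    finder.visit(ast.parse(source))
    return finder.hits


# Hypothetical input: only line 2 is a candidate for 'root + "/posts"'.
sample = (
    'root = "https://example.org"\n'
    'url = f"{root}/posts"\n'
    'path = f"{root}/{1}/x"\n'
)
candidates = find_two_part_fstrings(sample)
```

Each reported line would still need a manual check, since a field with a conversion or format spec cannot be replaced by a plain '+'.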
Mike Fährmann
609e19273d
[subscribestar] add 'user-tag' extractor (#8737)
2025-12-21 22:14:17 +01:00
Mike Fährmann
e006d26c8e
Revert "use f-strings when building 'pattern'"
...
revert d7c97d5a97.
2025-12-20 22:07:37 +01:00
Mike Fährmann
968597a302
yield 3-tuples for Message.Directory
...
adapt tuples to the same length and semantics as other messages
2025-12-05 21:39:52 +01:00
Mike Fährmann
d7c97d5a97
use f-strings when building 'pattern'
2025-10-20 21:23:11 +02:00
Mike Fährmann
9bf76c1352
replace 'util.re()' with 'text.re()'
...
remove unnecessary 'util' imports
2025-10-20 17:44:58 +02:00
Mike Fährmann
c8fc790028
merge branch 'dt': move datetime utils into separate module
...
- use 'datetime.fromisoformat()' when possible (#7671)
- return a datetime-compatible object for invalid datetimes
(instead of a 'str' value)
2025-10-20 09:30:05 +02:00
Mike Fährmann
085616e0a8
[dt] replace 'text.parse_datetime()' & 'text.parse_timestamp()'
2025-10-17 17:43:06 +02:00
Mike Fährmann
36a3fe45e4
[subscribestar] improve 'filename' (#8416)
2025-10-15 11:52:39 +02:00
Mike Fährmann
a097a373a9
simplify if statements by using walrus operators (#7671)
2025-07-22 20:57:54 +02:00
Mike Fährmann
d8ef1d693f
rename 'StopExtraction' to 'AbortExtraction'
...
for cases where StopExtraction was used to report errors
2025-07-09 21:07:28 +02:00
Mike Fährmann
f2a72d8d1e
replace 'request(…).json()' with 'request_json(…)'
2025-06-29 17:50:19 +02:00
Mike Fährmann
9dbe33b6de
replace old %-formatted and .format(…) strings with f-strings (#7671)
...
mostly using flynt
https://github.com/ikamensh/flynt
2025-06-29 17:50:19 +02:00
Mike Fährmann
e08ec7e083
update copyright notices
2025-06-13 00:03:41 +02:00
Mike Fährmann
b5c88b3d3e
replace standard library 're' uses with 'util.re()'
2025-06-06 13:24:52 +02:00
Mike Fährmann
b81fc5c124
replace text.rextract() with rextr()
2025-05-23 18:28:58 +02:00
Mike Fährmann
311eaf5f11
[subscribestar] fix 'title' extraction for 'trix-attachment' posts (#7526)
2025-05-16 19:09:37 +02:00
Mike Fährmann
98fdcd4d72
[subscribestar] fix 'content' extraction (#7486)
...
and extract 'tags' metadata
Authored by: prowlguru
Co-authored-by: prowlguru <183935626+prowlguru@users.noreply.github.com>
2025-05-10 21:04:27 +02:00
Mike Fährmann
78b34bbdd7
[subscribestar] fix username & password login
2025-04-25 20:15:00 +02:00
Mike Fährmann
8b7f5eacbb
[subscribestar] add warning for missing login cookie
...
and update expected cookie domains and names
2025-04-25 16:20:02 +02:00
Mike Fährmann
af57ab3233
[subscribestar] detect redirects to '/age_confirmation_warning' pages
2025-03-22 11:42:50 +01:00
Mike Fährmann
4807bc215c
[subscribestar] extract 'title' metadata (#7219)
2025-03-22 09:46:08 +01:00
Mike Fährmann
79dc04d87c
[subscribestar] fix 'post' extractor (#6582)
...
https://github.com/mikf/gallery-dl/issues/6582#issuecomment-2675939669
2025-02-22 10:08:59 +01:00
Mike Fährmann
7c96c2368f
[subscribestar] detect and handle redirects (#6916)
2025-02-01 21:03:24 +01:00
Mike Fährmann
107798eeab
[subscribestar] strip whitespace from 'content'
2025-01-04 16:19:22 +01:00
Wyoh Knott
22d4e84372
[subscribestar] Better extraction of content
...
The structure of content is like this:
```
<div class="post-content" data-role="post_content-text">
<div class="trix-content">
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>
<div>
Unspeakable thing are written here<br />
<br />
haiiiiiiiiiiiiiiii hi hi hiii its meee back againnn, plspls leave a comment if uuuu liked it mwah
<3
</div>
</body>
</html>
</div>
</div>
<div class="post-uploads
```
Currently we extract content with:
```
(extr('<div class="post-content', '<div class="post-uploads').partition(">")[2])
```
I propose we just take the body parts:
```
extr('<body>', '</body>')
```
These tags only occur around the actual content.
The result is then easier to use in a filename with the `!H`
formatter: `{content[:160]!H}`. The content as currently extracted
cannot be decoded with it.
2025-01-03 14:57:12 +01:00
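The difference between the two extraction strategies described in the commit above can be illustrated with a small self-contained sketch. The `extr` function here is a simplified stand-in written for this example, not the project's own `text.extr()`, and the `page` string is a shortened, made-up version of the structure quoted in the commit message.

```python
def extr(txt, begin, end, default=""):
    """Return the text between 'begin' and 'end', or 'default' if not found.

    Simplified stand-in for illustration only.
    """
    try:
        first = txt.index(begin) + len(begin)
        return txt[first:txt.index(end, first)]
    except ValueError:
        return default


# Hypothetical, shortened page following the structure from the commit.
page = (
    '<div class="post-content" data-role="post_content-text">'
    '<div class="trix-content"><html><body>'
    '<div>Hello</div>'
    '</body></html></div></div>'
    '<div class="post-uploads">'
)

# Previous approach: everything after the first '>' of the post-content div,
# which still includes the wrapper elements.
old_content = extr(
    page, '<div class="post-content', '<div class="post-uploads'
).partition(">")[2]

# Proposed approach: only what sits between <body> and </body>.
new_content = extr(page, "<body>", "</body>")
```

With this input, `new_content` holds just the inner `<div>Hello</div>`, while `old_content` still begins with the `trix-content` wrapper.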
Mike Fährmann
671297a8cc
[subscribestar] extend fix + add test
...
some attachments are inside an element with an additional class besides
'doc_preview', e.g. 'class="doc_preview for_post"'
2025-01-02 18:22:15 +01:00
Wyoh Knott
a46f7981ee
[subscribestar] Fix attachment download and add support for audio type
...
- We change the text.extr 3rd argument to match current structure
  ('class="post-edit_form"')
- We add support for uploads-audios based on a similar structure as the
  attachment type:
  - id = data-upload-id
  - name = audio_preview-title
  - url = src
  - type = audio
Fix #6721
2025-01-02 15:47:09 +01:00
Arased
03486599af
Fix subscribestar date parsing in updated posts
2024-06-24 16:40:59 +02:00
Mike Fährmann
ea434963ae
[subscribestar] fix file URLs (#5631)
2024-05-23 19:12:01 +02:00
Mike Fährmann
1b34d5ac40
[subscribestar] fix 'date' metadata
2024-03-22 00:45:09 +01:00
Mike Fährmann
57fc6fcf83
replace '24*3600' with '86400'
...
and generalize cache maxage values
2023-12-18 23:57:22 +01:00
Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
f856987297
[subscribestar] fix preview detection (#4468)
...
and show a warning message when posts contain previews
2023-09-04 22:21:14 +02:00
Mike Fährmann
d97b8c2fba
consistent cookie-related names
...
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
dd884b02ee
replace json.loads with direct calls to JSONDecoder.decode
2023-02-09 15:22:00 +01:00
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible
2022-11-05 01:14:09 +01:00
Mike Fährmann
541a61d344
[subscribestar] fix 'date' metadata (#2642)
...
Handle instances where the actual datetime information
is preceded by "Updated on "
2022-06-04 12:24:08 +02:00
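Handling a date string that may be preceded by "Updated on ", as the commit above describes, could look roughly like this. It is only a sketch: the function name `parse_post_date` and the date format `"%b %d, %Y %I:%M %p"` are assumptions made for this example and are not taken from the actual extractor.

```python
from datetime import datetime

PREFIX = "Updated on "


def parse_post_date(text, fmt="%b %d, %Y %I:%M %p"):
    """Parse a post date, ignoring an optional 'Updated on ' prefix.

    The default 'fmt' is an assumed format used for illustration only.
    """
    text = text.strip()
    if text.startswith(PREFIX):
        # Drop the prefix so only the datetime information remains.
        text = text[len(PREFIX):]
    return datetime.strptime(text, fmt)


plain = parse_post_date("Jun 04, 2022 12:24 pm")
updated = parse_post_date("Updated on Jun 04, 2022 12:24 pm")
```

Both calls yield the same `datetime` value, so later metadata handling does not need to know whether a post was edited.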
Mike Fährmann
d50a1ec2cc
[subscribestar] unescape attachment URLs (fixes #2370)
2022-03-09 19:06:04 +01:00
Mike Fährmann
522782c09d
[subscribestar] emit metadata for posts without media (#1569)
2021-11-18 23:42:17 +01:00
Mike Fährmann
1c8aaf9318
[subscribestar] add 'num' enumeration index (closes #2040)
2021-11-18 23:38:41 +01:00
Mike Fährmann
21c2da454f
update extractor test results
2021-07-04 22:00:32 +02:00
Mike Fährmann
d09bc5bd34
[subscribestar] improve attachment filenames (#1609)
2021-06-10 17:09:13 +02:00
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
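The effect of using only '/', '?' and '#' as delimiters, as in the commit above, can be sketched with a small pattern. The pattern, the host `example.org`, and the helper `user_from_url` are hypothetical and only illustrate the character class; they are not one of the project's real patterns.

```python
import re

# Hypothetical pattern: the path segment ends at '/', '?' or '#',
# the component delimiters named in RFC 3986. '&' is not one of them.
USER_PATTERN = re.compile(r"(?:https?://)?example\.org/([^/?#]+)")


def user_from_url(url):
    """Return the first path segment of a matching URL, else None."""
    match = USER_PATTERN.match(url)
    return match.group(1) if match else None


with_query = user_from_url("https://example.org/alice?page=2")
with_fragment = user_from_url("https://example.org/alice#top")
with_subpath = user_from_url("https://example.org/alice/posts")
with_ampersand = user_from_url("https://example.org/a&b")
```

Because '&' only separates parameters inside a query string, it stays part of the captured segment in the last call.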
Mike Fährmann
69e4871005
update extractor test results
...
- sensescans: replace 404d chapters
- mangapark: replace 404d chapters
- subscribestar: update test for attached files
2020-08-28 22:32:32 +02:00
Mike Fährmann
0d84d3af55
[subscribestar] extract attached media files (#852)
2020-08-03 22:02:42 +02:00
Mike Fährmann
e50c75628c
[subscribestar] update 'date' parsing
2020-07-24 22:27:36 +02:00