gallery-dl

Author	SHA1	Message	Date
Mike Fährmann	2be54be692	[subscribestar] merge 'user-tag' into regular 'user' extractor (#8737)	2025-12-23 18:58:25 +01:00
Mike Fährmann	7669a1f13a	[subscribestar:user-tag] update 'pattern'	2025-12-22 11:43:30 +01:00
Mike Fährmann	b5a7540619	Merge branch 'op+': use '+' for 2-element string concatenations	2025-12-22 11:34:21 +01:00
Mike Fährmann	00c6821a3f	replace 2-element f-strings with simple '+' concatenations; Python's 'ast' module and its 'NodeVisitor' class were incredibly helpful in identifying these	2025-12-22 11:26:04 +01:00
Mike Fährmann	609e19273d	[subscribestar] add 'user-tag' extractor (#8737)	2025-12-21 22:14:17 +01:00
Mike Fährmann	e006d26c8e	Revert "use f-strings when building 'pattern'"; reverts `d7c97d5a97`.	2025-12-20 22:07:37 +01:00
Mike Fährmann	968597a302	yield 3-tuples for Message.Directory; adapt tuples to the same length and semantics as other messages	2025-12-05 21:39:52 +01:00
Mike Fährmann	d7c97d5a97	use f-strings when building 'pattern'	2025-10-20 21:23:11 +02:00
Mike Fährmann	9bf76c1352	replace 'util.re()' with 'text.re()'; remove unnecessary 'util' imports	2025-10-20 17:44:58 +02:00
Mike Fährmann	c8fc790028	merge branch 'dt': move datetime utils into separate module - use 'datetime.fromisoformat()' when possible (#7671) - return a datetime-compatible object for invalid datetimes (instead of a 'str' value)	2025-10-20 09:30:05 +02:00
Mike Fährmann	085616e0a8	[dt] replace 'text.parse_datetime()' & 'text.parse_timestamp()'	2025-10-17 17:43:06 +02:00
Mike Fährmann	36a3fe45e4	[subscribestar] improve 'filename' (#8416)	2025-10-15 11:52:39 +02:00
Mike Fährmann	a097a373a9	simplify if statements by using walrus operators (#7671)	2025-07-22 20:57:54 +02:00
Mike Fährmann	d8ef1d693f	rename 'StopExtraction' to 'AbortExtraction' for cases where StopExtraction was used to report errors	2025-07-09 21:07:28 +02:00
Mike Fährmann	f2a72d8d1e	replace 'request(…).json()' with 'request_json(…)'	2025-06-29 17:50:19 +02:00
Mike Fährmann	9dbe33b6de	replace old %-formatted and .format(…) strings with f-strings (#7671); mostly using flynt https://github.com/ikamensh/flynt	2025-06-29 17:50:19 +02:00
Mike Fährmann	e08ec7e083	update copyright notices	2025-06-13 00:03:41 +02:00
Mike Fährmann	b5c88b3d3e	replace standard library 're' uses with 'util.re()'	2025-06-06 13:24:52 +02:00
Mike Fährmann	b81fc5c124	replace text.rextract() with rextr()	2025-05-23 18:28:58 +02:00
Mike Fährmann	311eaf5f11	[subscribestar] fix 'title' extraction for 'trix-attachment' posts (#7526)	2025-05-16 19:09:37 +02:00
Mike Fährmann	98fdcd4d72	[subscribestar] fix 'content' extraction (#7486) and extract 'tags' metadata; Authored by: prowlguru; Co-authored-by: prowlguru <183935626+prowlguru@users.noreply.github.com>	2025-05-10 21:04:27 +02:00
Mike Fährmann	78b34bbdd7	[subscribestar] fix username & password login	2025-04-25 20:15:00 +02:00
Mike Fährmann	8b7f5eacbb	[subscribestar] add warning for missing login cookie and update expected cookie domains and names	2025-04-25 16:20:02 +02:00
Mike Fährmann	af57ab3233	[subscribestar] detect redirects to '/age_confirmation_warning' pages	2025-03-22 11:42:50 +01:00
Mike Fährmann	4807bc215c	[subscribestar] extract 'title' metadata (#7219)	2025-03-22 09:46:08 +01:00
Mike Fährmann	79dc04d87c	[subscribestar] fix 'post' extractor (#6582); see https://github.com/mikf/gallery-dl/issues/6582#issuecomment-2675939669	2025-02-22 10:08:59 +01:00
Mike Fährmann	7c96c2368f	[subscribestar] detect and handle redirects (#6916)	2025-02-01 21:03:24 +01:00
Mike Fährmann	107798eeab	[subscribestar] strip whitespace from 'content'	2025-01-04 16:19:22 +01:00
Wyoh Knott	22d4e84372	[subscribestar] Better extraction of content. The structure of content is like this: ``` <div class="post-content" data-role="post_content-text"> <div class="trix-content"> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <html> <body> <div> Unspeakable thing are written here<br /> <br /> haiiiiiiiiiiiiiiii hi hi hiii its meee back againnn, plspls leave a comment if uuuu liked it mwah <3 </div> </body> </html> </div> </div> <div class="post-uploads ``` Currently we extract content with: ``` (extr('<div class="post-content', '<div class="post-uploads').partition(">")[2]) ``` I propose we just take the body parts: ``` extr('<body>', '</body>') ``` which only occur around actual content. This makes it easier to use the content in filenames with the `!H` formatter: `{content[:160]!H}`. Otherwise, the content currently extracted can't be decoded with it.	2025-01-03 14:57:12 +01:00
Mike Fährmann	671297a8cc	[subscribestar] extend fix + add test; some attachments are inside an element with an additional class besides 'doc_preview', e.g. 'class="doc_preview for_post"'	2025-01-02 18:22:15 +01:00
Wyoh Knott	a46f7981ee	[subscribestar] Fix attachment download and add support for audio type - We change the text.extr 3rd argument to match current structure ('class="post-edit_form"') - We add support for uploads-audios based on a similar structure as the attachment type: - id = data-upload-id - name = audio_preview-title - url = src - type = audio Fix #6721	2025-01-02 15:47:09 +01:00
Arased	03486599af	Fix subscribestar date parsing in updated posts	2024-06-24 16:40:59 +02:00
Mike Fährmann	ea434963ae	[subscribestar] fix file URLs (#5631)	2024-05-23 19:12:01 +02:00
Mike Fährmann	1b34d5ac40	[subscribestar] fix 'date' metadata	2024-03-22 00:45:09 +01:00
Mike Fährmann	57fc6fcf83	replace '24*3600' with '86400' and generalize cache maxage values	2023-12-18 23:57:22 +01:00
Mike Fährmann	a453335a9f	remove test results in extractor modules and add generic example URLs	2023-09-11 16:30:55 +02:00
Mike Fährmann	f856987297	[subscribestar] fix preview detection (#4468) and show a warning message when posts contain previews	2023-09-04 22:21:14 +02:00
Mike Fährmann	d97b8c2fba	consistent cookie-related names - rename every cookie variable or method to 'cookies_*' - simplify '.session.cookies' to just '.cookies' - more consistent 'login()' structure	2023-07-22 01:20:50 +02:00
Mike Fährmann	dd884b02ee	replace json.loads with direct calls to JSONDecoder.decode	2023-02-09 15:22:00 +01:00
Mike Fährmann	b0cb4a1b9c	replace 'text.extract()' with 'text.extr()' where possible	2022-11-05 01:14:09 +01:00
Mike Fährmann	541a61d344	[subscribestar] fix 'date' metadata (#2642); handle instances where the actual datetime information is preceded by "Updated on "	2022-06-04 12:24:08 +02:00
Mike Fährmann	d50a1ec2cc	[subscribestar] unescape attachment URLs (fixes #2370)	2022-03-09 19:06:04 +01:00
Mike Fährmann	522782c09d	[subscribestar] emit metadata for posts without media (#1569)	2021-11-18 23:42:17 +01:00
Mike Fährmann	1c8aaf9318	[subscribestar] add 'num' enumeration index (closes #2040)	2021-11-18 23:38:41 +01:00
Mike Fährmann	21c2da454f	update extractor test results	2021-07-04 22:00:32 +02:00
Mike Fährmann	d09bc5bd34	[subscribestar] improve attachment filenames (#1609)	2021-06-10 17:09:13 +02:00
Mike Fährmann	968d3e8465	remove '&' from URL patterns: '/?&#' -> '/?#' and '?&#' -> '?#'; according to https://www.ietf.org/rfc/rfc3986.txt, URLs are "organized hierarchically" by using "the slash ("/"), question mark ("?"), and number sign ("#") characters to delimit components"	2020-10-22 23:31:25 +02:00
Mike Fährmann	69e4871005	update extractor test results - sensescans: replace 404d chapters - mangapark: replace 404d chapters - subscribestar: update test for attached files	2020-08-28 22:32:32 +02:00
Mike Fährmann	0d84d3af55	[subscribestar] extract attached media files (#852)	2020-08-03 22:02:42 +02:00
Mike Fährmann	e50c75628c	[subscribestar] update 'date' parsing	2020-07-24 22:27:36 +02:00


53 Commits