Mike Fährmann
4853406fe3
[common] allow MangaExtractors to skip loading manga_url
2025-01-10 21:30:58 +01:00
Mike Fährmann
af9c06f812
[bunkr] fix album extraction (#6798)
2025-01-10 13:01:04 +01:00
Mike Fährmann
118b994cf2
[bunkr] support '/f/...' media URLs
2025-01-10 13:01:04 +01:00
Mike Fährmann
ba0443115a
[bunkr] fix ValueError on relative redirects (#6790)
2025-01-10 13:00:52 +01:00
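A common way to cope with a relative redirect target is to resolve it against the URL of the request that produced it. The sketch below uses only the standard library; the function name `resolve_redirect` and the example URLs are hypothetical, and this is not necessarily how the bunkr extractor addresses the issue.

```python
from urllib.parse import urljoin


def resolve_redirect(request_url, location):
    """Return an absolute URL for a redirect target.

    urljoin() returns absolute targets unchanged and resolves
    relative ones against the URL that issued the redirect.
    """
    return urljoin(request_url, location)


# Hypothetical URLs, for illustration only
absolute = resolve_redirect("https://example.org/a/b", "/f/xyz")
```

Because absolute targets pass through unchanged, the same call can be applied to every redirect without checking its form first.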
Mike Fährmann
89276c5b3e
[e621] match 'tag' search URLs with empty tag (#6783)
2025-01-07 20:00:26 +01:00
Mike Fährmann
d18f311fe2
[plurk] fix 'user' data extraction and make it non-fatal (#6742)
2025-01-06 20:27:37 +01:00
Mike Fährmann
b1ffb62644
[docs] update 'sleep-request' value for 'wallhaven'
2025-01-06 17:24:04 +01:00
Mike Fährmann
46b6b71159
[wallhaven] extract 'search[tags]' and 'search[tag_id]' metadata (#6772)
2025-01-06 17:18:04 +01:00
Mike Fährmann
270aaea8ab
[pixiv] provide fallback URLs (#6762)
2025-01-06 15:27:32 +01:00
Mike Fährmann
770f41eb4a
[util] support not splitting "contains" value (#6773)
...
by passing any "false" value as 'separator' argument except None
2025-01-06 13:47:32 +01:00
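The behaviour described in 770f41eb4a can be sketched roughly as follows. This is a minimal stand-alone illustration written from the commit message, not a copy of the project's `util` code; the function body and the example strings are assumptions.

```python
def contains(values, elements, separator=" "):
    """Return True if at least one of 'elements' is found in 'values'."""
    # Split string values unless 'separator' is a false value other than
    # None. None is kept so that str.split(None) can still split on any
    # run of whitespace.
    if isinstance(values, str) and (separator or separator is None):
        values = values.split(separator)

    if not isinstance(elements, (tuple, list)):
        return elements in values

    return any(e in values for e in elements)


# Default: 'values' is split into words, so only whole words match
whole_word = contains("foo bar", "fo")

# False-y separator: no splitting, so this becomes a substring test
substring = contains("foo bar", "o b", separator="")
```

Skipping the split for an empty string also avoids the `ValueError` that `str.split("")` would raise.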
Mike Fährmann
a3b9cc7785
[options] mark '--list-extractors' argument as optional
2025-01-05 21:37:44 +01:00
Mike Fährmann
7e8ca377fc
release version 1.28.3
2025-01-04 16:42:02 +01:00
Mike Fährmann
107798eeab
[subscribestar] strip whitespace from 'content'
2025-01-04 16:19:22 +01:00
Mike Fährmann
a53ce6103c
[deviantart:tiptap] smaller fixes
...
- fix text indentation in headings
- fix deviations formats without 'c' path
- support custom 'target' in links
2025-01-03 22:48:06 +01:00
Mike Fährmann
1dcb40be7c
merge #6760: [boosty] support 'file' post attachments (#2387)
...
https://github.com/mikf/gallery-dl/issues/2387#issuecomment-2564671646
2025-01-03 15:59:03 +01:00
Mike Fährmann
bce9be66c2
merge #6761: [subscribestar] improve 'content' metadata extraction
2025-01-03 15:56:17 +01:00
Wyoh Knott
22d4e84372
[subscribestar] Better extraction of content
...
The content is structured like this:
```
<div class="post-content" data-role="post_content-text">
<div class="trix-content">
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>
<div>
Unspeakable thing are written here<br />
<br />
haiiiiiiiiiiiiiiii hi hi hiii its meee back againnn, plspls leave a comment if uuuu liked it mwah
<3
</div>
</body>
</html>
</div>
</div>
<div class="post-uploads
```
Currently we extract content with:
```
(extr('<div class="post-content', '<div class="post-uploads').partition(">")[2])
```
I propose we just take the body parts:
```
extr('<body>', '</body>')
```
since these tags only occur around the actual content.
The result is then easier to use in a filename format string with the `!H`
conversion: `{content[:160]!H}`. The content as currently extracted
can't be decoded with it.
2025-01-03 14:57:12 +01:00
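The difference between the two approaches in 22d4e84372 can be illustrated with a small stand-alone sketch. The `extr` helper below is a simplified stand-in written for this example (it is not the project's own function), and the sample `page` string is a shortened, hypothetical version of the markup quoted in the commit message.

```python
def extr(txt, begin, end, default=""):
    """Return the text between the first 'begin' and the next 'end'."""
    try:
        first = txt.index(begin) + len(begin)
        return txt[first:txt.index(end, first)]
    except ValueError:
        return default


# Shortened, hypothetical sample of a post's markup
page = (
    '<div class="post-content" data-role="post_content-text">'
    '<div class="trix-content"><html><body>'
    '<div>Hello<br />world</div>'
    '</body></html></div></div>'
    '<div class="post-uploads">'
)

# Current approach: everything after the first '>' of the wrapper element,
# which still includes the surrounding <div>, <html>, and <body> tags
current = extr(
    page, '<div class="post-content', '<div class="post-uploads'
).partition(">")[2]

# Proposed approach: only the markup between <body> and </body>
proposed = extr(page, "<body>", "</body>")
```

In this sample, `proposed` holds only the inner `<div>` with the post text, while `current` still carries the wrapper elements around it.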
Dominik
ea6594734d
[boosty] Fixed formatting
2025-01-03 08:27:11 +01:00
Dominik
8c9221f0a6
[boosty] Added post attachment download
2025-01-03 08:18:57 +01:00
Mike Fährmann
5767c0854c
merge #6758: [subscribestar] fix attachment downloads and add support for audio type (#6721, #6724)
2025-01-02 18:25:37 +01:00
Mike Fährmann
671297a8cc
[subscribestar] extend fix + add test
...
some attachments are inside an element with an additional class besides
'doc_preview', e.g. 'class="doc_preview for_post"'
2025-01-02 18:22:15 +01:00
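One way to accept both `class="doc_preview"` and variants with extra class names, as described in 671297a8cc, is to match the class attribute by its first token. The pattern and the function name `has_doc_preview` below are assumptions made for illustration; the extractor itself may locate these elements differently.

```python
import re

# Match a class attribute whose first token is 'doc_preview',
# optionally followed by whitespace and further class names.
DOC_PREVIEW = re.compile(r'class="doc_preview(?:\s[^"]*)?"')


def has_doc_preview(html):
    """Return True if 'html' contains a 'doc_preview' class attribute."""
    return DOC_PREVIEW.search(html) is not None
```

Requiring either whitespace or the closing quote after `doc_preview` keeps longer, unrelated class names from matching.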
Mike Fährmann
2dd2c71c53
[docs] update configuration.rst
2025-01-02 17:54:47 +01:00
Mike Fährmann
428eb53086
[hitomi] provide 'search_tags' metadata for search/tag results (#1015, #6756)
2025-01-02 17:49:30 +01:00
Mike Fährmann
0c584f9be7
[sankaku] support alphanumeric book/pool IDs (#6757)
2025-01-02 15:49:07 +01:00
Wyoh Knott
a46f7981ee
[subscribestar] Fix attachment download and add support for audio type
...
- We change the text.extr 3rd argument to match current structure
  ('class="post-edit_form"')
- We add support for uploads-audios based on a similar structure as the
  attachment type:
  - id = data-upload-id
  - name = audio_preview-title
  - url = src
  - type = audio
Fix #6721
2025-01-02 15:47:09 +01:00
Mike Fährmann
bd7320fb7d
[deviantart:tiptap] support more content block types
...
- anchor
- blockquote
- da-gif
- da-video
- lists
  - listItem
  - orderedList
  - bulletList
- text indentation
2025-01-02 14:17:32 +01:00
Mike Fährmann
5c5b6d6276
[deviantart:tiptap] fix deviation embeds without 'token'
2024-12-28 19:47:05 +01:00
Mike Fährmann
7391dd208c
[poipiku] always query 'ShowAppendFileF' when post has warning (#6736)
2024-12-27 20:32:50 +01:00
Mike Fährmann
bc7e95684d
[piczel] fix extraction (#6735)
...
- fix pagination
- update API endpoints
- provide 'count' metadata field
- use BASE_PATTERN and self.groups[…]
2024-12-27 15:08:08 +01:00
Mike Fährmann
167a726972
[szurubooru] support 'visuabusters.com/booru' (#6729)
2024-12-26 19:04:16 +01:00
Mike Fährmann
998f949db1
[civitai] add 'user-videos' extractor (#6644)
2024-12-26 10:18:54 +01:00
Mike Fährmann
c6d5e25055
[workflows:executables] use Python 3.13
2024-12-25 19:50:26 +01:00
Mike Fährmann
99de0e1867
[instagram] fix 'pinned' values for '/reels' results (#6719)
2024-12-25 19:42:50 +01:00
Mike Fährmann
3024dce06b
[8muses] skip albums without valid 'permalink' (#6717)
2024-12-24 13:49:19 +01:00
Mike Fährmann
09b2f8ea9e
[batoto] update domains (#6714)
...
- support 'fto.to' and 'jto.to'
- use 'xbato.org' for deprecated domains
2024-12-24 09:38:07 +01:00
Mike Fährmann
f9d3603bfc
[hitomi] fix searches (#6713)
2024-12-24 09:36:29 +01:00
Mike Fährmann
a3fb03c943
[release] ensure executables have a minimum size
...
to prevent issues like #6699 from happening again
2024-12-23 16:07:41 +01:00
Mike Fährmann
081856b9ce
[kemonoparty] handle 'discord' favorites (#6706)
2024-12-22 18:56:21 +01:00
Mike Fährmann
de9442ba75
[directlink] use domain as 'subcategory' (#6703)
2024-12-22 17:19:56 +01:00
Mike Fährmann
18491a4ce6
[tapas] fix TypeError for locked episodes (#6700)
2024-12-21 15:17:51 +01:00
Mike Fährmann
454f766f5e
release version 1.28.2
2024-12-20 19:13:42 +01:00
Mike Fährmann
6059ffccf8
[deviantart] improve 'tiptap' to HTML conversion (#6686)
...
- fix "KeyError: 'attrs'" for links without 'href'
- support 'strike' text markers
- support 'heading' content blocks
2024-12-20 16:45:19 +01:00
Mike Fährmann
e0514817bd
[saint] support 'saint2.cr' URLs (#6692)
2024-12-19 11:43:35 +01:00
Mike Fährmann
8fbcdc1a3d
[instagram] extract 'date' for stories (#6677)
...
generalize 'date' extraction for all post types
2024-12-18 16:33:21 +01:00
Mike Fährmann
fd5869f7df
[bilibili] support '/upload/opus' URLs (#6687)
2024-12-18 08:53:27 +01:00
Mike Fährmann
5fbd0c3a63
[bilibili] extract files from 'module_top' entries (#6687)
2024-12-18 08:45:29 +01:00
Mike Fährmann
041baf8441
[common] compute and use latest Firefox UA
...
instead of the latest ESR UA
2024-12-17 22:20:37 +01:00
Mike Fährmann
0802e42c90
[common] use random unused port for '"user-agent": "browser"'
2024-12-17 21:40:20 +01:00
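A standard way to obtain a random unused port, as mentioned in 0802e42c90, is to bind a socket to port 0 and let the operating system choose. The sketch below shows only that technique; the function name `find_unused_port` and the loopback default are assumptions, and the project's own code may differ.

```python
import socket


def find_unused_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port on 'host'."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to pick any free port
        sock.bind((host, 0))
        # getsockname() returns (address, port) for AF_INET sockets
        return sock.getsockname()[1]


port = find_unused_port()
```

The port is only known to be free at the moment of the call; another process could claim it before it is reused, so binding the real listener directly to port 0 is the more robust variant when that is possible.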
Mike Fährmann
9f3e4511c6
[tapas] restructure extractors (#6680)
...
- handle all episodes with TapasEpisodeExtractor
- prevent locked episodes from stopping processing of all following
  episodes
2024-12-17 21:36:37 +01:00
Mike Fährmann
5ab2ae17bc
support wildcards for parent>child categories (#6673)
...
For example "reddit>*" for all reddit child extractors
2024-12-16 08:50:18 +01:00
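The idea behind the "reddit>*" wildcard from 5ab2ae17bc can be sketched as a lookup that tries progressively less specific section names. Everything here is hypothetical: the `lookup` function, the flat `config` dictionary, and in particular the order in which sections are tried are assumptions for illustration, not the project's actual configuration logic.

```python
def lookup(config, parent, child, key, default=None):
    """Look up 'key' for a 'child' extractor spawned by 'parent'.

    Assumed order: exact "parent>child" section, then the
    "parent>*" wildcard section, then the plain child section.
    """
    for section in (f"{parent}>{child}", f"{parent}>*", child):
        options = config.get(section)
        if options and key in options:
            return options[key]
    return default


# Hypothetical settings
config = {
    "reddit>*": {"filename": "from-reddit"},
    "imgur": {"filename": "plain", "mp4": True},
}

via_reddit = lookup(config, "reddit", "imgur", "filename")
standalone = lookup(config, "twitter", "imgur", "filename")
```

With these sample settings, the wildcard section applies to any child started from reddit, while the same child started elsewhere falls back to its own section.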