6205 Commits

Author SHA1 Message Date
Mike Fährmann
4853406fe3 [common] allow MangaExtractors to skip loading manga_url 2025-01-10 21:30:58 +01:00
Mike Fährmann
af9c06f812 [bunkr] fix album extraction (#6798) 2025-01-10 13:01:04 +01:00
Mike Fährmann
118b994cf2 [bunkr] support '/f/...' media URLs 2025-01-10 13:01:04 +01:00
Mike Fährmann
ba0443115a [bunkr] fix ValueError on relative redirects (#6790) 2025-01-10 13:00:52 +01:00
Mike Fährmann
89276c5b3e [e621] match 'tag' search URLs with empty tag (#6783) 2025-01-07 20:00:26 +01:00
Mike Fährmann
d18f311fe2 [plurk] fix 'user' data extraction and make it non-fatal (#6742) 2025-01-06 20:27:37 +01:00
Mike Fährmann
b1ffb62644 [docs] update 'sleep-request' value for 'wallhaven' 2025-01-06 17:24:04 +01:00
Mike Fährmann
46b6b71159 [wallhaven] extract 'search[tags]' and 'search[tag_id]' metadata
(#6772)
2025-01-06 17:18:04 +01:00
Mike Fährmann
270aaea8ab [pixiv] provide fallback URLs (#6762) 2025-01-06 15:27:32 +01:00
Mike Fährmann
770f41eb4a [util] support not splitting "contains" value (#6773)
by passing any "false" value as 'separator' argument except None
2025-01-06 13:47:32 +01:00
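
The separator rule in the commit above can be sketched as follows. This is a minimal illustration, assuming a helper named `contains(values, elements, separator)`; the name, signature, and default are assumptions based on the commit message, not a copy of the project's code.

```python
def contains(values, elements, separator=" "):
    """Return True if at least one of 'elements' is found in 'values'."""
    # Assumed rule: split string values unless 'separator' is a false
    # value other than None. None keeps str.split()'s whitespace default.
    if isinstance(values, str) and (separator or separator is None):
        values = values.split(separator)

    # A single element is checked directly
    if not isinstance(elements, (tuple, list)):
        return elements in values

    # Several elements: any one match is enough
    return any(e in values for e in elements)
```

With splitting enabled, `contains("abc def", "bc")` compares against whole words; with `separator=""` or `separator=False` the same call becomes a substring test on the unsplit string.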
Mike Fährmann
a3b9cc7785 [options] mark '--list-extractors' argument as optional 2025-01-05 21:37:44 +01:00
Mike Fährmann
7e8ca377fc release version 1.28.3 2025-01-04 16:42:02 +01:00
Mike Fährmann
107798eeab [subscribestar] strip whitespace from 'content' 2025-01-04 16:19:22 +01:00
Mike Fährmann
a53ce6103c [deviantart:tiptap] smaller fixes
- fix text indentation in headings
- fix deviations formats without 'c' path
- support custom 'target' in links
2025-01-03 22:48:06 +01:00
Mike Fährmann
1dcb40be7c merge #6760: [boosty] support 'file' post attachments (#2387)
https://github.com/mikf/gallery-dl/issues/2387#issuecomment-2564671646
2025-01-03 15:59:03 +01:00
Mike Fährmann
bce9be66c2 merge #6761: [subscribestar] improve 'content' metadata extraction 2025-01-03 15:56:17 +01:00
Wyoh Knott
22d4e84372 [subscribestar] Better extraction of content
The structure of content is like this:

```
<div class="post-content" data-role="post_content-text">
                <div class="trix-content">
                    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
                    <html>
                        <body>
                            <div>
                                Unspeakable thing are written here<br />
                                <br />
                                haiiiiiiiiiiiiiiii hi hi hiii its meee back againnn, plspls leave a comment if uuuu liked it mwah
                                &lt;3
                            </div>
                        </body>
                    </html>
                </div>
            </div>
            <div class="post-uploads
```

Currently we extract content with:

```
(extr('<div class="post-content', '<div class="post-uploads').partition(">")[2])
```

I propose we just take the body parts:

```
extr('<body>', '</body>')
```

which only occur around the actual content.

It is then easier to use the content in a filename with the `!H`
formatter: `{content[:160]!H}`. Otherwise the content currently extracted
can't be decoded with it.
2025-01-03 14:57:12 +01:00
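
The difference between the two extraction expressions in the message above can be tried out with a small stand-in. The `extr` function below is a hypothetical reimplementation written for this illustration (return the text between two markers), and `page` is a shortened version of the quoted markup; neither is taken from the project's code.

```python
def extr(txt, begin, end, default=""):
    """Return the text between the first 'begin' and the following 'end'."""
    try:
        first = txt.index(begin) + len(begin)
        return txt[first:txt.index(end, first)]
    except ValueError:
        # One of the markers is missing
        return default


# Shortened sample of the structure quoted in the commit message
page = (
    '<div class="post-content" data-role="post_content-text">'
    '<div class="trix-content"><html><body>'
    '<div>first line<br />second line &lt;3</div>'
    '</body></html></div></div>'
    '<div class="post-uploads">'
)

# Previous approach: everything after the opening tag of 'post-content'
old = extr(page, '<div class="post-content',
           '<div class="post-uploads').partition(">")[2]

# Proposed approach: only the part between <body> and </body>
new = extr(page, "<body>", "</body>")
```

Here `old` still carries the wrapper markup around the body (`trix-content`, `<html>`, closing tags), while `new` holds only the inner `<div>` with the post text.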
Dominik
ea6594734d [boosty] Fixed formatting 2025-01-03 08:27:11 +01:00
Dominik
8c9221f0a6 [boosty] Added post attachment download 2025-01-03 08:18:57 +01:00
Mike Fährmann
5767c0854c merge #6758: [subscribestar] fix attachment downloads and add support for audio type
(#6721, #6724)
2025-01-02 18:25:37 +01:00
Mike Fährmann
671297a8cc [subscribestar] extend fix + add test
some attachments are inside an element with an additional class besides
'doc_preview', e.g. 'class="doc_preview for_post"'
2025-01-02 18:22:15 +01:00
Mike Fährmann
2dd2c71c53 [docs] update configuration.rst 2025-01-02 17:54:47 +01:00
Mike Fährmann
428eb53086 [hitomi] provide 'search_tags' metadata for search/tag results
(#1015, #6756)
2025-01-02 17:49:30 +01:00
Mike Fährmann
0c584f9be7 [sankaku] support alphanumeric book/pool IDs (#6757) 2025-01-02 15:49:07 +01:00
Wyoh Knott
a46f7981ee [subscribestar] Fix attachment download and add support for audio type
- We change the text.extr 3rd argument to match current structure
  ('class="post-edit_form"')
- We add support for uploads-audios based on a similar structure as the
  attachment type:
    - id = data-upload-id
    - name = audio_preview-title
    - url = src
    - type = audio

Fix #6721
2025-01-02 15:47:09 +01:00
Mike Fährmann
bd7320fb7d [deviantart:tiptap] support more content block types
- anchor
- blockquote
- da-gif
- da-video
- lists
    - listItem
    - orderedList
    - bulletList
- text indentation
2025-01-02 14:17:32 +01:00
Mike Fährmann
5c5b6d6276 [deviantart:tiptap] fix deviation embeds without 'token' 2024-12-28 19:47:05 +01:00
Mike Fährmann
7391dd208c [poipiku] always query 'ShowAppendFileF' when post has warning (#6736) 2024-12-27 20:32:50 +01:00
Mike Fährmann
bc7e95684d [piczel] fix extraction (#6735)
- fix pagination
- update API endpoints
- provide 'count' metadata field
- use BASE_PATTERN and self.groups[…]
2024-12-27 15:08:08 +01:00
Mike Fährmann
167a726972 [szurubooru] support 'visuabusters.com/booru' (#6729) 2024-12-26 19:04:16 +01:00
Mike Fährmann
998f949db1 [civitai] add 'user-videos' extractor (#6644) 2024-12-26 10:18:54 +01:00
Mike Fährmann
c6d5e25055 [workflows:executables] use Python 3.13 2024-12-25 19:50:26 +01:00
Mike Fährmann
99de0e1867 [instagram] fix 'pinned' values for '/reels' results (#6719) 2024-12-25 19:42:50 +01:00
Mike Fährmann
3024dce06b [8muses] skip albums without valid 'permalink' (#6717) 2024-12-24 13:49:19 +01:00
Mike Fährmann
09b2f8ea9e [batoto] update domains (#6714)
- support 'fto.to' and 'jto.to'
- use 'xbato.org' for deprecated domains
2024-12-24 09:38:07 +01:00
Mike Fährmann
f9d3603bfc [hitomi] fix searches (#6713) 2024-12-24 09:36:29 +01:00
Mike Fährmann
a3fb03c943 [release] ensure executables have a minimum size
to prevent issues like #6699 from happening again
2024-12-23 16:07:41 +01:00
Mike Fährmann
081856b9ce [kemonoparty] handle 'discord' favorites (#6706) 2024-12-22 18:56:21 +01:00
Mike Fährmann
de9442ba75 [directlink] use domain as 'subcategory' (#6703) 2024-12-22 17:19:56 +01:00
Mike Fährmann
18491a4ce6 [tapas] fix TypeError for locked episodes (#6700) 2024-12-21 15:17:51 +01:00
Mike Fährmann
454f766f5e release version 1.28.2 2024-12-20 19:13:42 +01:00
Mike Fährmann
6059ffccf8 [deviantart] improve 'tiptap' to HTML conversion (#6686)
- fix "KeyError: 'attrs'" for links without 'href'
- support 'strike' text markers
- support 'heading' content blocks
2024-12-20 16:45:19 +01:00
Mike Fährmann
e0514817bd [saint] support 'saint2.cr' URLs (#6692) 2024-12-19 11:43:35 +01:00
Mike Fährmann
8fbcdc1a3d [instagram] extract 'date' for stories (#6677)
generalize 'date' extraction for all post types
2024-12-18 16:33:21 +01:00
Mike Fährmann
fd5869f7df [bilibili] support '/upload/opus' URLs (#6687) 2024-12-18 08:53:27 +01:00
Mike Fährmann
5fbd0c3a63 [bilibili] extract files from 'module_top' entries (#6687) 2024-12-18 08:45:29 +01:00
Mike Fährmann
041baf8441 [common] compute and use latest Firefox UA
instead of the latest ESR UA
2024-12-17 22:20:37 +01:00
Mike Fährmann
0802e42c90 [common] use random unused port for '"user-agent": "browser"' 2024-12-17 21:40:20 +01:00
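
One common way to obtain a random unused port, as mentioned in the commit above, is to bind a socket to port 0 and let the operating system choose. The sketch below shows that general technique; the function name `find_unused_port` is made up for this example, and whether the commit does it the same way is an assumption.

```python
import socket


def find_unused_port(host="127.0.0.1"):
    """Ask the OS for a currently free TCP port on 'host'."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to pick any free port
        sock.bind((host, 0))
        # getsockname() returns (address, port) for AF_INET sockets
        return sock.getsockname()[1]
```

The port is only guaranteed to be free at the moment of the call; a server that needs it should keep the bound socket open and listen on it directly instead of closing and rebinding.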
Mike Fährmann
9f3e4511c6 [tapas] restructure extractors (#6680)
- handle all episodes with TapasEpisodeExtractor
- prevent locked episodes from stopping processing of all following
  episodes
2024-12-17 21:36:37 +01:00
Mike Fährmann
5ab2ae17bc support wildcards for parent>child categories (#6673)
For example "reddit>*" for all reddit child extractors
2024-12-16 08:50:18 +01:00
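
How a pattern such as "reddit>*" could select child extractors can be sketched with shell-style matching. Everything below is hypothetical: the function names, the flat `config` dictionary, and the rule that exact entries override wildcard entries are assumptions made for illustration, not the project's actual lookup logic.

```python
from fnmatch import fnmatchcase


def match_category(pattern, parent, child):
    """Check whether a 'parent>child' pattern applies to the given pair."""
    return fnmatchcase(f"{parent}>{child}", pattern)


def select_options(config, parent, child):
    """Merge the options of all matching 'parent>child' entries."""
    merged = {}
    # Assumption: apply wildcard entries first so exact entries win
    for pattern in sorted(config, key=lambda p: "*" not in p):
        if match_category(pattern, parent, child):
            merged.update(config[pattern])
    return merged


config = {
    "reddit>imgur": {"sleep": 2},
    "reddit>*": {"sleep": 1, "videos": False},
}
```

With this `config`, `select_options(config, "reddit", "imgur")` combines both entries and keeps the exact entry's `sleep` value, while any other reddit child only receives the wildcard options.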