6221 Commits

Author SHA1 Message Date
Mike Fährmann
438c61601b [xfolio] add initial support (#5514, #6351, #6837) 2025-01-18 15:57:56 +01:00
Mike Fährmann
dc7b46be21 [khinsider] add 'covers' option (#6844) 2025-01-18 15:57:56 +01:00
Mike Fährmann
5a31a2ad22 [khinsider] extract more 'album' metadata (#6844)
- year
- catalog
- developer
- publisher
- uploader
2025-01-18 15:57:55 +01:00
Mike Fährmann
3849b3fa92 [batoto] use 'chapter_id' in default archive IDs (#6835)
instead of '{chapter}{chapter_minor}' since some chapters have no actual
chapter number and end up as '0', potentially causing ID overlap
2025-01-15 14:52:18 +01:00
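The ID overlap described in the commit above can be illustrated with a small sketch. The chapter records and their values are invented for illustration, and the two format strings are simplified stand-ins (an actual archive ID format may contain further fields); only the field names 'chapter', 'chapter_minor', and 'chapter_id' come from the commit message.

```python
# Hypothetical chapter metadata; values are made up for illustration.
chapters = [
    {"chapter_id": 1001, "chapter": 0, "chapter_minor": ""},    # no real number
    {"chapter_id": 1002, "chapter": 0, "chapter_minor": ""},    # no real number
    {"chapter_id": 1003, "chapter": 1, "chapter_minor": ".5"},
]


def old_id(chapter):
    # Simplified stand-in for an ID built from '{chapter}{chapter_minor}'
    return "{chapter}{chapter_minor}".format_map(chapter)


def new_id(chapter):
    # Simplified stand-in for an ID built from 'chapter_id'
    return "{chapter_id}".format_map(chapter)


old_ids = [old_id(c) for c in chapters]
new_ids = [new_id(c) for c in chapters]

print(old_ids)  # ['0', '0', '1.5']  -> two different chapters share '0'
print(new_ids)  # ['1001', '1002', '1003']
```

With the old scheme, the second unnumbered chapter would look as if it had already been archived.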
Mike Fährmann
6e919a3695 [e621] support e621.cc and e621.anthro.fr frontend URLs (#6809) 2025-01-15 14:35:37 +01:00
Mike Fährmann
843a39a6c6 [bunkr] extract correct 'filename' data (#6824) 2025-01-14 19:45:48 +01:00
Mike Fährmann
d17a423245 [xhamster] fix 'gallery' extractor (#6818) 2025-01-13 18:58:08 +01:00
Mike Fährmann
bde99cc6ce [cohost] remove module
cohost.org now redirects to archive.org
2025-01-13 14:38:35 +01:00
Mike Fährmann
42070240ae [tests] allow testing for types + values 2025-01-12 20:55:37 +01:00
Mike Fährmann
2b46b82f9c [release] prevent overwriting ${CHANGELOG}.orig with truncated file
to avoid deleting most of CHANGELOG.md by accident when the release.sh
script gets interrupted halfway through, as happened during the v1.28.3
release in commit 7e8ca377fc
2025-01-12 18:05:35 +01:00
Mike Fährmann
6e3f51a05e release version 1.28.4 2025-01-12 17:22:09 +01:00
Mike Fährmann
91bd3e37f2 [pexels] add support (#2286, #4214, #6769) 2025-01-12 16:50:12 +01:00
Mike Fährmann
1ae3ac5e39 [common] add '_extract_nextdata' method 2025-01-12 11:48:36 +01:00
Mike Fährmann
3f48e2f820 [common] add '_extract_jsonld' method (#5272) 2025-01-12 11:07:48 +01:00
Mike Fährmann
88f1ef7c3c [bunkr] fix metadata extraction (#6805) 2025-01-11 12:48:41 +01:00
Mike Fährmann
1d75c8308c [weebcentral] add support (#6778) 2025-01-10 23:04:51 +01:00
Mike Fährmann
4853406fe3 [common] allow MangaExtractors to skip loading manga_url 2025-01-10 21:30:58 +01:00
Mike Fährmann
af9c06f812 [bunkr] fix album extraction (#6798) 2025-01-10 13:01:04 +01:00
Mike Fährmann
118b994cf2 [bunkr] support '/f/...' media URLs 2025-01-10 13:01:04 +01:00
Mike Fährmann
ba0443115a [bunkr] fix ValueError on relative redirects (#6790) 2025-01-10 13:00:52 +01:00
Mike Fährmann
89276c5b3e [e621] match 'tag' search URLs with empty tag (#6783) 2025-01-07 20:00:26 +01:00
Mike Fährmann
d18f311fe2 [plurk] fix 'user' data extraction and make it non-fatal (#6742) 2025-01-06 20:27:37 +01:00
Mike Fährmann
b1ffb62644 [docs] update 'sleep-request' value for 'wallhaven' 2025-01-06 17:24:04 +01:00
Mike Fährmann
46b6b71159 [wallhaven] extract 'search[tags]' and 'search[tag_id]' metadata
(#6772)
2025-01-06 17:18:04 +01:00
Mike Fährmann
270aaea8ab [pixiv] provide fallback URLs (#6762) 2025-01-06 15:27:32 +01:00
Mike Fährmann
770f41eb4a [util] support not splitting "contains" value (#6773)
by passing any "false" value as 'separator' argument except None
2025-01-06 13:47:32 +01:00
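A minimal sketch of the behavior described in the commit above, assuming a helper with the signature `contains(values, elements, separator=" ")`. This is an illustration written from the commit message, not a copy of the project's code.

```python
def contains(values, elements, separator=" "):
    """Return True if any of 'elements' is found in 'values'."""
    # Split strings for a truthy separator or for None (whitespace split);
    # any other false value such as "" or False leaves the string unsplit.
    if isinstance(values, str) and (separator or separator is None):
        values = values.split(separator)
    if not isinstance(elements, (tuple, list)):
        elements = (elements,)
    return any(e in values for e in elements)


print(contains("foo bar", "foo"))         # True: "foo" is one of the words
print(contains("foo bar", "oo"))          # False: no word equals "oo"
print(contains("foo bar", "oo", ""))      # True: substring test, no split
print(contains("foo  bar", "bar", None))  # True: None splits on whitespace
```

Treating None differently from other false values keeps `str.split(None)` available for splitting on any run of whitespace.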
Mike Fährmann
a3b9cc7785 [options] mark '--list-extractors' argument as optional 2025-01-05 21:37:44 +01:00
Mike Fährmann
7e8ca377fc release version 1.28.3 2025-01-04 16:42:02 +01:00
Mike Fährmann
107798eeab [subscribestar] strip whitespace from 'content' 2025-01-04 16:19:22 +01:00
Mike Fährmann
a53ce6103c [deviantart:tiptap] smaller fixes
- fix text indentation in headings
- fix deviation formats without 'c' path
- support custom 'target' in links
2025-01-03 22:48:06 +01:00
Mike Fährmann
1dcb40be7c merge #6760: [boosty] support 'file' post attachments (#2387)
https://github.com/mikf/gallery-dl/issues/2387#issuecomment-2564671646
2025-01-03 15:59:03 +01:00
Mike Fährmann
bce9be66c2 merge #6761: [subscribestar] improve 'content' metadata extraction 2025-01-03 15:56:17 +01:00
Wyoh Knott
22d4e84372 [subscribestar] Better extraction of content
The structure of content is like this:

```
<div class="post-content" data-role="post_content-text">
                <div class="trix-content">
                    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
                    <html>
                        <body>
                            <div>
                                Unspeakable thing are written here<br />
                                <br />
                                haiiiiiiiiiiiiiiii hi hi hiii its meee back againnn, plspls leave a comment if uuuu liked it mwah
                                &lt;3
                            </div>
                        </body>
                    </html>
                </div>
            </div>
            <div class="post-uploads
```

Currently we extract content with:

```
(extr('<div class="post-content', '<div class="post-uploads').partition(">")[2])
```

I propose we just take the body parts:

```
extr('<body>', '</body>')
```

which only appear around the actual content.

It is then easier to use the content in a filename with the `!H`
formatter: `{content[:160]!H}`. Otherwise the content currently extracted
can't be decoded with it.
2025-01-03 14:57:12 +01:00
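The difference between the two extraction approaches in the commit above can be sketched as follows. The `extr` function here is a hypothetical standalone stand-in for the helper used in the commit message, and the page snippet is a shortened version of the structure quoted there.

```python
def extr(txt, begin, end, default=""):
    """Hypothetical helper: text between the first 'begin' and next 'end'."""
    try:
        first = txt.index(begin) + len(begin)
        return txt[first:txt.index(end, first)]
    except ValueError:
        return default


# Shortened sample of the post structure quoted in the commit message
page = (
    '<div class="post-content" data-role="post_content-text">'
    '<div class="trix-content">'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" '
    '"http://www.w3.org/TR/REC-html40/loose.dtd">'
    '<html><body><div>Some text<br />&lt;3</div></body></html>'
    '</div></div>'
    '<div class="post-uploads">'
)

# Previous approach: everything after the opening post-content tag
old = extr(page, '<div class="post-content',
           '<div class="post-uploads').partition(">")[2]

# Proposed approach: only what sits between <body> and </body>
new = extr(page, "<body>", "</body>")

print(old.startswith('<div class="trix-content"><!DOCTYPE'))  # True
print(new)  # <div>Some text<br />&lt;3</div>
```

The old result still carries the wrapper markup and the DOCTYPE declaration, while the new one contains only the post body.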
Dominik
ea6594734d [boosty] Fixed formatting 2025-01-03 08:27:11 +01:00
Dominik
8c9221f0a6 [boosty] Added post attachment download 2025-01-03 08:18:57 +01:00
Mike Fährmann
5767c0854c merge #6758: [subscribestar] fix attachment downloads and add support for audio type
(#6721, #6724)
2025-01-02 18:25:37 +01:00
Mike Fährmann
671297a8cc [subscribestar] extend fix + add test
some attachments are inside an element with an additional class besides
'doc_preview', e.g. 'class="doc_preview for_post"'
2025-01-02 18:22:15 +01:00
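Why an additional class matters can be shown with a short sketch. The markup below is invented for illustration (only the two class attribute values come from the commit message), and matching on the attribute prefix is one possible way to handle it, not necessarily what the commit does.

```python
# Hypothetical markup with both class variants named in the commit message
sample = (
    '<div class="doc_preview" data-upload-id="1"></div>'
    '<div class="doc_preview for_post" data-upload-id="2"></div>'
)

# Searching for the complete attribute misses the second element
exact_matches = sample.count('class="doc_preview"')

# Searching for the attribute prefix (no closing quote) finds both
prefix_matches = sample.count('class="doc_preview')

print(exact_matches)   # 1
print(prefix_matches)  # 2
```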
Mike Fährmann
2dd2c71c53 [docs] update configuration.rst 2025-01-02 17:54:47 +01:00
Mike Fährmann
428eb53086 [hitomi] provide 'search_tags' metadata for search/tag results
(#1015, #6756)
2025-01-02 17:49:30 +01:00
Mike Fährmann
0c584f9be7 [sankaku] support alphanumeric book/pool IDs (#6757) 2025-01-02 15:49:07 +01:00
Wyoh Knott
a46f7981ee [subscribestar] Fix attachment download and add support for audio type
- We change the text.extr 3rd argument to match the current structure
  ('class="post-edit_form"')
- We add support for uploads-audios based on a structure similar to the
  attachment type:
  - id = data-upload-id
  - name = audio_preview-title
  - url = src
  - type = audio

Fix #6721
2025-01-02 15:47:09 +01:00
Mike Fährmann
bd7320fb7d [deviantart:tiptap] support more content block types
- anchor
- blockquote
- da-gif
- da-video
- lists
    - listItem
    - orderedList
    - bulletList
- text indentation
2025-01-02 14:17:32 +01:00
Mike Fährmann
5c5b6d6276 [deviantart:tiptap] fix deviation embeds without 'token' 2024-12-28 19:47:05 +01:00
Mike Fährmann
7391dd208c [poipiku] always query 'ShowAppendFileF' when post has warning (#6736) 2024-12-27 20:32:50 +01:00
Mike Fährmann
bc7e95684d [piczel] fix extraction (#6735)
- fix pagination
- update API endpoints
- provide 'count' metadata field
- use BASE_PATTERN and self.groups[…]
2024-12-27 15:08:08 +01:00
Mike Fährmann
167a726972 [szurubooru] support 'visuabusters.com/booru' (#6729) 2024-12-26 19:04:16 +01:00
Mike Fährmann
998f949db1 [civitai] add 'user-videos' extractor (#6644) 2024-12-26 10:18:54 +01:00
Mike Fährmann
c6d5e25055 [workflows:executables] use Python 3.13 2024-12-25 19:50:26 +01:00
Mike Fährmann
99de0e1867 [instagram] fix 'pinned' values for '/reels' results (#6719) 2024-12-25 19:42:50 +01:00
Mike Fährmann
3024dce06b [8muses] skip albums without valid 'permalink' (#6717) 2024-12-24 13:49:19 +01:00