6231 Commits

Author SHA1 Message Date
Mike Fährmann
cb1a75eefc [twitter] handle errors during file extraction (#6647) 2025-01-21 18:23:54 +01:00
Mike Fährmann
d9c4fcc7fa [twitter] generate longer CSRF token values 2025-01-21 18:19:25 +01:00
Mike Fährmann
105c027411 [path] handle exception when using --rename-to --no-download (#6861)
Catch a possible FileExistsError exception when attempting to create a
new directory during handling of a FileNotFoundError exception.
FileNotFoundError may also occur when the file at self.temppath is
missing because it hasn't been downloaded due to --no-download.
2025-01-20 20:50:31 +01:00
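
The error handling described in commit 105c027411 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `finalize` and the demo paths are hypothetical, and the only assumption taken from the commit message is that a `FileNotFoundError` during the move can mean either a missing target directory or a missing temporary file.

```python
import os
import tempfile


def finalize(temppath, realpath):
    """Move temppath to realpath (hypothetical helper, for illustration only).

    Returns True if the file was moved, False if there was nothing to move.
    """
    while True:
        try:
            os.replace(temppath, realpath)
            return True
        except FileNotFoundError:
            # Either the target directory or the source file is missing.
            try:
                os.makedirs(os.path.dirname(realpath))
            except FileExistsError:
                # The directory already exists, so the missing piece is the
                # source file itself (e.g. it was never downloaded).
                return False
            # Directory was just created: retry the move.


# Small demo inside a throwaway directory.
base = tempfile.mkdtemp()
source = os.path.join(base, "file.part")
with open(source, "w") as fp:
    fp.write("data")

moved = finalize(source, os.path.join(base, "new", "file.txt"))
skipped = finalize(os.path.join(base, "missing.part"),
                   os.path.join(base, "new", "other.txt"))
```

In this sketch the second call returns `False` instead of raising, which mirrors the case where the temporary file does not exist because no download took place.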
Mike Fährmann
05fa6dd354 [nekohouse] add initial support (#5241, #6738) 2025-01-20 20:15:34 +01:00
Mike Fährmann
6ce310d865 [weebcentral] fix extraction (#6860) 2025-01-19 18:14:03 +01:00
Mike Fährmann
f867e690c1 merge #6855: [turboimagehost] add support for galleries 2025-01-19 17:51:48 +01:00
Mike Fährmann
0f50dd17ba merge #6606: [docs] add nix docs to README 2025-01-19 17:50:05 +01:00
arebokert
556fbb1a44 [turboimagehost] add support for galleries
- added support
- raise error if gallery not found
- fix test
- fix lint issues
- simplify
2025-01-19 17:28:45 +01:00
DontEatOreo
b15283cf6d README.rst: add nix docs 2025-01-19 17:46:57 +02:00
Mike Fährmann
bb2f9b8443 [release] include 'scripts/run_tests.py' in release tarball (#6856) 2025-01-19 15:58:23 +01:00
Mike Fährmann
438c61601b [xfolio] add initial support (#5514, #6351, #6837) 2025-01-18 15:57:56 +01:00
Mike Fährmann
dc7b46be21 [khinsider] add 'covers' option (#6844) 2025-01-18 15:57:56 +01:00
Mike Fährmann
5a31a2ad22 [khinsider] extract more 'album' metadata (#6844)
- year
- catalog
- developer
- publisher
- uploader
2025-01-18 15:57:55 +01:00
Mike Fährmann
3849b3fa92 [batoto] use 'chapter_id' in default archive IDs (#6835)
instead of '{chapter}{chapter_minor}' since some chapters have no actual
chapter number and end up as '0', potentially causing ID overlap
2025-01-15 14:52:18 +01:00
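
The ID overlap mentioned in commit 3849b3fa92 can be illustrated with a small sketch. The format strings and metadata values below are hypothetical stand-ins chosen for the example; only the field names `chapter`, `chapter_minor`, and `chapter_id` come from the commit message.

```python
# Two distinct chapters that both lack a real chapter number
# and therefore end up with chapter == 0 (hypothetical sample data).
chapters = [
    {"chapter_id": 1001, "chapter": 0, "chapter_minor": "", "page": 1},
    {"chapter_id": 1002, "chapter": 0, "chapter_minor": "", "page": 1},
]

# Illustrative archive ID formats, not necessarily the project's exact ones.
old_format = "{chapter}{chapter_minor}_{page}"
new_format = "{chapter_id}_{page}"

old_ids = {old_format.format(**c) for c in chapters}
new_ids = {new_format.format(**c) for c in chapters}

# With the old format both chapters collapse into a single ID,
# so the second one would be treated as already archived.
print(len(old_ids), len(new_ids))
```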
Mike Fährmann
6e919a3695 [e621] support e621.cc and e621.anthro.fr frontend URLs (#6809) 2025-01-15 14:35:37 +01:00
Mike Fährmann
843a39a6c6 [bunkr] extract correct 'filename' data (#6824) 2025-01-14 19:45:48 +01:00
Mike Fährmann
d17a423245 [xhamster] fix 'gallery' extractor (#6818) 2025-01-13 18:58:08 +01:00
Mike Fährmann
bde99cc6ce [cohost] remove module
cohost.org now redirects to archive.org
2025-01-13 14:38:35 +01:00
Mike Fährmann
42070240ae [tests] allow testing for types + values 2025-01-12 20:55:37 +01:00
Mike Fährmann
2b46b82f9c [release] prevent overwriting ${CHANGELOG}.orig with truncated file
to avoid deleting most of CHANGELOG.md by accident when the release.sh
script gets interrupted halfway through, as happened during the v1.28.3
release in commit 7e8ca377fc
2025-01-12 18:05:35 +01:00
Mike Fährmann
6e3f51a05e release version 1.28.4 2025-01-12 17:22:09 +01:00
Mike Fährmann
91bd3e37f2 [pexels] add support (#2286, #4214, #6769) 2025-01-12 16:50:12 +01:00
Mike Fährmann
1ae3ac5e39 [common] add '_extract_nextdata' method 2025-01-12 11:48:36 +01:00
Mike Fährmann
3f48e2f820 [common] add '_extract_jsonld' method (#5272) 2025-01-12 11:07:48 +01:00
Mike Fährmann
88f1ef7c3c [bunkr] fix metadata extraction (#6805) 2025-01-11 12:48:41 +01:00
Mike Fährmann
1d75c8308c [weebcentral] add support (#6778) 2025-01-10 23:04:51 +01:00
Mike Fährmann
4853406fe3 [common] allow MangaExtractors to skip loading manga_url 2025-01-10 21:30:58 +01:00
Mike Fährmann
af9c06f812 [bunkr] fix album extraction (#6798) 2025-01-10 13:01:04 +01:00
Mike Fährmann
118b994cf2 [bunkr] support '/f/...' media URLs 2025-01-10 13:01:04 +01:00
Mike Fährmann
ba0443115a [bunkr] fix ValueError on relative redirects (#6790) 2025-01-10 13:00:52 +01:00
Mike Fährmann
89276c5b3e [e621] match 'tag' search URLs with empty tag (#6783) 2025-01-07 20:00:26 +01:00
Mike Fährmann
d18f311fe2 [plurk] fix 'user' data extraction and make it non-fatal (#6742) 2025-01-06 20:27:37 +01:00
Mike Fährmann
b1ffb62644 [docs] update 'sleep-request' value for 'wallhaven' 2025-01-06 17:24:04 +01:00
Mike Fährmann
46b6b71159 [wallhaven] extract 'search[tags]' and 'search[tag_id]' metadata
(#6772)
2025-01-06 17:18:04 +01:00
Mike Fährmann
270aaea8ab [pixiv] provide fallback URLs (#6762) 2025-01-06 15:27:32 +01:00
Mike Fährmann
770f41eb4a [util] support not splitting "contains" value (#6773)
by passing any "false" value as 'separator' argument except None
2025-01-06 13:47:32 +01:00
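
The separator behavior described in commit 770f41eb4a could look roughly like the sketch below. It is an illustrative reimplementation written for this example, assuming a `contains(values, elements, separator)` helper; it should not be read as the project's actual function.

```python
def contains(values, elements, separator=" "):
    """Check whether any of `elements` occurs in `values` (illustrative sketch).

    A string `values` is split on `separator` first, unless `separator`
    is a false value other than None (e.g. "" or False), in which case
    the string is left whole and a substring test is performed.
    """
    if isinstance(values, str) and (separator or separator is None):
        # separator=None splits on any whitespace, as str.split() does.
        values = values.split(separator)

    if not isinstance(elements, (tuple, list)):
        return elements in values
    return any(e in values for e in elements)


split_match = contains("foo bar", "fo")                  # compared against ["foo", "bar"]
whole_match = contains("foo bar", "fo", separator="")    # substring test on "foo bar"
```

With the default separator, `"fo"` is compared against the list of words and does not match; with an empty separator the value stays a single string, so the substring test succeeds.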
Mike Fährmann
a3b9cc7785 [options] mark '--list-extractors' argument as optional 2025-01-05 21:37:44 +01:00
Mike Fährmann
7e8ca377fc release version 1.28.3 2025-01-04 16:42:02 +01:00
Mike Fährmann
107798eeab [subscribestar] strip whitespace from 'content' 2025-01-04 16:19:22 +01:00
Mike Fährmann
a53ce6103c [deviantart:tiptap] smaller fixes
- fix text indentation in headings
- fix deviations formats without 'c' path
- support custom 'target' in links
2025-01-03 22:48:06 +01:00
Mike Fährmann
1dcb40be7c merge #6760: [boosty] support 'file' post attachments (#2387)
https://github.com/mikf/gallery-dl/issues/2387#issuecomment-2564671646
2025-01-03 15:59:03 +01:00
Mike Fährmann
bce9be66c2 merge #6761: [subscribestar] improve 'content' metadata extraction 2025-01-03 15:56:17 +01:00
Wyoh Knott
22d4e84372 [subscribestar] Better extraction of content
The structure of content is like this:

```
<div class="post-content" data-role="post_content-text">
                <div class="trix-content">
                    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
                    <html>
                        <body>
                            <div>
                                Unspeakable thing are written here<br />
                                <br />
                                haiiiiiiiiiiiiiiii hi hi hiii its meee back againnn, plspls leave a comment if uuuu liked it mwah
                                &lt;3
                            </div>
                        </body>
                    </html>
                </div>
            </div>
            <div class="post-uploads
```

Currently we extract content with:

```
(extr('<div class="post-content', '<div class="post-uploads').partition(">")[2])
```

I propose we just take the body parts:

```
extr('<body>', '</body>')
```

which only occur around the actual content.

It is then easier to use it in the filename content with the `!H`
formatter: `{content[:160]!H}`. Otherwise the content currently extracted
can't be decoded with it.
2025-01-03 14:57:12 +01:00
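
The difference between the two extraction approaches in commit 22d4e84372 can be sketched as follows. The `make_extr` helper is a simplified, hypothetical stand-in for the `extr` function used in the commit message, the page snippet is a shortened version of the structure shown there, and `strip_html` only approximates what an HTML-stripping conversion might do.

```python
import html
import re


def make_extr(page):
    """Return a function extracting text between two markers (simplified stand-in)."""
    pos = 0

    def extr(begin, end, default=""):
        nonlocal pos
        try:
            first = page.index(begin, pos) + len(begin)
            last = page.index(end, first)
        except ValueError:
            return default
        pos = last + len(end)
        return page[first:last]

    return extr


PAGE = (
    '<div class="post-content" data-role="post_content-text">'
    '<div class="trix-content"><!DOCTYPE html><html><body>'
    '<div>Hello<br />world &lt;3</div>'
    '</body></html></div></div>'
    '<div class="post-uploads">'
)

# Current approach: everything after the opening tag of the post-content div.
old = make_extr(PAGE)('<div class="post-content',
                      '<div class="post-uploads').partition(">")[2]

# Proposed approach: only what sits between <body> and </body>.
new = make_extr(PAGE)('<body>', '</body>')


def strip_html(text):
    """Remove tags, then unescape entities and normalize whitespace."""
    return " ".join(html.unescape(re.sub(r"<[^>]+>", " ", text)).split())


plain = strip_html(new)
```

Here `old` still carries the nested doctype and wrapper markup, while `new` contains just the post body, which is easier to reduce to plain text.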
Dominik
ea6594734d [boosty] Fixed formatting 2025-01-03 08:27:11 +01:00
Dominik
8c9221f0a6 [boosty] Added post attachment download 2025-01-03 08:18:57 +01:00
Mike Fährmann
5767c0854c merge #6758: [subscribestar] fix attachment downloads and add support for audio type
(#6721, #6724)
2025-01-02 18:25:37 +01:00
Mike Fährmann
671297a8cc [subscribestar] extend fix + add test
some attachments are inside an element with an additional class besides
'doc_preview', e.g. 'class="doc_preview for_post"'
2025-01-02 18:22:15 +01:00
Mike Fährmann
2dd2c71c53 [docs] update configuration.rst 2025-01-02 17:54:47 +01:00
Mike Fährmann
428eb53086 [hitomi] provide 'search_tags' metadata for search/tag results
(#1015, #6756)
2025-01-02 17:49:30 +01:00
Mike Fährmann
0c584f9be7 [sankaku] support alphanumeric book/pool IDs (#6757) 2025-01-02 15:49:07 +01:00