gallery-dl/gallery_dl
Wyoh Knott 22d4e84372 [subscribestar] Better extraction of content
The post content is structured like this:

```
<div class="post-content" data-role="post_content-text">
                <div class="trix-content">
                    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
                    <html>
                        <body>
                            <div>
                                Unspeakable thing are written here<br />
                                <br />
                                haiiiiiiiiiiiiiiii hi hi hiii its meee back againnn, plspls leave a comment if uuuu liked it mwah
                                &lt;3
                            </div>
                        </body>
                    </html>
                </div>
            </div>
            <div class="post-uploads
```

Currently we extract content with:

```
(extr('<div class="post-content', '<div class="post-uploads').partition(">")[2])
```

I propose we take only what is inside the body tags:

```
extr('<body>', '</body>')
```

which only occur around the actual content.
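
To compare the two expressions side by side, here is a minimal sketch. `extract_from` is a simplified stand-in written for this example, not gallery-dl's own helper, and `PAGE` is a shortened, made-up version of the markup above:

```python
def extract_from(txt, pos=0, default=""):
    """Simplified stand-in for an extr() helper (hypothetical, for illustration).

    Returns a function that yields the text between two markers and
    advances its position past the end marker.
    """
    def extr(begin, end):
        nonlocal pos
        try:
            first = txt.index(begin, pos) + len(begin)
            last = txt.index(end, first)
        except ValueError:
            return default
        pos = last + len(end)
        return txt[first:last]
    return extr


# Shortened sample page, modeled on the structure shown above.
PAGE = (
    '<div class="post-content" data-role="post_content-text">\n'
    '  <div class="trix-content">\n'
    '    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" '
    '"http://www.w3.org/TR/REC-html40/loose.dtd">\n'
    '    <html><body><div>Hello<br />&lt;3</div></body></html>\n'
    '  </div>\n'
    '</div>\n'
    '<div class="post-uploads">'
)

# Current expression: everything after the opening tag of the wrapper div,
# which still includes the inner wrapper and the DOCTYPE declaration.
old = extract_from(PAGE)(
    '<div class="post-content', '<div class="post-uploads').partition(">")[2]

# Proposed expression: only what sits between the body tags.
new = extract_from(PAGE)('<body>', '</body>')

print(repr(old[:60]))
print(repr(new))
```

With this sample, `new` holds only the post markup, while `old` still carries the `trix-content` wrapper and the DOCTYPE line in front of it.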

This makes the content easier to use in a filename format string with the
`!H` conversion: `{content[:160]!H}`. The content as currently extracted
can't be decoded with it.
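
One plausible reason, sketched under assumptions: the `[:160]` slice is applied before the conversion, so with the current extraction the slice is used up by whitespace and wrapper markup and can end in the middle of the DOCTYPE tag. `strip_html` below is a rough approximation of a "remove tags, unescape entities" conversion written for this example, not gallery-dl's implementation, and both sample strings are made up:

```python
import html
import re


def strip_html(txt):
    """Rough stand-in for an HTML-stripping conversion (assumption):
    remove complete tags, then unescape entities."""
    return html.unescape(re.sub(r"<[^>]+>", "", txt)).strip()


# What the current extraction roughly yields: indentation, wrapper div,
# and the DOCTYPE declaration all precede the post text.
old_content = (
    "\n" + " " * 16 + '<div class="trix-content">\n'
    + " " * 20 + '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" '
    '"http://www.w3.org/TR/REC-html40/loose.dtd">\n'
    + " " * 20 + "<html><body><div>Hello<br />&lt;3</div></body></html>\n"
    + " " * 16 + "</div>\n"
)

# What the proposed extraction yields for the same post.
new_content = "<div>Hello<br />&lt;3</div>"

# Mimic {content[:160]!H}: slice first, then strip the markup.
old_name = strip_html(old_content[:160])
new_name = strip_html(new_content[:160])

print(repr(old_name))
print(repr(new_name))
```

In this sketch the slice of the old content never reaches the post text, whereas the slice of the new content converts cleanly to plain text.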
2025-01-03 14:57:12 +01:00