[arcalive] add support (#5657 #7100)

* [arca.live] Add extractor skeleton

* [arcalive] update names and formatting

* [arcalive] implement initial file extraction code

* [arcalive] improve '_extract_media()' performance

compile and cache regex on demand
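
A minimal sketch of what "compile and cache on demand" could look like. The class name, attribute name, and regular expression are hypothetical and are not taken from the actual extractor; the point is only that the pattern is compiled on first use and then reused.

```python
import re


class MediaExtractorSketch:
    """Hypothetical stand-in, not the real arcalive extractor class."""

    # Holds the compiled pattern once it has been built; shared by all instances.
    _media_pattern = None

    @classmethod
    def _media_regex(cls):
        # Compile only on first use, then return the cached object.
        if cls._media_pattern is None:
            # Assumed pattern: opening <img ...> and <video ...> tags.
            cls._media_pattern = re.compile(r"<(?:img|video)\s[^>]*>")
        return cls._media_pattern

    def extract_media(self, html):
        """Return all opening <img>/<video> tags found in 'html'."""
        return self._media_regex().findall(html)
```

Runs that never reach media extraction never pay for the compilation, and repeated calls share one compiled pattern.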

* [arcalive] improve image extraction

- extract 'data-originalurl' URLs if available
- replace URL query strings with 'type=orig'
- ignore emoticons by default
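
The three points above can be sketched roughly as follows. This is an illustration under assumptions, not the extractor's code: the function name, the dict-of-attributes input, and the `arca-emoticon` class used to detect emoticons are all hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def pick_image_url(attrs, emoticons=False):
    """Choose a download URL from the attributes of one <img> tag.

    'attrs' is a plain dict of HTML attributes (hypothetical input shape).
    Returns None when the image should be skipped.
    """
    # Assumption: emoticons are marked with an 'arca-emoticon' CSS class.
    classes = attrs.get("class", "").split()
    if not emoticons and "arca-emoticon" in classes:
        return None

    # Prefer the original-file URL when the tag provides one.
    url = attrs.get("data-originalurl") or attrs.get("src")
    if not url:
        return None

    # Drop whatever query string is present and request the original instead.
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "type=orig", ""))
```

For example, `pick_image_url({"src": "https://example.org/a/b.png?expires=1"})` yields `https://example.org/a/b.png?type=orig`.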

* [arcalive] update defaults

- include 'title' in filenames
- use 0.5-1.5s delay between requests
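
As a rough illustration of how a range value such as `"0.5-1.5"` can be turned into a per-request delay, here is a small sketch. The helper names are hypothetical and this is not the project's own parsing code; it only assumes that a range means "pick a uniformly random duration between the two bounds".

```python
import random


def parse_range(value):
    """Parse '0.5-1.5' into (0.5, 1.5); a single number gives equal bounds."""
    lower, sep, upper = str(value).partition("-")
    lower = float(lower)
    return (lower, float(upper) if sep else lower)


def pick_delay(value):
    """Return a random delay in seconds within the configured range."""
    lower, upper = parse_range(value)
    return random.uniform(lower, upper)
```

Calling `time.sleep(pick_delay("0.5-1.5"))` before each request would then spread requests out by 0.5 to 1.5 seconds.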

* [arcalive] use ext from 'data-orig' if available

* [arcalive] update docs/supportedsites

* [arcalive] add tests

* [arcalive] update 'board' extractor pattern

so it doesn't also match 'post' URLs
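
To show why the board pattern needs to be anchored, here is a sketch with two hypothetical patterns. They assume post URLs have the form `/b/<board>/<id>` and board URLs the form `/b/<board>`, optionally with a query string; the actual patterns in the extractor may differ.

```python
import re

# Hypothetical patterns for illustration only.
BASE = r"(?:https?://)?(?:www\.)?arca\.live"

# Post URLs: /b/<board>/<numeric id>
POST_PATTERN = re.compile(BASE + r"/b/(?:\w+)/(\d+)")

# Board URLs: /b/<board>, optional trailing slash and query string.
# The trailing '$' keeps this from matching '/b/<board>/<id>'.
BOARD_PATTERN = re.compile(BASE + r"/b/([^/?#]+)/?(?:\?([^#]+))?$")


def classify(url):
    """Return 'post', 'board', or None for the given URL."""
    if POST_PATTERN.match(url):
        return "post"
    if BOARD_PATTERN.match(url):
        return "board"
    return None
```

Without the end anchor, the board pattern would match the `/b/<board>` prefix of every post URL, so both extractors would claim the same link.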

---------

Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
hdk5
2025-03-14 11:52:21 +02:00
committed by GitHub
parent 22d46f2462
commit d900e868e4
6 changed files with 311 additions and 0 deletions


@@ -384,6 +384,7 @@ Type
Default
* ``"0.5-1.5"``
``ao3``,
``arcalive``,
``civitai``,
``[Danbooru]``,
``[E621]``,
@@ -1394,6 +1395,16 @@ Description
Format(s) to download.
extractor.arcalive.emoticons
----------------------------
Type
``bool``
Default
``false``
Description
Download emoticon images.
extractor.artstation.external
-----------------------------
Type


@@ -99,6 +99,12 @@
"formats": ["pdf"]
},
"arcalive":
{
"sleep-request": "0.5-1.5",
"emoticons": false
},
"artstation":
{
"external" : false,


@@ -97,6 +97,12 @@ Consider all listed sites to potentially be NSFW.
<td>Posts, Tag Searches</td>
<td></td>
</tr>
<tr>
<td>Arcalive</td>
<td>https://arca.live/</td>
<td>Boards, Posts</td>
<td></td>
</tr>
<tr>
<td>Architizer</td>
<td>https://architizer.com/</td>