38 Commits

Author SHA1 Message Date
Mike Fährmann
c8c4575c7f [dl:http] add MIME type and signature for .aac files 2025-12-29 19:05:34 +01:00
Mike Fährmann
efad90696d [dl:http] fail downloads of empty files (#8661) 2025-12-09 11:18:52 +01:00
Mike Fährmann
45f364e09e [dl:http] add MIME type and signature for m3u8 & mpd files (#8339) 2025-10-03 16:48:10 +02:00
Mike Fährmann
b9429de774 [tests] use f-strings (#7671) 2025-08-14 10:22:42 +02:00
Mike Fährmann
bcfce6b7db [dl:http] improve HTML signature check (#7697)
https://github.com/mikf/gallery-dl/issues/7697#issuecomment-2990734451

ignore leading whitespace
2025-06-20 14:39:32 +02:00
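A minimal sketch of a signature check that ignores leading whitespace, as the commit above describes. The helper name `looks_like_html` and the marker tuple are assumptions made for illustration; they are not taken from gallery-dl's source.

```python
# Hypothetical markers; the real check may accept a different set.
HTML_MARKERS = (b"<!doctype html", b"<html")


def looks_like_html(data: bytes) -> bool:
    """Return True if 'data' starts with an HTML marker,
    ignoring any leading whitespace."""
    head = data.lstrip()[:32].lower()
    return head.startswith(HTML_MARKERS)
```

Stripping whitespace before comparing means a response that begins with blank lines is still recognized as HTML rather than being mistaken for a media file.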
Mike Fährmann
41191bb60a 'match.group(N)' -> 'match[N]' (#7671)
2.5x faster
2025-06-18 13:05:58 +02:00
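For reference, the two spellings in the commit above are interchangeable on Python 3.6 and later. The pattern and input string below are made up for illustration.

```python
import re

match = re.match(r"(\w+)-(\d+)", "post-123")
assert match is not None

# Indexing a match object returns the same text as calling group().
assert match.group(1) == match[1] == "post"
assert match.group(2) == match[2] == "123"
assert match.group(0) == match[0] == "post-123"
```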
Mike Fährmann
e08ec7e083 update copyright notices 2025-06-13 00:03:41 +02:00
Mike Fährmann
8b6bc54e95 [dl:http] add MIME type and signature for .html files 2025-06-12 21:16:34 +02:00
Mike Fährmann
a25e14e776 [dl:http] implement dynamic download 'rate' limits (#7638) 2025-06-08 20:04:31 +02:00
Mike Fährmann
613f05afa3 fix cmdline arguments not overriding extractor-downloader options 2025-02-22 17:40:27 +01:00
Mike Fährmann
18ed39c1cf implement 'downloader' options per extractor category
by setting options inside 'http' or 'ytdl' inside extractor options
or inside subcategory options

{
    "extractor": {
        "mastodon": {
            "http": {
                "rate": "10k"
            }
        },
        "mastodon.social": {
            "http": {
                "rate": "100k"
            }
        }
    },
    "downloader": {
        "rate": "100m"
    }
}

Sets download speed to
- 100k for mastodon.social URLs
-  10k for mastodon sites in general
- 100m for all other sites
2025-02-22 10:08:59 +01:00
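One way to picture the lookup described in the commit above is a most-specific-first search. This is only a sketch: the function `resolve_downloader_option`, its parameters, and the assumed precedence (subcategory section, then category section, then global downloader options) are illustrative and not gallery-dl's actual implementation.

```python
# Same structure as the configuration shown in the commit message.
CONFIG = {
    "extractor": {
        "mastodon": {"http": {"rate": "10k"}},
        "mastodon.social": {"http": {"rate": "100k"}},
    },
    "downloader": {"rate": "100m"},
}


def resolve_downloader_option(config, category, subcategory, scheme, key,
                              default=None):
    """Look up 'key' for a downloader, trying the most specific section first."""
    extractor = config.get("extractor", {})
    downloader = config.get("downloader", {})
    candidates = (
        extractor.get(subcategory, {}).get(scheme, {}),  # assumed: most specific
        extractor.get(category, {}).get(scheme, {}),
        downloader.get(scheme, {}),
        downloader,                                      # global fallback
    )
    for section in candidates:
        if key in section:
            return section[key]
    return default
```

Calling `resolve_downloader_option(CONFIG, "mastodon", "mastodon.social", "http", "rate")` walks the candidates in order and returns the first value it finds.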
Mike Fährmann
510ca36b35 [tests] fix bug when running tests in a certain order
test_ytdl -> test_downloader -> test_extractor
would cause a test failure in Python <3.6 related to youtube_dl imports
2024-08-31 09:42:30 +02:00
Mike Fährmann
9c65db2a92 consistent 'with open(…) as fp:' syntax 2024-06-14 01:22:00 +02:00
Mike Fährmann
699592498b [tests] use random port number for local HTTP server
… and explicitly bind to 127.0.0.1 instead of all interfaces
2024-05-02 22:54:15 +02:00
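A sketch of the approach named in the commit above, using only the standard library. Passing port 0 lets the operating system pick a free port; the variable names are illustrative and the real test setup may differ.

```python
import http.server
import threading

# Port 0 asks the OS for any free port; 127.0.0.1 keeps the
# server reachable from the local machine only.
server = http.server.HTTPServer(
    ("127.0.0.1", 0), http.server.SimpleHTTPRequestHandler)
host, port = server.server_address[:2]

thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

base_url = f"http://{host}:{port}"

# ... run requests against base_url here ...

server.shutdown()
server.server_close()
```

Reading the port back from `server_address` avoids collisions with other processes that a hard-coded port number would risk.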
Mike Fährmann
cd241bea0a [downloader:http] add MIME type and signature for .m4v files (#5505) 2024-04-25 01:01:35 +02:00
Mike Fährmann
a8027745e3 [downloader:http] add MIME type and signature for .mov files (#5287) 2024-03-06 14:00:24 +01:00
Mike Fährmann
8a11b72253 remove extractor/test.py (#4504) 2024-02-27 01:37:57 +01:00
Mike Fährmann
ea78f67860 [downloader:http] skip files not passing filesize-min/-max (#4821)
instead of failing the download
2023-11-17 22:54:20 +01:00
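The behavior change above can be sketched as a size check whose failure yields a skip rather than an error. The function names and the "skipped"/"downloaded" labels are hypothetical and only illustrate the idea.

```python
def filesize_allowed(size, minimum=None, maximum=None):
    """Return True if 'size' (in bytes) lies within the optional bounds."""
    if minimum is not None and size < minimum:
        return False
    if maximum is not None and size > maximum:
        return False
    return True


def download(size, minimum=None, maximum=None):
    # Hypothetical outcome labels: out-of-range files are skipped, not failed.
    if not filesize_allowed(size, minimum, maximum):
        return "skipped"
    return "downloaded"
```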
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This allows a Job, for example, to adjust config access
before most of it has already happened in 'extractor.find()'.
2023-07-25 22:16:16 +02:00
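The pattern described above, a cheap constructor plus a separate 'initialize()' step, might look roughly like this. The class is a hypothetical stand-in; its attributes and the "timeout" option are assumptions, not the real extractor interface.

```python
class Extractor:
    """Hypothetical stand-in showing construction separated from setup."""

    def __init__(self, url):
        # The constructor only records cheap state.
        self.url = url
        self.config = {}
        self.session = None
        self._initialized = False

    def initialize(self):
        # The actual setup (session, cookies, config options) happens here.
        if self._initialized:
            return
        self.session = {"timeout": self.config.get("timeout", 30)}
        self._initialized = True


extractor = Extractor("https://example.org/gallery/1")
# A caller can adjust config between construction and initialization.
extractor.config["timeout"] = 5
extractor.initialize()
```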
Mike Fährmann
2edcdee32f [downloader:http] add MIME type and signature for .heic files (#3915)
https://github.com/strukturag/libheif/issues/83
2023-04-15 17:09:22 +02:00
ClosedPort22
b6706b373a [downloader:http] add signature checks for some formats
also add the MIME type for .obj files
2023-01-15 23:40:55 +08:00
Mike Fährmann
6e08ad26f7 update downloader tests 2022-11-16 22:59:18 +01:00
Mike Fährmann
8124c16a50 split 'build_path' from 'set_filename' and 'set_extension'
Do not automatically build a new path
when setting file metadata or updating its extension.
2022-11-08 17:03:24 +01:00
Mike Fährmann
460095adca update downloader tests 2022-11-01 18:48:35 +01:00
Mike Fährmann
cad85640de move 'util.PathFormat' into its own 'path' module
to prevent circular imports between 'formatter' and 'util'
2021-09-27 21:29:37 +02:00
Mike Fährmann
8821dceb79 use __import__() to dynamically load modules 2021-03-01 01:27:02 +01:00
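As a general illustration of loading modules by name with the built-in `__import__()`, consider the sketch below. The helper `load_module` is hypothetical, and the way gallery-dl itself calls `__import__()` may differ.

```python
def load_module(name):
    """Import a module by name at runtime; return None if it is missing."""
    try:
        # For dotted names, __import__ returns the top-level package,
        # so walk down to the requested submodule.
        module = __import__(name)
        for part in name.split(".")[1:]:
            module = getattr(module, part)
        return module
    except ImportError:
        return None
```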
Mike Fährmann
ac3036ef56 add 'filesize-min' and 'filesize-max' options (closes #780) 2020-09-03 18:21:04 +02:00
Mike Fährmann
ece73b5b2a make 'path' and 'keywords' available in logging messages
Wrap all loggers used by job, extractor, downloader, and postprocessor
objects into a (custom) LoggerAdapter that provides access to the
underlying job, extractor, pathfmt, and kwdict objects and their
properties.

__init__() signatures for all downloader and postprocessor classes have
been changed to take the current Job object as their first argument,
instead of the current extractor or pathfmt.

(#574, #575)
2020-05-18 19:04:51 +02:00
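The wrapping described above can be sketched with the standard `logging.LoggerAdapter`. The class name `JobLoggerAdapter` and the dict used as a stand-in for a Job object are assumptions; the real adapter exposes more objects and is structured differently.

```python
import logging


class JobLoggerAdapter(logging.LoggerAdapter):
    """Hypothetical adapter that exposes job data to log records."""

    def __init__(self, logger, job):
        super().__init__(logger, {"job": job})

    def process(self, msg, kwargs):
        # Attach the job's path and keywords to every record via 'extra'.
        job = self.extra["job"]
        extra = kwargs.setdefault("extra", {})
        extra["path"] = job.get("path")
        extra["keywords"] = job.get("keywords")
        return msg, kwargs
```

With such an adapter in place, a format string like `"%(path)s: %(message)s"` can refer to the current file path in logging messages.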
Mike Fährmann
5df8f2959b insert local directory into PYTHONPATH when running tests 2020-05-02 01:15:50 +02:00
Mike Fährmann
60a43f0264 fix downloader tests 2020-01-14 11:51:06 +01:00
Mike Fährmann
f5604492c3 update interface of config functions 2019-11-24 00:42:28 +01:00
Mike Fährmann
0bb873757a update PathFormat class
- change 'has_extension' from a simple flag/bool to a field that
  contains the original filename extension
- rename 'keywords' to 'kwdict' and some other stuff as well
- inline 'adjust_path()'
- put enumeration index before filename extension (#306)
2019-08-12 21:40:37 +02:00
Mike Fährmann
ee4d7c3d89 update downloader.find() and related code
Instead of replacing 'https' with 'http' for every URL in
'get_downloader()', this now only happens once during downloader
initialization. Also adds unit tests.
2019-06-20 16:59:44 +02:00
Mike Fährmann
179d112083 [downloader] overhaul http and text modules
Get rid of the modular structure and simplify/specialize those modules.
2019-06-19 22:56:11 +02:00
Mike Fährmann
5530871b5a change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)

Example: "https://example.org/path/filename.ext"

before:
- filename : filename.ext
- name     : filename
- extension: ext

now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
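The "now" result above can be reproduced with a small stand-alone function. This is a hypothetical re-implementation named `split_name_ext`, not the code of `text.nameext_from_url()`; details such as lowercasing the extension are assumptions.

```python
from urllib.parse import unquote, urlsplit


def split_name_ext(url):
    """Return 'filename' (without extension) and 'extension'
    for the last path segment of 'url'."""
    name = unquote(urlsplit(url).path.rpartition("/")[2])
    stem, dot, ext = name.rpartition(".")
    if dot and stem:
        return {"filename": stem, "extension": ext.lower()}
    # No usable dot: treat the whole segment as the filename.
    return {"filename": name, "extension": ""}
```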
Mike Fährmann
4a348990f4 adjust value resolution for retries/timeout/verify options
This change introduces 'extractor.*.retries/timeout/verify' options
as a general way to set these values for all HTTP requests.

'downloader.http.retries/timeout/verify' is a way to override these
options for file downloads only and will fall back to 'extractor.*.…'
values if they haven't been explicitly set.

Also: downloader classes now take an extractor object as first argument
instead of a requests.Session object.
2018-10-07 21:13:39 +02:00
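The fallback described above amounts to a two-step lookup. The sketch below uses a flat, hypothetical config layout and a made-up function name `resolve_http_option`; it ignores per-extractor sections for brevity.

```python
def resolve_http_option(config, key, default):
    """Prefer 'downloader.http.<key>'; otherwise fall back to
    the general 'extractor.<key>' value, then to 'default'."""
    downloader_http = config.get("downloader", {}).get("http", {})
    if key in downloader_http:
        return downloader_http[key]
    return config.get("extractor", {}).get(key, default)


config = {
    "extractor": {"retries": 4, "timeout": 30.0},
    "downloader": {"http": {"retries": 10}},
}
```

Here `retries` would come from the downloader section, while `timeout` falls back to the extractor value because it was not set explicitly for downloads.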
Mike Fährmann
b344f2290f fix downloader tests 2018-06-07 22:27:36 +02:00
Mike Fährmann
299ae24996 [test] add a few downloader tests 2018-03-25 15:10:25 +02:00