63 Commits

Author SHA1 Message Date
Mike Fährmann
53cdfaac37 [common] add reference to 'exception' module to Extractor class
- remove 'exception' imports
- replace with 'self.exc'
2026-02-15 10:57:22 +01:00
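A minimal sketch of the 'self.exc' idea follows. The base class holds a reference to the exception module as a class attribute, so subclasses reach exceptions through the instance instead of importing the module. The module here is a stand-in built with types.ModuleType, and its contents, the class name ExampleExtractor and the method check() are all hypothetical.

```python
import types

# Stand-in for the project's 'exception' module (contents are hypothetical)
exception = types.ModuleType("exception")


class NotFoundError(Exception):
    """Hypothetical stand-in for a not-found error class."""


exception.NotFoundError = NotFoundError


class Extractor:
    # Class attribute: every extractor instance reaches the module as self.exc
    exc = exception


class ExampleExtractor(Extractor):
    def check(self, found):
        if not found:
            # No module-level 'import exception' needed in the subclass module
            raise self.exc.NotFoundError("project")
        return True


try:
    ExampleExtractor().check(found=False)
except NotFoundError as error:
    message = str(error)
print(message)  # project
```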
Mike Fährmann
d3adfd603b [artstation] fix & update 'challenge' extractor 2026-02-05 22:37:10 +01:00
Mike Fährmann
04442e262e [artstation] download '/8k/' images (#9003) 2026-02-05 17:32:55 +01:00
Mike Fährmann
b37acd1e28 [artstation] fix embedded videos (#8972) 2026-02-01 13:00:48 +01:00
Mike Fährmann
366b0750a8 [common] use extractor subcategory for 'notfound=True' 2026-01-19 11:19:35 +01:00
Mike Fährmann
00c6821a3f replace 2-element f-strings with simple '+' concatenations
Python's 'ast' module and its 'NodeVisitor' class
were incredibly helpful in identifying these
2025-12-22 11:26:04 +01:00
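The commit does not include the script it used. The sketch below assumes an ast.NodeVisitor subclass that flags f-strings (ast.JoinedStr nodes) made of exactly two plain parts. The names TwoPartFStringFinder and _is_plain are hypothetical. A hit is only a candidate for rewriting: f"{x}y" becomes x + "y" only if x is already a str.

```python
import ast


def _is_plain(part):
    # Literal text, or a bare "{expr}" without "!r"/"!s" or ":spec"
    if isinstance(part, ast.Constant):
        return True
    return (isinstance(part, ast.FormattedValue)
            and part.conversion == -1
            and part.format_spec is None)


class TwoPartFStringFinder(ast.NodeVisitor):
    """Record line numbers of f-strings consisting of exactly two plain parts."""

    def __init__(self):
        self.hits = []

    def visit_JoinedStr(self, node):
        # An f-string parses to ast.JoinedStr; its 'values' hold the
        # literal pieces (Constant) and placeholders (FormattedValue)
        if len(node.values) == 2 and all(map(_is_plain, node.values)):
            self.hits.append(node.lineno)
        self.generic_visit(node)


source = '''
a = f"{x}y"
b = f"{x}{y}{z}"
c = f"pre{x}"
d = f"{x!r}y"
'''
finder = TwoPartFStringFinder()
finder.visit(ast.parse(source))
print(finder.hits)  # [2, 4]
```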
Mike Fährmann
968597a302 yield 3-tuples for Message.Directory
adapt tuples to the same length and semantics as other messages
2025-12-05 21:39:52 +01:00
Mike Fährmann
c38856bd3f [dt] use 'parse_datetime_iso()' for ISO formats 2025-10-19 21:52:05 +02:00
Mike Fährmann
085616e0a8 [dt] replace 'text.parse_datetime()' & 'text.parse_timestamp()' 2025-10-17 17:43:06 +02:00
Mike Fährmann
fc968ebf20 [artstation] support downloading '.mview' files (#7812) 2025-07-12 20:53:16 +02:00
Mike Fährmann
f2a72d8d1e replace 'request(…).json()' with 'request_json(…)' 2025-06-29 17:50:19 +02:00
Mike Fährmann
9dbe33b6de replace old %-formatted and .format(…) strings with f-strings (#7671)
mostly using flynt
https://github.com/ikamensh/flynt
2025-06-29 17:50:19 +02:00
Mike Fährmann
41191bb60a 'match.group(N)' -> 'match[N]' (#7671)
2.5x faster
2025-06-18 13:05:58 +02:00
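Since Python 3.6, re.Match objects support indexing, so 'match[N]' returns the same value as 'match.group(N)'. The example below only demonstrates that equivalence and does not attempt to reproduce the speedup figure. The pattern and input string are made up.

```python
import re

m = re.match(r"(\w+)-(\d+)", "artwork-12345")

# m[N] is equivalent to m.group(N); index 0 is the whole match
assert m[0] == m.group(0) == "artwork-12345"
assert m[1] == m.group(1) == "artwork"
assert m[2] == m.group(2) == "12345"
print(m[1], m[2])  # artwork 12345
```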
Mike Fährmann
e08ec7e083 update copyright notices 2025-06-13 00:03:41 +02:00
Mike Fährmann
811b665e33 remove @staticmethod decorators
There might have been a time when calling a static method was faster
than calling a regular method, but that is no longer the case. According to
micro-benchmarks, calling a static method is 70% slower in CPython 3.13,
and the decorator also makes executing the code of a class definition slower.
2025-06-12 22:50:52 +02:00
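The benchmark behind the 70% figure is not part of the commit. The harness below is a rough way to compare call overhead for the two method kinds with timeit. The class names are hypothetical. No numbers are claimed here, because the results depend on the interpreter version and the machine.

```python
import timeit


class WithStatic:
    @staticmethod
    def parse(value):
        return value


class WithRegular:
    def parse(self, value):
        return value


ws, wr = WithStatic(), WithRegular()
N = 200_000
# Time only the method call through an instance
t_static = timeit.timeit("ws.parse(1)", globals=globals(), number=N)
t_regular = timeit.timeit("wr.parse(1)", globals=globals(), number=N)
print(f"static:  {t_static:.4f}s")
print(f"regular: {t_regular:.4f}s")
```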
Mike Fährmann
4874c8e1d1 [artstation] restore 'browser' and 'tls12' defaults
partially revert 954796a466
2025-01-28 11:36:06 +01:00
Mike Fährmann
98c068a379 [artstation] simplify '_no_cache()' 2025-01-26 16:12:25 +01:00
Mike Fährmann
954796a466 [artstation] prevent CF challenges (#5817, #5658, #5564, #5554) 2025-01-26 16:00:16 +01:00
Mike Fährmann
a53db09ca0 [artstation] disable TLS 1.2 ciphers by default (#5564, #5658) 2024-05-30 23:54:19 +02:00
Mike Fährmann
1a9b9aa310 [artstation] support video clips (#2566, #3309, #3911)
- add 'videos' and 'previews' options
- fix 403 errors for video previews
2024-03-03 18:00:45 +01:00
Mike Fährmann
cf9e99c07b [artstation] support collections (#146)
https://github.com/mikf/gallery-dl/issues/146#issuecomment-1972101003
2024-03-01 20:21:21 +01:00
blankie
962f55cc68 [artstation] fix handling usernames with dashes 2024-02-21 17:39:37 +11:00
Mike Fährmann
3ecb512722 send Referer headers by default 2023-09-19 00:02:04 +02:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
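A minimal sketch of the split between construction and initialization follows. This is a hypothetical class, not the project's Extractor. The constructor only records identifying data. The setup that the commit lists (session, cookies, config options) is deferred to an idempotent initialize(). The URL, the attribute names and the 'videos' option value are illustrative.

```python
class Extractor:
    """Hypothetical sketch: a cheap constructor plus a deferred initialize()."""

    def __init__(self, url):
        # Only record what identifies the extractor; no session, no config reads
        self.url = url
        self.options = None
        self._initialized = False

    def initialize(self, options=None):
        # The expensive part (session, cookies, config options) would go here
        if self._initialized:
            return
        self.options = dict(options or {})
        self._initialized = True


# Hypothetical flow: the lookup only constructs the extractor ...
extr = Extractor("https://www.artstation.com/example")
# ... so a Job can still adjust configuration before initialize() runs
extr.initialize({"videos": True})
extr.initialize({"videos": False})  # already initialized: no-op
print(extr.options)  # {'videos': True}
```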
Mike Fährmann
501d9bccfe [artstation] add 'max-posts' option (#3270) 2022-11-23 22:00:18 +01:00
Mike Fährmann
b1ad6f2289 [artstation] add 'pro-first' option (#3273) 2022-11-23 21:45:20 +01:00
Mike Fährmann
b0cb4a1b9c replace 'text.extract()' with 'text.extr()' where possible 2022-11-05 01:14:09 +01:00
Mike Fährmann
220a04a74a [artstation] skip missing projects (#3016) 2022-10-06 12:04:39 +02:00
Mike Fährmann
6992d01e19 [artstation] support search filters (#2970) 2022-09-28 16:51:17 +02:00
Mike Fährmann
aafea0c4f8 [artstation] fix searches (#2970) 2022-09-27 14:25:55 +02:00
blankie
59b16b3f70 [artstation] add 'num' and 'count' metadata fields (#2764) 2022-07-19 14:25:07 +02:00
Mike Fährmann
c6a9bab019 update extractor test results 2022-07-12 15:49:22 +02:00
Mike Fährmann
1bc77efa02 [artstation] use "browser": "firefox" by default (#2527) 2022-05-02 09:03:13 +02:00
Mike Fährmann
f3d61de18d [artstation] create directories per asset (closes #2136) 2021-12-25 17:16:45 +01:00
Mike Fährmann
0e33746fe0 [artstation] use '/album/all' view for user portfolios (#1826) 2021-09-08 21:46:58 +02:00
Mike Fährmann
52a7913abe [artstation] download /4k/ images (#1422) 2021-04-07 21:50:16 +02:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
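A character class such as '[^/?#]' stops at every RFC 3986 component delimiter. '&' normally separates parameters only inside the query, which already starts at '?', so excluding it adds nothing. The user-page pattern below is a hypothetical example of such a class, not one of the project's actual patterns.

```python
import re

# Hypothetical user-page pattern: the character class excludes only the
# RFC 3986 component delimiters '/', '?' and '#'
USER_PATTERN = re.compile(
    r"(?:https?://)?(?:www\.)?artstation\.com/([^/?#]+)")

urls = (
    "https://www.artstation.com/someuser",
    "https://www.artstation.com/someuser?sort=new&page=2",
    "https://www.artstation.com/someuser#top",
)
users = [USER_PATTERN.match(url)[1] for url in urls]
print(users)  # ['someuser', 'someuser', 'someuser']
```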
Mike Fährmann
d594977ca1 [artstation] add 'following' extractor (closes #888) 2020-07-12 23:03:05 +02:00
Mike Fährmann
0371fd54a1 [artstation] add 'date' metadata field (#839) 2020-06-17 20:22:18 +02:00
Mike Fährmann
90491ab606 [artstation] improve embed extraction (#720) 2020-04-30 21:25:03 +02:00
Mike Fährmann
1e2713b895 [artstation] fix search result pagination (closes #537) 2019-12-25 17:26:37 +01:00
Mike Fährmann
23251356cb require 'extension' data for each URL (#382) 2019-08-14 20:03:03 +02:00
Mike Fährmann
fdec59f8e2 replace extractor.request() 'expect' argument
with
- 'fatal': allow 4xx status codes
- 'notfound': raise NotFoundError on 404
2019-07-05 00:42:16 +02:00
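A hypothetical sketch of how the two options could interact when checking a response status follows. It is not the project's request() code, and the exception classes are stand-ins. 'notfound' turns a 404 into a NotFoundError, and fatal=False hands 4xx responses back to the caller.

```python
class HttpError(Exception):
    """Hypothetical stand-in for a generic HTTP error."""


class NotFoundError(Exception):
    """Hypothetical stand-in for a 'not found' error."""


def check_status(status, fatal=True, notfound=None):
    """Hypothetical sketch of the 'fatal' and 'notfound' request options."""
    if status < 400:
        return True
    if status == 404 and notfound:
        # notfound names the missing resource, e.g. "user" or "project"
        raise NotFoundError(notfound)
    if not fatal and status < 500:
        # fatal=False: hand 4xx responses back to the caller
        return True
    raise HttpError(status)


print(check_status(403, fatal=False))  # True
try:
    check_status(404, notfound="project")
except NotFoundError as exc:
    print("not found:", exc)  # not found: project
```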
Mike Fährmann
6da3e21237 [downloader:ytdl] provide 'filename' metadata (closes #291) 2019-05-31 14:56:45 +02:00
Mike Fährmann
22d3a2fcc8 [artstation] add extractor for artwork listings (#80)
like https://www.artstation.com/artwork?sorting=latest
or   https://www.artstation.com/artwork?sorting=picks
2019-02-18 12:45:44 +01:00
Mike Fährmann
5530871b5a change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)

Example: "https://example.org/path/filename.ext"

before:
- filename : filename.ext
- name     : filename
- extension: ext

now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
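The sketch below reproduces the "now" result layout from the example above using only the standard library. It is a hypothetical reimplementation and not the project's text.nameext_from_url(). It takes the last path segment, then splits off the extension without its leading dot.

```python
import os.path
from urllib.parse import unquote, urlsplit


def nameext_from_url(url, data=None):
    """Hypothetical sketch of the new result layout, not the project's code."""
    if data is None:
        data = {}
    # Last path segment; urlsplit() already drops the query and fragment
    basename = unquote(urlsplit(url).path.rpartition("/")[2])
    name, ext = os.path.splitext(basename)
    data["filename"] = name          # formerly 'name'
    data["extension"] = ext[1:]      # without the leading dot
    return data


print(nameext_from_url("https://example.org/path/filename.ext"))
# {'filename': 'filename', 'extension': 'ext'}
```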
Mike Fährmann
4b1880fa5e propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
8fc6fbfa34 [artstation] recognize shortened project URLs
https://artstn.co/p/<project-id>
2019-02-09 16:53:11 +01:00
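Recognizing the short form takes a single pattern that captures the segment after '/p/'. The pattern and the sample ID below are hypothetical. The commit does not state which characters a project ID may contain, so '\w+' is an assumption.

```python
import re

# Hypothetical pattern; the real project-ID alphabet is not stated in the commit
SHORT_PROJECT = re.compile(r"(?:https?://)?artstn\.co/p/(\w+)")

match = SHORT_PROJECT.match("https://artstn.co/p/AbC123")
print(match[1])  # AbC123
```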
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00