39 Commits

Author SHA1 Message Date
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a
Job; previously, most of it had already happened by the time
'extractor.find()' returned.
2023-07-25 22:16:16 +02:00
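The decoupling described in this commit can be sketched as follows. This is a minimal illustration, not the project's actual code: the `Extractor` class, its attributes, and the placeholder session dict are all hypothetical stand-ins chosen to show a constructor that stays cheap and a separate `initialize()` step.

```python
class Extractor:
    """Hypothetical stand-in; not the project's actual extractor class."""

    def __init__(self, url):
        # The constructor only records cheap state.
        self.url = url
        self.session = None
        self._initialized = False

    def initialize(self):
        # Session, cookie and config setup happens here, so a caller can
        # adjust configuration between construction and initialization.
        if self._initialized:
            return
        self.session = {"cookies": {}, "headers": {}}  # placeholder session
        self._initialized = True


extractor = Extractor("https://www.artstation.com/artwork?sorting=latest")
# ... a caller could adjust config options here ...
extractor.initialize()
```

Making `initialize()` idempotent, as assumed here, lets callers invoke it defensively without repeating the setup.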
Mike Fährmann
501d9bccfe [artstation] add 'max-posts' option (#3270) 2022-11-23 22:00:18 +01:00
Mike Fährmann
b1ad6f2289 [artstation] add 'pro-first' option (#3273) 2022-11-23 21:45:20 +01:00
Mike Fährmann
b0cb4a1b9c replace 'text.extract()' with 'text.extr()' where possible 2022-11-05 01:14:09 +01:00
Mike Fährmann
220a04a74a [artstation] skip missing projects (#3016) 2022-10-06 12:04:39 +02:00
Mike Fährmann
6992d01e19 [artstation] support search filters (#2970) 2022-09-28 16:51:17 +02:00
Mike Fährmann
aafea0c4f8 [artstation] fix searches (#2970) 2022-09-27 14:25:55 +02:00
blankie
59b16b3f70 [artstation] add 'num' and 'count' metadata fields (#2764) 2022-07-19 14:25:07 +02:00
Mike Fährmann
c6a9bab019 update extractor test results 2022-07-12 15:49:22 +02:00
Mike Fährmann
1bc77efa02 [artstation] use "browser": "firefox" by default (#2527) 2022-05-02 09:03:13 +02:00
Mike Fährmann
f3d61de18d [artstation] create directories per asset (closes #2136) 2021-12-25 17:16:45 +01:00
Mike Fährmann
0e33746fe0 [artstation] use '/album/all' view for user portfolios (#1826) 2021-09-08 21:46:58 +02:00
Mike Fährmann
52a7913abe [artstation] download /4k/ images (#1422) 2021-04-07 21:50:16 +02:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
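A small sketch of why '&' does not need to be excluded from such character classes. The pattern and URL below are hypothetical and chosen only for illustration; `urlsplit` from the standard library splits a URL on the same delimiters the RFC quote names.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: a path segment ends at '/', '?' or '#'.
SEGMENT = re.compile(r"https?://example\.org/([^/?#]+)")

url = "https://example.org/a&b?x=1#top"

# '&' is not a component delimiter, so it stays inside the path segment.
segment = SEGMENT.match(url).group(1)

# The standard library splits on the same three delimiters.
parts = urlsplit(url)
print(segment, parts.path, parts.query, parts.fragment)
```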
Mike Fährmann
d594977ca1 [artstation] add 'following' extractor (closes #888) 2020-07-12 23:03:05 +02:00
Mike Fährmann
0371fd54a1 [artstation] add 'date' metadata field (#839) 2020-06-17 20:22:18 +02:00
Mike Fährmann
90491ab606 [artstation] improve embed extraction (#720) 2020-04-30 21:25:03 +02:00
Mike Fährmann
1e2713b895 [artstation] fix search result pagination (closes #537) 2019-12-25 17:26:37 +01:00
Mike Fährmann
23251356cb require 'extension' data for each URL (#382) 2019-08-14 20:03:03 +02:00
Mike Fährmann
fdec59f8e2 replace extractor.request() 'expect' argument
with
- 'fatal': allow 4xx status codes
- 'notfound': raise NotFoundError on 404
2019-07-05 00:42:16 +02:00
Mike Fährmann
6da3e21237 [downloader:ytdl] provide 'filename' metadata (closes #291) 2019-05-31 14:56:45 +02:00
Mike Fährmann
22d3a2fcc8 [artstation] add extractor for artwork listings (#80)
like https://www.artstation.com/artwork?sorting=latest
or   https://www.artstation.com/artwork?sorting=picks
2019-02-18 12:45:44 +01:00
Mike Fährmann
5530871b5a change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)

Example: "https://example.org/path/filename.ext"

before:
- filename : filename.ext
- name     : filename
- extension: ext

now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
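The new result shape described in this commit can be sketched roughly as below. `nameext_sketch` is a hypothetical helper written for this illustration; it is not the library's `text.nameext_from_url()` and may differ from it in edge cases such as names without a dot.

```python
from urllib.parse import urlsplit


def nameext_sketch(url):
    """Return 'filename' (without extension) and 'extension' for a URL."""
    # Last path component, e.g. 'filename.ext'
    name = urlsplit(url).path.rpartition("/")[2]
    stem, dot, ext = name.rpartition(".")
    if dot:
        return {"filename": stem, "extension": ext}
    # No dot in the name: treat everything as the filename (assumption).
    return {"filename": name, "extension": ""}


result = nameext_sketch("https://example.org/path/filename.ext")
```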
Mike Fährmann
4b1880fa5e propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
8fc6fbfa34 [artstation] recognize shortened project URLs
https://artstn.co/p/<project-id>
2019-02-09 16:53:11 +01:00
Mike Fährmann
6284731107 simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
34bab080ae rewrite URL patterns to use only 1 per extractor 2019-02-08 12:03:10 +01:00
Mike Fährmann
89df37a173 [artstation] use a separate dict for each asset (#154)
Using the same base-dict for each asset of a project causes unwanted
side effects like re-using image filename extensions for videos,
resulting in errors with the youtube-dl downloader.
2019-01-11 12:26:12 +01:00
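The side effect described in this commit can be reproduced with plain dictionaries. The field names and asset list below are hypothetical and only illustrate the general problem of sharing one mutable dict across loop iterations.

```python
def shared_dict(base, assets):
    """Reuse the same dict for every asset (the problematic variant)."""
    results = []
    for asset in assets:
        data = base  # same object on every iteration
        if asset["ext"]:
            data["extension"] = asset["ext"]
        results.append(data)
    return results


def separate_dicts(base, assets):
    """Give every asset its own copy of the base dict."""
    results = []
    for asset in assets:
        data = base.copy()
        if asset["ext"]:
            data["extension"] = asset["ext"]
        results.append(data)
    return results


base = {"title": "demo", "extension": None}  # hypothetical project metadata
assets = [{"type": "image", "ext": "jpg"}, {"type": "video", "ext": None}]

leaky = shared_dict(dict(base), assets)
clean = separate_dicts(dict(base), assets)
```

In the shared variant the video entry inherits the image's extension, while each copy in the second variant keeps its own value.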
Mike Fährmann
7f6a0be982 adjust some tests 2018-11-15 22:50:04 +01:00
Mike Fährmann
36425122ff [artstation] handle external URLs with youtube-dl 2018-11-13 14:27:02 +01:00
Mike Fährmann
017188d268 improve extractor.request()
Replace the 'fatal' parameter with 'expect', which is a list/range
of HTTP status codes >= 400 that should also be accepted.
2018-06-18 16:29:56 +02:00
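One way to read the 'expect' semantics described in this commit is sketched below. `HttpError` and `accept_status` are hypothetical names introduced for this illustration and are not taken from the project.

```python
class HttpError(Exception):
    """Hypothetical error type for rejected status codes."""


def accept_status(status_code, expect=()):
    """Accept codes below 400, plus any code listed in 'expect'."""
    if status_code < 400 or status_code in expect:
        return status_code
    raise HttpError(f"unexpected HTTP status {status_code}")


accept_status(200)                          # always accepted
accept_status(404, expect=(404,))           # accepted via a list-like value
accept_status(451, expect=range(400, 500))  # accepted via a range
```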
Mike Fährmann
2d17a9e07f improve extractor.request()
- better retry behavior
- exponential back-off
- removed 'allow_empty' argument
2018-04-23 18:45:59 +02:00
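Retrying with exponential back-off, as mentioned in this commit, generally follows the pattern below. This is a generic sketch under assumed parameters (`retries`, `base_delay`, and an injectable `sleep`), not the project's implementation.

```python
import time


def request_with_retries(fetch, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the retries are used up."""
    attempt = 0
    while True:
        try:
            return fetch()
        except ConnectionError:
            if attempt >= retries:
                raise
            # Double the wait after every failed attempt.
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

Passing `sleep` as a parameter is a design choice for this sketch: it lets the delays be recorded or skipped when trying the function out.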
Mike Fährmann
f471161920 Merge branch 'master' into 1.4-dev 2018-04-21 12:15:40 +02:00
Mike Fährmann
cc36f88586 rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
b1325d4d2c fix extractor docstrings 2018-04-18 18:03:43 +02:00
Mike Fährmann
e7525b1b0e [artstation] add challenge extractor (#80) 2018-03-23 15:06:09 +01:00
Mike Fährmann
44c267e362 [artstation] add search extractor (#80) 2018-03-17 19:04:37 +01:00
Mike Fährmann
40ca562d7b [artstation] add album extractor (#80) 2018-03-17 17:36:31 +01:00
Mike Fährmann
723cc66bb1 [artstation] add user-, image- and likes-extractors 2018-03-14 14:05:14 +01:00