24 Commits

Author SHA1 Message Date
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a Job
before most of that access has already happened in 'extractor.find()'.
2023-07-25 22:16:16 +02:00
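
The decoupling described in this commit can be illustrated with a minimal sketch. The class below is hypothetical and is not the project's actual extractor; the names `SketchExtractor`, `config`, and `session` are assumptions made for illustration. The point is only that the constructor stays cheap, while the real setup runs later in `initialize()`, after the caller has had a chance to adjust configuration.

```python
class SketchExtractor:
    """Hypothetical extractor used only to illustrate deferred init."""

    def __init__(self, url):
        # The constructor records cheap state and does no real setup.
        self.url = url
        self.config = {}
        self.session = None
        self._initialized = False

    def initialize(self):
        # Deferred setup: runs at most once, when the caller asks for it.
        if self._initialized:
            return
        # Stand-in for session, cookie, and option setup.
        self.session = {"timeout": self.config.get("timeout", 30)}
        self._initialized = True


extractor = SketchExtractor("https://example.org/photo/1")
extractor.config["timeout"] = 5  # adjusted before any init work happens
extractor.initialize()
print(extractor.session)
```

Because `initialize()` is a separate step, code that creates the object can still change how configuration is read before any of it is consumed.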
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
5503ac4d5e replace json.dumps with direct calls to JSONEncoder.encode 2023-02-09 15:51:40 +01:00
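
As a rough illustration of what this commit describes, the sketch below compares `json.dumps` with a direct call to `JSONEncoder.encode` from the standard library. The sample data and the chosen options are assumptions; the commit does not say which options the project uses.

```python
import json

data = {"id": 1, "name": "Fährmann"}

# Convenience call: the options are passed again on every invocation.
via_dumps = json.dumps(data, ensure_ascii=False, sort_keys=True)

# Reusable encoder: options are fixed once, then encode() is called directly.
encode = json.JSONEncoder(ensure_ascii=False, sort_keys=True).encode
via_encoder = encode(data)

# Both paths produce the same string for the same options.
print(via_dumps == via_encoder)
```

Binding `encode` once avoids repeating the option list at each call site.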
Mike Fährmann
c6a9bab019 update extractor test results 2022-07-12 15:49:22 +02:00
Mike Fährmann
49a50fb2eb [500px] create directories per photo 2021-12-25 17:16:45 +01:00
Mike Fährmann
89bebe1bef [500px] add 'favorite' extractor (closes #1927) 2021-12-25 17:16:45 +01:00
Mike Fährmann
bd08ee2859 remove most 'yield Message.Version' statements
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
21c2da454f update extractor test results 2021-07-04 22:00:32 +02:00
Mike Fährmann
0d2961ae81 [500px] remove last query hash entry
forgot to include this in b56e2450
2021-06-16 23:00:45 +02:00
Mike Fährmann
b56e245094 [500px] update GraphQL queries
500px changed its method from query hashes to sending the entire query
string for every request.
2021-06-14 16:13:08 +02:00
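
The difference between the two request styles mentioned in this commit can be sketched generically. The `operationName`, `variables`, and `query` keys follow the common GraphQL-over-HTTP request shape; the `queryHash` key, the function names, and the sample query are hypothetical and do not claim to match what 500px or the extractor actually send.

```python
def hashed_payload(operation, variables, query_hash):
    # Hash style: the server is assumed to know the query already,
    # so only an identifier for it is transmitted.
    # "queryHash" is a hypothetical field name.
    return {
        "operationName": operation,
        "variables": variables,
        "queryHash": query_hash,
    }


def full_query_payload(operation, variables, query):
    # Full-query style: the entire query string accompanies every request.
    return {
        "operationName": operation,
        "variables": variables,
        "query": query,
    }


QUERY = "query PhotoQuery($id: ID!) { photo(id: $id) { id name } }"
payload = full_query_payload("PhotoQuery", {"id": "1"}, QUERY)
print(sorted(payload))
```

With the full query in each request, there is no hash table on the client side that needs updating whenever the server's queries change.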
Mike Fährmann
532ac79fb0 update extractor test results 2021-05-21 02:28:53 +02:00
Mike Fährmann
d7bc4a2b8b [500px] update query hashes 2021-05-21 01:20:31 +02:00
Mike Fährmann
b3ee10a7fb [500px] update query hashes 2021-05-06 17:28:26 +02:00
Mike Fährmann
82c32d25af [500px] update query hashes 2021-04-15 17:28:31 +02:00
Mike Fährmann
9785c551bc [500px] skip unavailable photos (#1335)
instead of crashing with a KeyError exception
2021-03-04 20:26:26 +01:00
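
A minimal sketch of the skip-instead-of-crash behavior this commit describes, assuming a hypothetical response layout in which an unavailable photo lacks the `images` key. The field names and sample data are invented for illustration.

```python
def iter_available(photos):
    """Yield (id, url) pairs, skipping entries with missing fields."""
    for photo in photos:
        try:
            photo_id = photo["id"]
            url = photo["images"][-1]["url"]
        except (KeyError, IndexError):
            # Unavailable photo: skip it rather than abort the whole run.
            continue
        yield photo_id, url


photos = [
    {"id": 1, "images": [{"url": "https://example.org/1.jpg"}]},
    {"id": 2},  # hypothetical unavailable photo without image data
    {"id": 3, "images": [{"url": "https://example.org/3.jpg"}]},
]
print(list(iter_available(photos)))
```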
Mike Fährmann
e88d5bede8 [500px] update query hash 2021-02-08 22:40:02 +01:00
Mike Fährmann
a46561bc16 [500px] update query hashes 2020-11-13 06:36:11 +01:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
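
The effect of dropping '&' can be shown with a small sketch. It assumes the strings in the commit message appear inside regex character classes such as `[^/?&#]`; the two patterns and the URL below are hypothetical and are not taken from the project.

```python
import re

# Hypothetical patterns: a path segment ends at any listed delimiter.
OLD = re.compile(r"/user/([^/?&#]+)")  # '&' treated as a delimiter
NEW = re.compile(r"/user/([^/?#]+)")   # only '/', '?', and '#' delimit

url = "https://example.org/user/tom&jerry?page=2#top"

old_name = OLD.search(url).group(1)
new_name = NEW.search(url).group(1)
print(old_name, new_name)
```

With '&' in the class, the segment is cut short at the ampersand; without it, the segment runs up to the next component delimiter named in RFC 3986.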
Mike Fährmann
93e04bf9a9 [500px] update query hashes 2020-10-03 19:25:28 +02:00
Mike Fährmann
cc1fb0b4ea [500px] update query hash 2020-09-16 01:26:31 +02:00
Mike Fährmann
84e04cc23b [500px] fix extraction and update URL patterns (fixes #956)
- rewrite most API calls to GraphQL queries
- match '500px.com/p/<user>' URLs
2020-08-24 18:25:31 +02:00
Mike Fährmann
38b6bd66b0 [500px] match 'web.500px.com' subdomains 2020-04-26 22:17:20 +02:00
Mike Fährmann
a3c736fedc [500px] fix extraction
Maximum available image dimensions have been reduced from 5000px
to 4096px on the longest edge.
A few (unimportant) metadata fields are no longer available or have
been changed to 'null'.
2019-07-19 17:23:03 +02:00
Mike Fährmann
8d96a8ce4c [500px] add user-, gallery-, and image-extractors (#185) 2019-03-20 17:32:36 +01:00