32 Commits

Author SHA1 Message Date
Mike Fährmann
40a4ff935a [500px] export GraphQL queries 2026-02-01 19:16:14 +01:00
Mike Fährmann
e006d26c8e Revert "use f-strings when building 'pattern'"
revert d7c97d5a97.
2025-12-20 22:07:37 +01:00
Mike Fährmann
968597a302 yield 3-tuples for Message.Directory
adapt tuples to the same length and semantics as other messages
2025-12-05 21:39:52 +01:00
Mike Fährmann
d7c97d5a97 use f-strings when building 'pattern' 2025-10-20 21:23:11 +02:00
Mike Fährmann
f2a72d8d1e replace 'request(…).json()' with 'request_json(…)' 2025-06-29 17:50:19 +02:00
Mike Fährmann
41191bb60a 'match.group(N)' -> 'match[N]' (#7671)
2.5x faster
2025-06-18 13:05:58 +02:00
Mike Fährmann
3ecb512722 send Referer headers by default 2023-09-19 00:02:04 +02:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened during the 'extractor.find()' call.
2023-07-25 22:16:16 +02:00
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
5503ac4d5e replace json.dumps with direct calls to JSONEncoder.encode 2023-02-09 15:51:40 +01:00
Mike Fährmann
c6a9bab019 update extractor test results 2022-07-12 15:49:22 +02:00
Mike Fährmann
49a50fb2eb [500px] create directories per photo 2021-12-25 17:16:45 +01:00
Mike Fährmann
89bebe1bef [500px] add 'favorite' extractor (closes #1927) 2021-12-25 17:16:45 +01:00
Mike Fährmann
bd08ee2859 remove most 'yield Message.Version' statements
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
21c2da454f update extractor test results 2021-07-04 22:00:32 +02:00
Mike Fährmann
0d2961ae81 [500px] remove last query hash entry
forgot to include this in b56e2450
2021-06-16 23:00:45 +02:00
Mike Fährmann
b56e245094 [500px] update GraphQL queries
500px changed its method from query hashes to sending the entire query
string for every request.
2021-06-14 16:13:08 +02:00
Mike Fährmann
532ac79fb0 update extractor test results 2021-05-21 02:28:53 +02:00
Mike Fährmann
d7bc4a2b8b [500px] update query hashes 2021-05-21 01:20:31 +02:00
Mike Fährmann
b3ee10a7fb [500px] update query hashes 2021-05-06 17:28:26 +02:00
Mike Fährmann
82c32d25af [500px] update query hashes 2021-04-15 17:28:31 +02:00
Mike Fährmann
9785c551bc [500px] skip unavailable photos (#1335)
instead of crashing with a KeyError exception
2021-03-04 20:26:26 +01:00
Mike Fährmann
e88d5bede8 [500px] update query hash 2021-02-08 22:40:02 +01:00
Mike Fährmann
a46561bc16 [500px] update query hashes 2020-11-13 06:36:11 +01:00
Mike Fährmann
968d3e8465 remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
Mike Fährmann
93e04bf9a9 [500px] update query hashes 2020-10-03 19:25:28 +02:00
Mike Fährmann
cc1fb0b4ea [500px] update query hash 2020-09-16 01:26:31 +02:00
Mike Fährmann
84e04cc23b [500px] fix extraction and update URL patterns (fixes #956)
- rewrite most API calls to GraphQL queries
- match '500px.com/p/<user>' URLs
2020-08-24 18:25:31 +02:00
Mike Fährmann
38b6bd66b0 [500px] match 'web.500px.com' subdomains 2020-04-26 22:17:20 +02:00
Mike Fährmann
a3c736fedc [500px] fix extraction
Maximum available image dimensions have been reduced to 4096px
on the longest edge. (from 5000px)
A few (unimportant) metadata fields are no longer available or have
been changed to 'null'.
2019-07-19 17:23:03 +02:00
Mike Fährmann
8d96a8ce4c [500px] add user-, gallery-, and image-extractors (#185) 2019-03-20 17:32:36 +01:00
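
For reference, the change described in 41191bb60a ('match.group(N)' -> 'match[N]') can be illustrated with a minimal sketch. The regular expression and URL below are hypothetical and only loosely modeled on the '500px.com/p/<user>' and 'web.500px.com' forms mentioned in the log; they are not taken from the extractor itself. The sketch only shows that both spellings return the same group and makes no claim about relative speed.

```python
import re

# Hypothetical pattern for illustration only; the real extractor's
# pattern is not shown in this log and may differ.
pattern = re.compile(r"https?://(?:web\.)?500px\.com/p/([^/?#]+)")
match = pattern.match("https://500px.com/p/example_user")

user_old = match.group(1)  # method-call form
user_new = match[1]        # subscript form (Match.__getitem__, Python 3.6+)

# Both forms return the same captured group.
assert user_old == user_new
```

The subscript form requires Python 3.6 or later, since that is when match objects gained '__getitem__'.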