92 Commits

Author SHA1 Message Date
Mike Fährmann
a0b3e08f64 [tests/extractor] ensure Extractor classes match 2025-09-17 19:29:49 +02:00
Mike Fährmann
a097a373a9 simplify if statements by using walrus operators (#7671) 2025-07-22 20:57:54 +02:00
Mike Fährmann
2ccb9acf1a [pinterest] support 'pin.it' board redirects (#7805) 2025-07-11 22:28:26 +02:00
Mike Fährmann
8e40ea2fe2 [pinterest] match board URLs with query strings (#7805) 2025-07-11 22:28:26 +02:00
Mike Fährmann
d8ef1d693f rename 'StopExtraction' to 'AbortExtraction'
for cases where StopExtraction was used to report errors
2025-07-09 21:07:28 +02:00
Mike Fährmann
9dbe33b6de replace old %-formatted and .format(…) strings with f-strings (#7671)
mostly using flynt
https://github.com/ikamensh/flynt
2025-06-29 17:50:19 +02:00
Mike Fährmann
41191bb60a 'match.group(N)' -> 'match[N]' (#7671)
2.5x faster
2025-06-18 13:05:58 +02:00
Mike Fährmann
9d3cf67f3e [pinterest] remove excess whitespace from 'description' fields (#4335)
and 'closeup_unified_description' & 'closeup_description'
2025-06-13 13:11:18 +02:00
Mike Fährmann
e08ec7e083 update copyright notices 2025-06-13 00:03:41 +02:00
Mike Fährmann
f5b8c25559 [pinterest] ignore 'story_pin_product_sticker_block' blocks (#7563) 2025-05-22 18:42:39 +02:00
Mike Fährmann
88f1541a83 [common] add 'request_location()' convenience function 2025-04-19 16:45:05 +02:00
Mike Fährmann
c4d08b24e9 [pinterest] ignore 'story_pin_static_sticker_block' blocks (#7251) 2025-03-28 20:20:29 +01:00
Mike Fährmann
b8b943fc38 [pinterest] update API headers (#6513)
'BoardFeed' requests fail without 'X-Pinterest-PWS-Handler'
2024-11-22 08:41:10 +01:00
Mike Fährmann
ce90566c56 [pinterest] detect video/audio by block content (#6421)
story blocks from search/board results do not always contain a 'type'
2024-11-05 15:55:24 +01:00
Mike Fährmann
a9a9f3a180 [pinterest] support 'story_pin_music_block' blocks (#6421) 2024-11-05 15:55:24 +01:00
Mike Fährmann
5d984f35aa [pinterest] support 'story' pins (#6188, #6078, #4229) 2024-10-19 17:47:31 +02:00
Mike Fährmann
5477ed181d [pinterest] move file extraction into separate method 2024-10-18 20:55:20 +02:00
Mike Fährmann
1824267447 [dl:ytdl] implement explicit HLS/DASH handling
add '_ytdl_manifest' to specify a manifest type to process
2024-10-16 15:16:21 +02:00
Mike Fährmann
d7823b9f81 [pinterest] fix section URLs for boards with /?# in name (#5104) 2024-02-05 15:54:06 +01:00
blankie
375f2db4c2 [pinterest] add count metadata field 2023-12-28 01:07:04 +11:00
Mike Fährmann
75fa1a5553 [pinterest] remove login code
this has been broken since forever
and is still "protected" by an invisible recaptcha check
2023-12-20 20:59:18 +01:00
Mike Fährmann
57fc6fcf83 replace '24*3600' with '86400'
and generalize cache maxage values
2023-12-18 23:57:22 +01:00
Mike Fährmann
3ecb512722 send Referer headers by default 2023-09-19 00:02:04 +02:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
Mike Fährmann
850df34c31 remove '&' from URL patterns part 2
follow-up on 968d3e8465
2023-05-03 20:26:25 +02:00
Mike Fährmann
4d415376d1 [pinterest] fix 'pin.it' extractor
it really was just the single '/' at the end of the url_shortener URL
2023-05-03 20:05:10 +02:00
Mike Fährmann
657b6a9100 [pinterest] update endpoint for related board pins 2023-05-03 18:41:09 +02:00
Mike Fährmann
0b93420a81 [pinterest] unescape search terms (#3621) 2023-02-15 15:44:20 +01:00
Mike Fährmann
5503ac4d5e replace json.dumps with direct calls to JSONEncoder.encode 2023-02-09 15:51:40 +01:00
Mike Fährmann
9116398c1c [pinterest] add 'domain' option (#3484)
use input URL domain by default
2023-01-04 17:20:14 +01:00
Mike Fährmann
294108c90a [pinterest] support 'All Pins' boards (#2855, #3484) 2023-01-03 19:11:20 +01:00
Mike Fährmann
311e9383af [pinterest] handle section pins with separate extractors (#2684) 2022-07-03 18:12:16 +02:00
Mike Fährmann
0b33435da5 [pinterest] support multiple files per pin (closes #1619, #2452) 2022-04-06 21:21:33 +02:00
Mike Fährmann
9c5d2d7af3 [pinterest] add extractor for created pins (#2452) 2022-04-01 16:59:58 +02:00
Mike Fährmann
9313d4dc10 [pinterest] do not force 'm3u8_native' for video downloads (#2436) 2022-03-21 10:11:51 +01:00
Mike Fährmann
36291176bc [pinterest] add 'search' extractor (#1411) 2021-03-29 01:41:28 +02:00
Mike Fährmann
780b6adb91 rename 'generate_csrf_token()' to just 'generate_token()'
and add a 'size' argument
2021-01-11 22:12:40 +01:00
Mike Fährmann
8a88025dc4 [pinterest] support generic user URLs (#1205)
i.e. https://www.pinterest.com/USERNAME

also renames 'BoardsExtractor' to 'UserExtractor'
2021-01-02 02:36:53 +01:00
Mike Fährmann
6cdbab07b5 [pinterest] add support for getting all boards of a user
(#1205)
2020-12-29 16:57:03 +01:00
Mike Fährmann
371e9ca6df [pinterest] implement video support (closes #1189) 2020-12-21 16:09:06 +01:00
Mike Fährmann
b8daabc3ca [pinterest] implement login support (closes #1055)
being logged in allows access to secret/protected boards
2020-10-15 15:14:18 +02:00
Mike Fährmann
26a967cbd4 [pinterest] match 'pinterest.co.uk' URLs (fixes #914) 2020-07-27 14:41:34 +02:00
Mike Fährmann
0e714b9a0e [pinterest] add 'section' extractor (#835) 2020-06-21 00:08:14 +02:00
Mike Fährmann
5ba90f72ca [pinterest] add support for sections (closes #835) 2020-06-16 14:41:05 +02:00
Mike Fährmann
32d7195d08 [pinterest] improve detection of invalid pin.it links 2020-01-18 21:06:44 +01:00
Mike Fährmann
1f2a69f3c5 add '_extractor' information to redirect results 2019-12-29 23:37:34 +01:00
Mike Fährmann
c4702ec9b6 simplify some logging calls 2019-12-10 21:30:08 +01:00
Mike Fährmann
da6789b2b0 disable unique archive id checks for some tests
- same image twice in a livedoor blog post
- unreliable results for related pinterest items
2019-11-10 17:04:51 +01:00
Mike Fährmann
4409d00141 embed error messages in StopExtraction exceptions 2019-10-28 16:39:49 +01:00