Commit Graph

4465 Commits

Author SHA1 Message Date
Mike Fährmann
6d850ce629 [twitter] calculate 'date' from Tweet IDs
20 times faster than parsing 'created_at'
2023-04-05 22:29:14 +02:00
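Commit 6d850ce629 derives the date from the Tweet ID itself. Tweet IDs are Snowflake IDs whose bits above the lowest 22 store milliseconds since a Twitter-specific epoch, so the timestamp can be recovered with a shift and an addition. The sketch below assumes that standard layout and the epoch value 1288834974657 ms; `tweet_id_to_datetime` is a hypothetical name, not gallery-dl's actual code.

```python
from datetime import datetime, timedelta, timezone

# Assumed Snowflake epoch used by Twitter: 2010-11-04T01:42:54.657Z,
# expressed in milliseconds since the Unix epoch.
TWITTER_EPOCH_MS = 1288834974657
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tweet_id_to_datetime(tweet_id: int) -> datetime:
    """Return the UTC creation time encoded in a Snowflake Tweet ID.

    Bits above the lowest 22 hold milliseconds since the Twitter epoch;
    the lowest 22 bits (worker and sequence numbers) are discarded.
    """
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    # Integer timedelta arithmetic avoids float rounding of the timestamp.
    return UNIX_EPOCH + timedelta(milliseconds=ms)
```

IDs from before Snowflake was introduced in late 2010 were assigned sequentially and carry no timestamp, so this only yields meaningful dates for newer Tweets.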
Mike Fährmann
25949bd767 merge #3871: [hotleak] Fix downloading of creators whose name starts with a category name 2023-04-04 16:24:20 +02:00
Mike Fährmann
dbe06cdba1 [twitter] warn about 'withheld' Tweets and users (#3864) 2023-04-04 16:15:08 +02:00
Mike Fährmann
3cc1dd1572 [twitter] update query hashes 2023-04-03 23:20:20 +02:00
Mike Fährmann
3846ce0de5 [twitter] update to bookmark timeline v2 (#3859) 2023-04-03 22:46:12 +02:00
Mike Fährmann
34699fbf64 [deviantart:search] detect login redirects (#3860) 2023-04-03 19:37:12 +02:00
Mike Fährmann
e6cb92864a [twitter] allow setting custom features per API endpoint 2023-04-03 16:18:31 +02:00
Balgden
4b141cce66 Fix indentation 2023-04-03 13:44:14 +00:00
Balgden
bbc5977121 Fix line length 2023-04-03 13:38:42 +00:00
Balgden
ffd30abcb3 [hotleak] Fix downloading of creators whose name starts with a category name
E.g. `hot4lexi` would start downloading the `hot` section by mistake.

This happened because the regex had a negative lookahead for the category names, but didn't ensure that they were followed by either end-of-string or a slash.
2023-04-03 13:30:27 +00:00
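The regex problem described in ffd30abcb3 can be reproduced with a small pattern. This sketch uses a hypothetical list of category names and simplified patterns that match a single URL path segment; it only illustrates why the lookahead needs a trailing `(?:/|$)` and is not the extractor's real pattern.

```python
import re
from typing import Optional

# Hypothetical category names for illustration; the extractor's real list may differ.
CATEGORIES = ("hot", "creators", "videos", "photos")
_ALT = "|".join(CATEGORIES)

# Before: the lookahead rejects any path that merely *starts* with a category.
OLD = re.compile(rf"^(?!(?:{_ALT}))([^/?#]+)")

# After: a category only counts when followed by "/" or the end of the string.
NEW = re.compile(rf"^(?!(?:{_ALT})(?:/|$))([^/?#]+)")


def creator_name(path: str, pattern: re.Pattern = NEW) -> Optional[str]:
    """Return the creator name for a URL path, or None if it is a category."""
    m = pattern.match(path)
    return m.group(1) if m else None
```

With `OLD`, `creator_name("hot4lexi", OLD)` returns `None`, so the path falls through to the `hot` category; with `NEW` it returns `"hot4lexi"` while `"hot"` and `"hot/"` are still rejected.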
Mike Fährmann
5ca9d55595 merge #3870: [blogger] update 'sub' regex to get the highest resolution url 2023-04-03 14:47:18 +02:00
Mike Fährmann
fd7ce4c081 merge #3868: [shopify] fix 'collection' extractor 2023-04-03 14:44:46 +02:00
Mike Fährmann
135ac9c302 merge #3854: [twitter] fix: graphql_timeline_v2_bookmark_timeline cannot be null 2023-04-03 14:37:42 +02:00
enduser420
bbb1e34c34 [blogger] update sub regex 2023-04-03 12:43:58 +05:30
enduser420
96e3dd2128 [shopify] fix 'collection' extractor 2023-04-03 12:19:09 +05:30
Mike Fährmann
ac97aca99c [realbooru] fix extraction
get file URLs from HTML pages
2023-04-02 20:45:16 +02:00
Mike Fährmann
75666cf9c3 [danbooru] reduce API requests for fetching extended 'metadata'
Instead of using one additional API request per post object (N+1),
this requires only one request per 200-post batch.
2023-04-02 20:11:52 +02:00
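The batching in 75666cf9c3 replaces one extra API request per post with one request per 200 posts. A minimal sketch of that pattern follows; `attach_metadata` and the `fetch_batch` callable (post IDs in, `{id: fields}` out) are hypothetical stand-ins for the actual API call.

```python
from typing import Callable, Dict, List

BATCH_SIZE = 200  # one request per 200-post batch, as in the commit message

# Hypothetical signature: takes a list of post IDs, returns {post_id: extra_fields}.
FetchBatch = Callable[[List[int]], Dict[int, dict]]


def attach_metadata(posts: List[dict], fetch_batch: FetchBatch,
                    size: int = BATCH_SIZE) -> List[dict]:
    """Merge extended metadata into posts with one request per batch.

    A per-post lookup would cost N extra requests (the "N+1" pattern);
    batching costs ceil(N / size) requests instead.
    """
    for start in range(0, len(posts), size):
        batch = posts[start:start + size]
        extra = fetch_batch([post["id"] for post in batch])
        for post in batch:
            post.update(extra.get(post["id"], {}))
    return posts
```

With 450 posts this issues three requests (200, 200 and 50 IDs) instead of 450.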
Amer Jazaerli
bebbff6578 fix: graphql_timeline_v2_bookmark_timeline cannot be null
twitter: 400 Bad Request (The following features cannot be null: graphql_timeline_v2_bookmark_timeline)
2023-03-31 00:06:49 +02:00
Mike Fährmann
421db26aff [bunkr] update domain to 'bunkr.la' 2023-03-28 20:10:36 +02:00
Mike Fährmann
82f83c18e8 release version 1.25.1 2023-03-25 21:30:05 +01:00
Mike Fährmann
9b5e7ce8b9 [hiperdex] fix extraction 2023-03-25 18:18:27 +01:00
Mike Fährmann
89a67c45e0 [nitter] support nitter.it (#3819) 2023-03-25 13:29:22 +01:00
Mike Fährmann
88f29a751d [nitter] skip broadcasts
instead of downloading an "Unsupported feature" HTML page
2023-03-25 13:09:24 +01:00
Mike Fährmann
1e013eba5a [nitter] fix extraction for instances without user banners 2023-03-25 12:50:40 +01:00
Mike Fährmann
d94aa1ee02 [gelbooru] fix --range for favorites (#3704) 2023-03-23 22:58:13 +01:00
Mike Fährmann
1f82b00b8f [gelbooru] fix and improve --range for pools 2023-03-23 18:22:46 +01:00
Mike Fährmann
197882cf12 [twitter] add 'hashtag' extractor (#3783) 2023-03-22 22:20:40 +01:00
Mike Fährmann
082d55de16 fix circular reference detection for -K 2023-03-21 23:46:36 +01:00
Mike Fährmann
2ab66ad899 update -K output to include quotes around keys 2023-03-21 22:28:04 +01:00
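Commits 082d55de16 and 2ab66ad899 concern the key listing printed by -K: detecting circular references and quoting keys. The following is an illustrative sketch, not gallery-dl's implementation. It assumes only dicts are descended into and tracks the `id()` of every dict on the current path, so a real cycle is reported while a dict that is merely shared by two keys is still listed twice.

```python
from typing import Any, Iterator, Optional, Set, Tuple


def walk_keys(obj: Any, prefix: str = "",
              _active: Optional[Set[int]] = None) -> Iterator[Tuple[str, Any]]:
    """Yield (key path, value) pairs, writing nested keys as ['key']."""
    if _active is None:
        _active = set()
    if not isinstance(obj, dict):
        yield prefix, obj
        return
    if id(obj) in _active:
        # This dict is already being walked higher up: a true cycle.
        yield prefix, "<circular reference>"
        return
    _active.add(id(obj))
    for key, value in obj.items():
        path = f"{prefix}['{key}']" if prefix else str(key)
        yield from walk_keys(value, path, _active)
    # Leaving this dict: a later sibling may reference it without a cycle.
    _active.discard(id(obj))
```

For `data = {"a": 1, "sub": {"b": 2}}` with `data["sub"]["parent"] = data`, this yields `a`, `sub['b']` and `sub['parent']`, the last one marked as a circular reference instead of recursing forever.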
Mike Fährmann
fe41a2b159 [formatter] support putting keys in quotes
i.e. obj["key"] or obj['key']
as in f-strings
2023-03-21 22:06:54 +01:00
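fe41a2b159 lets format-string keys be written in quotes, as in f-strings. One way to parse such field names is sketched below; `parse_field` and `lookup` are hypothetical helpers, and the exact grammar gallery-dl's formatter accepts may differ. Quoting lets a key contain characters such as `]` that an unquoted key cannot.

```python
import re
from typing import Any, List

# One subscript: ["key"], ['key'], or an unquoted [key].
_SUBSCRIPT = re.compile(r"""\[(?:"([^"]*)"|'([^']*)'|([^\]]*))\]""")


def parse_field(field: str) -> List[str]:
    """Split a field name like 'user["name"][id]' into ['user', 'name', 'id']."""
    start = field.find("[")
    if start < 0:
        return [field]
    keys = [field[:start]]
    pos = start
    while pos < len(field):
        m = _SUBSCRIPT.match(field, pos)
        if m is None:
            raise ValueError(f"invalid field name: {field!r}")
        # Exactly one of the three groups participated in the match.
        keys.append(next(g for g in m.groups() if g is not None))
        pos = m.end()
    return keys


def lookup(obj: dict, field: str) -> Any:
    """Resolve a parsed field name against nested dicts."""
    keys = parse_field(field)
    value = obj[keys[0]]
    for key in keys[1:]:
        value = value[key]
    return value
```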
Mike Fährmann
46fdf46f21 [formatter] support loading an f-string from a template file
"\fTF ~/path/to/file.txt"
2023-03-20 22:05:33 +01:00
Mike Fährmann
1a4d4a799b [formatter] support filesystem paths for \fM 2023-03-20 22:01:33 +01:00
Mike Fährmann
9789ebac52 [naverwebtoon] fix extraction (#3729) 2023-03-19 17:08:58 +01:00
Mike Fährmann
72f1f16eb2 [weibo] support 'mix_media_info' entries (#3793) 2023-03-18 15:19:25 +01:00
Mike Fährmann
00f0233b28 [postprocessor:metadata] add 'skip' option (#3786) 2023-03-17 23:30:11 +01:00
Mike Fährmann
2bb937014f [twitter] fall back to legacy /media endpoint when not logged in 2023-03-17 20:54:35 +01:00
Mike Fährmann
b68094d326 [twitter] support 'note_tweet's 2023-03-17 19:36:07 +01:00
Mike Fährmann
3dcabc97ed [twitter] update API endpoints and parameters 2023-03-17 19:25:53 +01:00
Mike Fährmann
a1ca2404f9 add 'globals' instead of overwriting the default (#3773) 2023-03-16 18:37:00 +01:00
Mike Fährmann
dcb8af659a [gelbooru] extract favorites without needing cookies (#3704)
TODO: fix --range
2023-03-15 19:21:35 +01:00
Mike Fährmann
b756dc13aa [gelbooru] warn about missing cookies for favorites (#3704)
and add docstring so it shows up in --list-extractors
2023-03-15 14:58:55 +01:00
Mike Fährmann
17bd053d94 [hiperdex] fix extraction (#3768) 2023-03-15 14:28:03 +01:00
Mike Fährmann
e7898936df add link to 'Get cookies.txt LOCALLY' to README 2023-03-14 23:01:36 +01:00
Mike Fährmann
f7ce33c85c [output] set 'errors=replace' for output streams (#3765)
fixes regression from e480a933
2023-03-14 13:30:04 +01:00
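f7ce33c85c sets `errors=replace` on output streams. In Python this is a parameter of `io.TextIOWrapper`: characters the stream's encoding cannot represent are written as `?` instead of raising `UnicodeEncodeError`. The sketch uses a hypothetical helper name and an ASCII encoding to make the effect visible.

```python
import io


def open_text_stream(binary: io.BufferedIOBase,
                     encoding: str = "ascii") -> io.TextIOWrapper:
    """Wrap a binary stream so unencodable characters become '?'."""
    # errors="replace" substitutes '?' for characters the encoding
    # cannot represent instead of raising UnicodeEncodeError.
    return io.TextIOWrapper(binary, encoding=encoding,
                            errors="replace", line_buffering=True)
```

For a stream that is already open, such as `sys.stdout`, Python 3.7+ provides `sys.stdout.reconfigure(errors="replace")`.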
Mike Fährmann
a14a2d6e59 release version 1.25.0 2023-03-11 21:05:28 +01:00
Mike Fährmann
4235d412c4 implement 'actions'
continuation of d37e7f48
but more versatile and extendable

Example:

"actions": [
    # change debug messages to info
    ["debug", "level ~info"],

    # change exit status to a non-zero value
    ["info:^No results for", "status |= 1"],

    # exit with status 2 on 429
    ["warning:429", "exit 2"],

    # restart extractor when no cookies found
    ["warning:^[Nn]o .*cookies", "restart"]
]
2023-03-10 22:08:10 +01:00
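The 'actions' example above pairs a `level[:regex]` pattern with an action string. The following sketch shows how such rules could be parsed and matched against Python `logging` records. It is an assumption-laden illustration, not the real implementation: it matches the log level exactly, uses `re.search` for the message pattern, and only returns the action strings instead of executing them.

```python
import logging
import re
from typing import List, Optional, Tuple

LEVELS = {"debug": logging.DEBUG, "info": logging.INFO,
          "warning": logging.WARNING, "error": logging.ERROR}

# (log level, optional message regex, action string)
Action = Tuple[int, Optional[re.Pattern], str]


def parse_actions(spec: List[List[str]]) -> List[Action]:
    """Turn [["level[:regex]", "action"], ...] into (level, regex, action) tuples."""
    parsed = []
    for pattern, action in spec:
        # Split at the first ':' only, so the regex itself may contain colons.
        level, _, regex = pattern.partition(":")
        parsed.append((LEVELS[level.lower()],
                       re.compile(regex) if regex else None, action))
    return parsed


def matching_actions(parsed: List[Action],
                     record: logging.LogRecord) -> List[str]:
    """Return the actions whose level and regex match a log record."""
    message = record.getMessage()
    return [action for level, regex, action in parsed
            if record.levelno == level
            and (regex is None or regex.search(message))]
```

Under these assumptions, a warning "429 Too Many Requests" matches only the `exit 2` rule, while any debug message matches `level ~info`.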
Mike Fährmann
817fc0fbd1 [nitter] remove nitter.pussthecat.org
"Shutdown"
2023-03-09 23:48:39 +01:00
Mike Fährmann
67ec91cdbd [downloader:http] change '_http_retry' to accept a Python function
and rename '_http_retry_codes' to '_http_retry'

(#3569)
2023-03-09 23:30:15 +01:00
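After 67ec91cdbd, `_http_retry` may hold a Python function rather than only a list of status codes. A hypothetical sketch of how a downloader could accept both forms follows; `should_retry` and its baseline of always retrying 429 and 5xx responses are assumptions for illustration only.

```python
from typing import Callable, Collection, Union

# Either a collection of extra status codes or a predicate on the status code.
RetryPolicy = Union[Collection[int], Callable[[int], bool]]


def should_retry(status: int, policy: RetryPolicy = ()) -> bool:
    """Return True if a response with this HTTP status should be retried."""
    # Hypothetical baseline: always retry rate limiting and server errors.
    if status == 429 or status >= 500:
        return True
    if callable(policy):
        # Newer style: let a function decide.
        return bool(policy(status))
    # Older style: a fixed set of additional codes.
    return status in policy
```

A function can express rules a fixed list cannot, e.g. `lambda s: 400 <= s < 410`.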
Mike Fährmann
175822e065 merge #3738: [generic] add tests 2023-03-09 22:26:20 +01:00
Mike Fährmann
4883420e67 [generic] revert pattern change 2023-03-09 22:25:23 +01:00