gallery-dl

Author	SHA1	Message	Date
Mike Fährmann	4cdab8074e	update/fix --list-extractors	2023-09-11 17:32:59 +02:00
Mike Fährmann	a453335a9f	remove test results in extractor modules and add generic example URLs	2023-09-11 16:30:55 +02:00
Mike Fährmann	a383eca7f6	decouple extractor initialization Introduce an 'initialize()' function that does the actual init (session, cookies, config options) and can be called separately from the constructor __init__(). This allows, for example, to adjust config access inside a Job before most of it already happened when calling 'extractor.find()'.	2023-07-25 22:16:16 +02:00
Mike Fährmann	7da954f810	[flickr] update default API credentials (#4332) and add a delay between API requests	2023-07-22 15:38:33 +02:00
Mike Fährmann	d97b8c2fba	consistent cookie-related names - rename every cookie variable or method to 'cookies_*' - simplify '.session.cookies' to just '.cookies' - more consistent 'login()' structure	2023-07-22 01:20:50 +02:00
Mike Fährmann	c45a913bfd	[flickr] add 'exif' option	2023-07-01 19:19:39 +02:00
Mike Fährmann	ccbc1a1d55	[flickr] add 'metadata' option (#4227)	2023-06-26 16:49:48 +02:00
Mike Fährmann	d0b73fec14	[flickr] add support for secure.flickr.com (#2910)	2022-09-14 16:19:27 +02:00
Vrihub	96fcff182c	generic extractor (#735) * Generic extractor, see issue #683 * Fix failed test_names test, no subcategory needed * Prefix directory_fmt with "generic" * Relax regex (would break some urls) * Flake8 compliance * pattern: don't require a scheme This fixes a bug when we force the generic extractor on urls without a scheme (that are allowed by all other extractors). * Fix using g: and r: on urls without http(s) scheme Almost all extractors accept urls without an initial http(s) scheme. Many extractors also allow for generic subdomains in their "pattern" variable; some of them implement this with the regex character class "[^.]+" (everything but a dot). This leads to a problem when the extractor is given a url starting with g: or r: (to force using the generic or recursive extractor) and without the http(s) scheme: e.g. with "r:foobar.tumblr.com" the "r:" is wrongly considered part of the subdomain. This commit fixes the bug, replacing the too generic "[^.]+" with the more specific "[\w-]+" (letters, digits and "-", the only characters allowed in domain names), which is already used by some extractors. * Relax imageurl_pattern_ext: allow relative urls * First round of small suggested changes * Support image urls starting with "//" * self.baseurl: remove trailing slash * Relax regexp (didn't catch some image urls) * Some fixes and cleanup * Fix domain pattern; option to enable extractor Fixed the domain section for "pattern", to pass "test_add" and "test_add_module" tests. Added the "enabled" configuration option (default False) to enable the generic extractor. Using "g(eneric):URL" forces using the extractor.	2021-12-29 22:39:29 +01:00
Mike Fährmann	bd08ee2859	remove most 'yield Message.Version' statements only leave them in oauth.py as noop results	2021-08-16 03:10:48 +02:00
Mike Fährmann	ca44111726	[flickr] update - ensure every photo has an 'owner' (#828) - change default directories to a more consistent schema - create directory for each photo	2020-11-15 10:44:29 +01:00
Mike Fährmann	e6cd49e78b	update extractor test results	2020-02-16 21:48:46 +01:00
Mike Fährmann	ce54b8c04c	let extractors opt-out of cookie option usage useful to avoid sending unnecessary cookies when all authentication is done through OAuth tokens	2020-01-01 21:12:37 +01:00
Mike Fährmann	abfcb356fc	[flickr] support 3k, 4k, 5k, and 6k photo sizes (closes #472)	2019-11-10 17:52:51 +01:00
Mike Fährmann	4409d00141	embed error messages in StopExtraction exceptions	2019-10-28 16:39:49 +01:00
Mike Fährmann	20fd2d8450	[flickr] skip unavailable images/videos (fixes #398)	2019-08-27 23:26:49 +02:00
Mike Fährmann	5499934ae2	[ngomik] fix extraction	2019-05-30 20:18:36 +02:00
Mike Fährmann	9890bfdf23	[flickr] improve code and metadata - simplify pagination - add more metadata and slightly change its structure - convert suitable values to int or list - move keys from ["photo"] to the base level - proper video support (#246) - rename method and variable names to better fit with other extractors	2019-05-14 22:10:50 +02:00
Mike Fährmann	d6ddb74cde	update test results - deviantart: 'index' is now an integer - flickr: image file with lower quality - paheal: image server name changed - rule34: post got deleted	2019-04-12 09:59:48 +02:00
Mike Fährmann	87b0929bec	Revert "[flickr] restore image quality" This reverts commit `3f513f1056`. Both live.staticflickr and farmN.staticflickr servers now produce the same image file with a lower overall quality than before this change in Flickr's end.	2019-04-11 20:31:05 +02:00
Mike Fährmann	3f513f1056	[flickr] restore image quality Flickr started serving images from live.staticflickr.com (see `ec88ff1`), but the old farmN.staticflickr.com URLs still work - at least for the time being. Filesize (and most likely quality as well) for images from live.… is severely reduced compared to images from farmN.… for non-original files, so all live URLs are replaced to point to a randomly chosen farm server.	2019-04-06 11:26:10 +02:00
Mike Fährmann	ec88ff1562	[flickr] relax unit test results Images are now randomly served from the 'live.staticflickr.com' domain instead of the "old" 'farmN.staticflickr.com' one, making it impossible to use static 'url' and 'keyword' hashes as results. Image quality doesn't appear to be affected by which image-server is used. Files from 'farmN' and 'live' are the same.	2019-03-30 18:31:59 +01:00
Mike Fährmann	5530871b5a	change results of text.nameext_from_url() Instead of getting a complete 'filename' from a URL and splitting that into 'name' and 'extension', the new approach gets rid of the complete version and renames 'name' to 'filename'. (Using anything other than {extension} for a filename extension doesn't really work anyway) Example: "https://example.org/path/filename.ext" before: - filename : filename.ext - name : filename - extension: ext now: - filename : filename - extension: ext	2019-02-14 16:07:17 +01:00
Mike Fährmann	89ee8cd7e4	filter "private" kwdict entries	2019-02-13 13:22:11 +01:00
Mike Fährmann	61741d7333	provide type information for Queue messages Child extractors are now directly constructed with Extractor.from_url() if the extractor class is known beforehand, instead of using extractor.find() and searching through all possible extractor classes.	2019-02-12 21:32:32 +01:00
Mike Fährmann	4b1880fa5e	propagate 'match' to base extractor constructor	2019-02-11 13:31:10 +01:00
Mike Fährmann	6284731107	simplify extractor constants - single strings for URL patterns - tuples instead of lists for 'directory_fmt' and 'test' - single-tuple tests where applicable	2019-02-08 13:45:40 +01:00
Mike Fährmann	34bab080ae	rewrite URL patterns to use only 1 per extractor	2019-02-08 12:03:10 +01:00
Mike Fährmann	9a98b6769d	use extractor.request for API calls (#130) ... at least for OAuth1.0 based APIs (flickr, smugmug, tumblr)	2018-12-04 21:29:06 +01:00
Mike Fährmann	59bb434ba5	[flickr] add ability to download all albums of a user for example with 'https://www.flickr.com/photos/shona_s/albums'	2018-11-23 09:09:37 +01:00
Mike Fährmann	8080071174	[flickr] improve album metadata (closes #109)	2018-09-29 16:21:55 +02:00
Mike Fährmann	26cbcb3a72	[flickr] improve error handling (#109)	2018-09-17 10:12:14 +02:00
Mike Fährmann	f3793660ef	update tests	2018-08-02 14:57:28 +02:00
Mike Fährmann	212130b048	[deviantart] improve public-private token switching - rename option to `prefer-public` - now also works for galleries with less than 24 items	2018-07-25 12:52:36 +02:00
Mike Fährmann	1c1e086d01	use common base class for OAuth1.0 based API interfaces	2018-05-10 21:57:45 +02:00
Mike Fährmann	6a31ada9e3	re-implement OAuth1.0 code OAuth support for SmugMug needs some additional features (auth-rebuild on redirect, query parameters in URL, ...) and fixing this in the old code wouldn't work all that well.	2018-05-10 18:47:05 +02:00
Mike Fährmann	d11fcf4804	smaller changes and fixes - fix the cloudflare challenge result if the last decimal places are zero (JS's toFixed() removes trailing zeroes) - fix downloading of kissmanga chapter-pages hosted on blogspot (accessing blogspot with "kissmanga.com" as referrer yields a 401) - disable certificate validation for 'mangahere' tests - update flickr test result	2018-04-06 15:30:09 +02:00
Mike Fährmann	a112e3f2a0	[nijie] add doujin extractor adds support for "https://nijie.info/members_dojin.php?id=<artist_id>"	2018-03-31 18:17:41 +02:00
Mike Fährmann	5008e105ee	update archive IDs ... to behave in a more straightforward way when dealing with bookmarks/favourites/etc. specific IDs are now grouped by their owner, album-id, ... to allow for duplicates when it would be expected.	2018-03-01 18:20:50 +01:00
Mike Fährmann	34873dbd90	set 'archive_fmt' values These are going to be used to create a unique id for each image.	2018-02-01 15:30:49 +01:00
Mike Fährmann	19a6ae57b2	[sankaku] add pool extractor	2017-12-12 19:45:10 +01:00
Mike Fährmann	035ef655f1	[imagefap] update unit tests old gallery/image has been deleted	2017-10-27 12:22:16 +02:00
Mike Fährmann	393755ee94	[tumblr] update tests	2017-10-09 00:10:37 +02:00
Mike Fährmann	54c0715135	allow users to set their own API access_tokens/client_ids	2017-09-09 17:50:19 +02:00
Mike Fährmann	f7cdfd4c25	add a simplified version of 'parse_qs' This version only returns a dict of plain string to string key-value pairs and ignores multiple values for the same query variable.	2017-08-24 20:55:58 +02:00
Mike Fährmann	8bcf88bff7	[flickr] fix extraction This issue was only noticeable with older Python versions, as these don't exhibit a consistent ordering of dict keys.	2017-08-12 21:41:10 +02:00
Mike Fährmann	852e7acd31	[twitter] ignore "Promoted Tweets"	2017-08-06 13:43:08 +02:00
Mike Fährmann	1dac76fd1c	update extractor docstrings	2017-06-28 17:39:07 +02:00
Mike Fährmann	e1d82af5e0	small fixes	2017-06-22 18:46:42 +02:00
Mike Fährmann	719d45f89e	[flickr] allow the use of Flickr's specifiers for format selection - renamed the 'width-max' option to 'size-max' - filter by both width and height	2017-06-20 16:09:25 +02:00

64 Commits