gallery-dl

Author	SHA1	Message	Date
Mike Fährmann	53cdfaac37	[common] add reference to 'exception' module to Extractor class - remove 'exception' imports - replace with 'self.exc'	2026-02-15 10:57:22 +01:00
Mike Fährmann	e006d26c8e	Revert "use f-strings when building 'pattern'" revert `d7c97d5a97`.	2025-12-20 22:07:37 +01:00
Mike Fährmann	968597a302	yield 3-tuples for Message.Directory adapt tuples to the same length and semantics as other messages	2025-12-05 21:39:52 +01:00
Mike Fährmann	d7c97d5a97	use f-strings when building 'pattern'	2025-10-20 21:23:11 +02:00
Mike Fährmann	d8ef1d693f	rename 'StopExtraction' to 'AbortExtraction' for cases where StopExtraction was used to report errors	2025-07-09 21:07:28 +02:00
Mike Fährmann	9dbe33b6de	replace old %-formatted and .format(…) strings with f-strings (#7671) mostly using flynt https://github.com/ikamensh/flynt	2025-06-29 17:50:19 +02:00
Mike Fährmann	41191bb60a	'match.group(N)' -> 'match[N]' (#7671) 2.5x faster	2025-06-18 13:05:58 +02:00
Mike Fährmann	e08ec7e083	update copyright notices	2025-06-13 00:03:41 +02:00
Mike Fährmann	811b665e33	remove @staticmethod decorators There might have been a time when calling a static method was faster than a regular method, but that is no longer the case. According to micro-benchmarks, it is 70% slower in CPython 3.13 and it also makes executing the code of a class definition slower.	2025-06-12 22:50:52 +02:00
Mike Fährmann	7916c8bf77	allow passing cookies to OAuth extractors partially revert `ce54b8c04c`	2024-11-09 18:06:27 +01:00
Mike Fährmann	a453335a9f	remove test results in extractor modules and add generic example URLs	2023-09-11 16:30:55 +02:00
Mike Fährmann	a383eca7f6	decouple extractor initialization Introduce an 'initialize()' function that does the actual init (session, cookies, config options) and can be called separately from the constructor __init__(). This allows, for example, to adjust config access inside a Job before most of it already happened when calling 'extractor.find()'.	2023-07-25 22:16:16 +02:00
Mike Fährmann	d97b8c2fba	consistent cookie-related names - rename every cookie variable or method to 'cookies_*' - simplify '.session.cookies' to just '.cookies' - more consistent 'login()' structure	2023-07-22 01:20:50 +02:00
Mike Fährmann	cd931e1139	update extractor test results	2022-12-08 18:58:29 +01:00
Mike Fährmann	daef91c925	[smugmug] update default API credentials (#2881) The old key lacked v2 access and I'm unable to accept the new terms of service since my old account got deleted	2022-08-31 10:28:25 +02:00
Mike Fährmann	da11fb32d0	update extractor test results	2022-08-28 00:16:12 +02:00
Mike Fährmann	c6a9bab019	update extractor test results	2022-07-12 15:49:22 +02:00
Vrihub	96fcff182c	generic extractor (#735) * Generic extractor, see issue #683 * Fix failed test_names test, no subcategory needed * Prefix directory_fmt with "generic" * Relax regex (would break some urls) * Flake8 compliance * pattern: don't require a scheme This fixes a bug when we force the generic extractor on urls without a scheme (that are allowed by all other extractors). * Fix using g: and r: on urls without http(s) scheme Almost all extractors accept urls without an initial http(s) scheme. Many extractors also allow for generic subdomains in their "pattern" variable; some of them implement this with the regex character class "[^.]+" (everything but a dot). This leads to a problem when the extractor is given a url starting with g: or r: (to force using the generic or recursive extractor) and without the http(s) scheme: e.g. with "r:foobar.tumblr.com" the "r:" is wrongly considered part of the subdomain. This commit fixes the bug, replacing the too generic "[^.]+" with the more specific "[\w-]+" (letters, digits and "-", the only characters allowed in domain names), which is already used by some extractors. * Relax imageurl_pattern_ext: allow relative urls * First round of small suggested changes * Support image urls starting with "//" * self.baseurl: remove trailing slash * Relax regexp (didn't catch some image urls) * Some fixes and cleanup * Fix domain pattern; option to enable extractor Fixed the domain section for "pattern", to pass "test_add" and "test_add_module" tests. Added the "enabled" configuration option (default False) to enable the generic extractor. Using "g(eneric):URL" forces using the extractor.	2021-12-29 22:39:29 +01:00
Mike Fährmann	211de95dd0	update extractor test results	2021-11-01 02:58:53 +01:00
Mike Fährmann	bd08ee2859	remove most 'yield Message.Version' statements only leave them in oauth.py as noop results	2021-08-16 03:10:48 +02:00
Mike Fährmann	bdfcc9c4b1	update extractor test results	2021-04-18 20:28:15 +02:00
Mike Fährmann	968d3e8465	remove '&' from URL patterns '/?&#' -> '/?#' and '?&#' -> '?#' According to https://www.ietf.org/rfc/rfc3986.txt, URLs are "organized hierarchically" by using "the slash ("/"), question mark ("?"), and number sign ("#") characters to delimit components"	2020-10-22 23:31:25 +02:00
Mike Fährmann	19bf76bcf8	update extractor test results	2020-08-03 21:57:00 +02:00
Mike Fährmann	2ecf1efb16	update extractor test results - tumblr: remove deleted post - jaiminisbox: replace removed manga/chapters - smugmug: one inconsequential field got removed	2020-07-18 15:12:28 +02:00
Mike Fährmann	ce54b8c04c	let extractors opt-out of cookie option usage useful to avoid sending unnecessary cookies when all authentication is done through OAuth tokens	2020-01-01 21:12:37 +01:00
Mike Fährmann	4ca883c66f	[smugmug] replace test for custom URLs The old one (http://www.creativedogportraits.com/) is empty and/or no longer handled by SmugMug.	2019-11-22 23:25:55 +01:00
Mike Fährmann	4409d00141	embed error messages in StopExtraction exceptions	2019-10-28 16:39:49 +01:00
Mike Fährmann	1133b7fcbd	[smugmug] update unit tests The account used for tests before has been deleted.	2019-07-19 17:16:24 +02:00
Mike Fährmann	48233f00c0	[readcomiconline] detect 'AreYouHuman' redirects (#279)	2019-05-26 15:58:37 +02:00
Mike Fährmann	25aaf55514	[smugmug] improve format selection (closes #183) - use original image if available - support video formats - remove user info for ImageExtractor (it is no longer possible to get image owner information for a single image)	2019-03-10 15:20:35 +01:00
Mike Fährmann	5530871b5a	change results of text.nameext_from_url() Instead of getting a complete 'filename' from an URL and splitting that into 'name' and 'extension', the new approach gets rid of the complete version and renames 'name' to 'filename'. (Using anything other than {extension} for a filename extension doesn't really work anyway) Example: "https://example.org/path/filename.ext" before: - filename : filename.ext - name : filename - extension: ext now: - filename : filename - extension: ext	2019-02-14 16:07:17 +01:00
Mike Fährmann	61741d7333	provide type information for Queue messages Child extractors are now directly constructed with Extractor.from_url() if the extractor class is known beforehand, instead of using extractor.find() and searching through all possible extractor classes.	2019-02-12 21:32:32 +01:00
Mike Fährmann	4b1880fa5e	propagate 'match' to base extractor constructor	2019-02-11 13:31:10 +01:00
Mike Fährmann	6284731107	simplify extractor constants - single strings for URL patterns - tuples instead of lists for 'directory_fmt' and 'test' - single-tuple tests where applicable	2019-02-08 13:45:40 +01:00
Mike Fährmann	751e535948	[nhentai] fix extraction (closes #156) Use JSON embedded in webpage since API endpoints have been disabled	2019-01-14 07:57:50 +01:00
Mike Fährmann	9a98b6769d	use extractor.request for API calls (#130) ... at least for OAuth1.0 based APIs (flickr, smugmug, tumblr)	2018-12-04 21:29:06 +01:00
Mike Fährmann	7f6a0be982	adjust some tests	2018-11-15 22:50:04 +01:00
Mike Fährmann	e1d306cc48	update unit test results	2018-10-13 16:54:30 +02:00
Mike Fährmann	0bc8ef51c8	[smugmug] Handle albums with no explicit owner (#100)	2018-09-01 12:55:02 +02:00
Mike Fährmann	e9dd2eff1d	[twitter] add extractor for media-tweet timelines (#96) For example "https://twitter.com/PicturesEarth/media". They are different from normal timelines in that they do not contain any (re)tweets from other users and feature all media the user ever posted, including responses to other tweets.	2018-08-19 20:46:12 +02:00
Mike Fährmann	7a98cc9798	[smugmug] update tests My test account expired and all uploaded images got deleted.	2018-06-22 15:04:31 +02:00
Mike Fährmann	1c1e086d01	use common base class for OAuth1.0 based API interfaces	2018-05-10 21:57:45 +02:00
Mike Fährmann	6a31ada9e3	re-implement OAuth1.0 code OAuth support for SmugMug needs some additional features (auth-rebuild on redirect, query parameters in URL, ...) and fixing this in the old code wouldn't work all that well.	2018-05-10 18:47:05 +02:00
Mike Fährmann	3ce5296313	[smugmug] code cleanup - combine User and Node extractors - (re)move miscellaneous helper functions - rename "Owner" to "User"	2018-05-03 14:12:10 +02:00
Mike Fährmann	42ed7667b8	[smugmug] support user- and general album URLs	2018-05-02 20:34:45 +02:00
Mike Fährmann	2ea0d1da42	[smugmug] improve API code; use data expansions	2018-04-30 18:22:44 +02:00
Mike Fährmann	16e014baaa	[smugmug] added image and album extractor just some initial code that still requires a lot of work ... TODO: - folders - old-style albums (which are nearly all of them ...) - images from users - OAuth It could also happen that the API credentials used will become invalid whenever my 14 day trial period ends (7 days remaining), but that would just require users to supply their own.	2018-04-29 21:27:25 +02:00
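
The subdomain problem described in commit 96fcff182c can be reproduced with a minimal sketch. The two patterns below are hypothetical stand-ins written for illustration; they are not copied from any extractor, and only the character classes "[^.]+" and "[\w-]+" come from the commit message.

```python
import re

# Hypothetical patterns for illustration only; not taken from the codebase.
LOOSE = re.compile(r"(?:https?://)?([^.]+)\.tumblr\.com")
STRICT = re.compile(r"(?:https?://)?([\w-]+)\.tumblr\.com")

url = "r:foobar.tumblr.com"  # example URL from the commit message

loose = LOOSE.match(url)
strict = STRICT.match(url)

# "[^.]+" accepts ":" and so swallows the "r:" prefix into the subdomain.
print(loose[1])

# "[\w-]+" cannot cross the ":", so the prefixed URL is not matched at all.
print(strict)

# Once the prefix is stripped, the stricter pattern matches as expected.
print(STRICT.match("foobar.tumblr.com")[1])
```

With the stricter class, a prefixed URL simply fails to match, which presumably leaves it free to be handled by whatever code interprets the "g:" or "r:" prefix.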

47 Commits
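
The result change described in commit 5530871b5a (a 'filename' without its extension, plus a separate 'extension') can be sketched as follows. The function `split_name_sketch` is a hypothetical helper written for this illustration; it is not the implementation of `text.nameext_from_url()` and only mirrors the before/after example given in the commit message.

```python
from urllib.parse import urlsplit, unquote


def split_name_sketch(url):
    """Hypothetical helper: return 'filename' and 'extension' for a URL."""
    # Take the last path segment, e.g. "filename.ext".
    last_segment = unquote(urlsplit(url).path.rpartition("/")[2])
    # Split on the final dot; 'name' is empty when there is no dot.
    name, _, ext = last_segment.rpartition(".")
    if name:
        return {"filename": name, "extension": ext}
    return {"filename": last_segment, "extension": ""}


result = split_name_sketch("https://example.org/path/filename.ext")
print(result)
```

For the URL from the commit message, the sketch yields 'filename' and 'ext' as the two fields, matching the "now" layout described there.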