94 Commits

Author SHA1 Message Date
Mike Fährmann
53cdfaac37 [common] add reference to 'exception' module to Extractor class
- remove 'exception' imports
- replace with 'self.exc'
2026-02-15 10:57:22 +01:00
Mike Fährmann
00c6821a3f replace 2-element f-strings with simple '+' concatenations
Python's 'ast' module and its 'NodeVisitor' class
were incredibly helpful in identifying these
2025-12-22 11:26:04 +01:00
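A minimal sketch of how 'ast.NodeVisitor' might be used to locate such f-strings. The class and function names are hypothetical and this is not the script used for the commit; it only assumes that a "2-element" f-string is an 'ast.JoinedStr' node with exactly two parts.

```python
import ast


class TwoPartFStringFinder(ast.NodeVisitor):
    """Collect line numbers of f-strings made of exactly two parts."""

    def __init__(self):
        self.lines = []

    def visit_JoinedStr(self, node):
        # An f-string such as f"{root}/api" parses into two values:
        # a FormattedValue ({root}) and a Constant ("/api").
        if len(node.values) == 2:
            self.lines.append(node.lineno)
        self.generic_visit(node)


def find_two_part_fstrings(source):
    finder = TwoPartFStringFinder()
    finder.visit(ast.parse(source))
    return finder.lines


SAMPLE = '''
a = f"{root}/api"
b = f"{x}-{y}-{z}"
c = "plain"
'''

found = find_two_part_fstrings(SAMPLE)
```

Each reported line is a candidate for a rewrite like 'root + "/api"'; whether '+' is safe still depends on the operand being a 'str'.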
Mike Fährmann
e006d26c8e Revert "use f-strings when building 'pattern'"
revert d7c97d5a97.
2025-12-20 22:07:37 +01:00
Mike Fährmann
968597a302 yield 3-tuples for Message.Directory
adapt tuples to the same length and semantics as other messages
2025-12-05 21:39:52 +01:00
Mike Fährmann
2578f7b5c1 [flickr] extract public API key from website (#7564 #7649 #7700 #8553)
this breaks 'oauth:flickr' with the default key,
but it allows downloading without custom key / Flickr Pro
2025-11-14 11:38:28 +01:00
Mike Fährmann
d7c97d5a97 use f-strings when building 'pattern' 2025-10-20 21:23:11 +02:00
Mike Fährmann
c8fc790028 merge branch 'dt': move datetime utils into separate module
- use 'datetime.fromisoformat()' when possible (#7671)
- return a datetime-compatible object for invalid datetimes
  (instead of a 'str' value)
2025-10-20 09:30:05 +02:00
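The two points above can be sketched roughly as follows. 'InvalidDatetime', 'NONE', and 'parse_iso' are hypothetical names chosen for illustration, not the contents of the actual 'dt' module; only 'datetime.fromisoformat()' is taken from the commit message.

```python
from datetime import datetime


class InvalidDatetime(datetime):
    """Hypothetical placeholder that still behaves like a datetime."""

    def __bool__(self):
        # Falsy, so callers can test 'if date:' without type checks.
        return False


# Assumed sentinel value returned for anything that cannot be parsed.
NONE = InvalidDatetime(1, 1, 1)


def parse_iso(value):
    """Parse an ISO 8601 string, falling back to the sentinel."""
    try:
        return datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return NONE


good = parse_iso("2025-10-20T09:30:05+02:00")
bad = parse_iso("not a date")
```

Because the fallback is a 'datetime' subclass rather than a 'str', attribute access such as 'bad.year' or 'bad.strftime(...)' keeps working in format strings.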
Mike Fährmann
085616e0a8 [dt] replace 'text.parse_datetime()' & 'text.parse_timestamp()' 2025-10-17 17:43:06 +02:00
Mike Fährmann
8c62be343e [output] add 'Logger.traceback()' helper 2025-10-14 18:44:29 +02:00
Mike Fährmann
a097a373a9 simplify if statements by using walrus operators (#7671) 2025-07-22 20:57:54 +02:00
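For illustration, the kind of rewrite this commit describes might look like the sketch below. The function names and the URL pattern are made up for the example and are not taken from the extractor code.

```python
import re

# Hypothetical pattern, used only for this example.
PHOTO_ID = re.compile(r"/photos/[^/]+/(\d+)")


def photo_id_before(url):
    # Assignment and test on separate lines.
    match = PHOTO_ID.search(url)
    if match:
        return match.group(1)
    return None


def photo_id_after(url):
    # Same logic with an assignment expression (Python 3.8+).
    if match := PHOTO_ID.search(url):
        return match.group(1)
    return None
```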
Mike Fährmann
d8ef1d693f rename 'StopExtraction' to 'AbortExtraction'
for cases where StopExtraction was used to report errors
2025-07-09 21:07:28 +02:00
Mike Fährmann
9dbe33b6de replace old %-formatted and .format(…) strings with f-strings (#7671)
mostly using flynt
https://github.com/ikamensh/flynt
2025-06-29 17:50:19 +02:00
Mike Fährmann
e08ec7e083 update copyright notices 2025-06-13 00:03:41 +02:00
Mike Fährmann
811b665e33 remove @staticmethod decorators
There might have been a time when calling a static method was faster
than calling a regular method, but that is no longer the case. According
to micro-benchmarks, a static method call is 70% slower in CPython 3.13,
and the decorator also makes executing a class definition slower.
2025-06-12 22:50:52 +02:00
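A micro-benchmark of this kind could be sketched with 'timeit' as below. The 'Demo' class and 'compare()' helper are hypothetical, and the numbers depend heavily on interpreter version and machine, so no particular result is implied.

```python
import timeit


class Demo:
    @staticmethod
    def static_add(a, b):
        return a + b

    def method_add(self, a, b):
        return a + b


def compare(number=100_000):
    """Time both call styles on the same instance; returns seconds."""
    namespace = {"obj": Demo()}
    return {
        "static": timeit.timeit(
            "obj.static_add(1, 2)", globals=namespace, number=number),
        "method": timeit.timeit(
            "obj.method_add(1, 2)", globals=namespace, number=number),
    }


timings = compare()
```

Running 'compare()' several times and comparing the smallest values gives a more stable picture than a single run.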
Mike Fährmann
0285473b04 [flickr] add 'profile' option 2025-05-13 11:47:34 +02:00
Mike Fährmann
204f1c5f92 [flickr] fix overwriting 'owner'/'user' data when 'info' is enabled 2025-05-13 11:11:03 +02:00
Mike Fährmann
6b84de6cf7 [flickr] add 'info' option (#4720 #6817) 2025-05-12 17:07:36 +02:00
Mike Fährmann
bd7fcdab4c [flickr] provide human-readable 'license_name' metadata 2025-05-12 16:44:22 +02:00
Mike Fährmann
cf6eff7ff7 [flickr] remove constructors 2025-05-12 16:27:14 +02:00
Mike Fährmann
b62c466c14 [flickr] fix video download URLs (#6464)
continuation of 0e18fa395d
fix video detection in '_file_url'
2024-11-13 20:56:37 +01:00
Mike Fährmann
7916c8bf77 allow passing cookies to OAuth extractors
partially revert ce54b8c04c
2024-11-09 18:06:27 +01:00
Mike Fährmann
0e18fa395d [flickr] use "download" URLs (#6360) 2024-11-09 17:33:27 +01:00
Mike Fährmann
7c43f9e152 [flickr] update default API credentials (#6300) 2024-10-10 11:50:34 +02:00
Mike Fährmann
9a0acbe7c4 [flickr] remove debug remains (#6252)
fixes regression introduced in a051e1c9
2024-09-29 13:01:51 +02:00
Mike Fährmann
a051e1c955 directly pass exception instances as 'exc_info' logger argument 2024-09-19 14:50:08 +02:00
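The standard 'logging' module accepts an exception instance as 'exc_info', which is what this commit relies on. A small sketch, with a hypothetical logger name and message:

```python
import io
import logging


def log_exception(log, exc):
    # Pass the exception instance itself; logging derives the type and
    # traceback from it instead of calling sys.exc_info().
    log.error("request failed: %s", exc, exc_info=exc)


# Capture log output in memory for the example.
stream = io.StringIO()
log = logging.getLogger("example")
log.propagate = False
log.addHandler(logging.StreamHandler(stream))
log.setLevel(logging.ERROR)

try:
    raise ValueError("bad response")
except ValueError as exc:
    saved = exc  # the 'as' name is unbound after the except block

# Logged outside the 'except' block, the traceback is still included.
log_exception(log, saved)
output = stream.getvalue()
```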
Mike Fährmann
58113b73d1 [flickr] make album metadata extraction non-fatal (#3441)
https://github.com/mikf/gallery-dl/issues/3441#issuecomment-2313679156
2024-08-30 10:24:03 +02:00
Pedro Cunha
dcd44cf423 [flickr] reference the correct function 2024-08-22 17:00:35 +01:00
Mike Fährmann
e92a9ae343 [flickr] make exif and context metadata extraction non-fatal (#6002) 2024-08-14 09:44:04 +02:00
Mike Fährmann
5c1f5861b6 [flickr] add 'contexts' option (#5324) 2024-03-18 17:36:16 +01:00
Mike Fährmann
6e928300bc [flickr] handle non-JSON errors (#5131) 2024-02-06 21:22:10 +01:00
Mike Fährmann
4cdab8074e update/fix --list-extractors 2023-09-11 17:32:59 +02:00
Mike Fährmann
a453335a9f remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6 decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
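The split between construction and initialization might look roughly like this. It is a simplified sketch with hypothetical attribute names and a plain dict standing in for the config system, not the actual Extractor class.

```python
class Extractor:
    """Toy extractor: __init__ stores state, initialize() does the work."""

    def __init__(self, url):
        # Cheap: no session, cookies, or config lookups yet.
        self.url = url
        self.config = {}
        self.session = None
        self.initialized = False

    def initialize(self):
        # Expensive setup, deferred so callers can adjust
        # 'self.config' between construction and first use.
        if self.initialized:
            return
        self.session = {"timeout": self.config.get("timeout", 30)}
        self.initialized = True


extr = Extractor("https://example.org/")
extr.config["timeout"] = 5   # adjusted before any config is read
extr.initialize()
```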
Mike Fährmann
7da954f810 [flickr] update default API credentials (#4332)
and add a delay between API requests
2023-07-22 15:38:33 +02:00
Mike Fährmann
d97b8c2fba consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
c45a913bfd [flickr] add 'exif' option 2023-07-01 19:19:39 +02:00
Mike Fährmann
ccbc1a1d55 [flickr] add 'metadata' option (#4227) 2023-06-26 16:49:48 +02:00
Mike Fährmann
d0b73fec14 [flickr] add support for secure.flickr.com (#2910) 2022-09-14 16:19:27 +02:00
Vrihub
96fcff182c generic extractor (#735)
* Generic extractor, see issue #683

* Fix failed test_names test, no subcategory needed

* Prefix directory_fmt with "generic"

* Relax regex (would break some urls)

* Flake8 compliance

* pattern: don't require a scheme

This fixes a bug when we force the generic extractor on urls without a
scheme (that are allowed by all other extractors).

* Fix using g: and r: on urls without http(s) scheme

Almost all extractors accept urls without an initial http(s) scheme.

Many extractors also allow for generic subdomains in their "pattern"
variable; some of them implement this with the regex character class
"[^.]+" (everything but a dot).

This leads to a problem when the extractor is given a url starting
with g: or r: (to force using the generic or recursive extractor)
and without the http(s) scheme: e.g. with "r:foobar.tumblr.com"
the "r:" is wrongly considered part of the subdomain.

This commit fixes the bug, replacing the too generic "[^.]+" with the
more specific "[\w-]+" (letters, digits and "-", the only characters
allowed in domain names), which is already used by some extractors.

* Relax imageurl_pattern_ext: allow relative urls

* First round of small suggested changes

* Support image urls starting with "//"

* self.baseurl: remove trailing slash

* Relax regexp (didn't catch some image urls)

* Some fixes and cleanup

* Fix domain pattern; option to enable extractor

Fixed the domain section for "pattern", to pass "test_add" and
"test_add_module" tests.
Added the "enabled" configuration option (default False) to enable the
generic extractor. Using "g(eneric):URL" forces using the extractor.
2021-12-29 22:39:29 +01:00
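The subdomain problem described in this commit can be reproduced with two simplified patterns. 'LOOSE' and 'STRICT' are hypothetical stand-ins, not the real extractor patterns; note also that '\w' matches underscores in addition to letters and digits.

```python
import re

# Simplified stand-ins for an extractor "pattern" with a generic subdomain.
LOOSE = re.compile(r"(?:https?://)?([^.]+)\.tumblr\.com")
STRICT = re.compile(r"(?:https?://)?([\w-]+)\.tumblr\.com")

forced = "r:foobar.tumblr.com"   # 'r:' prefix, no http(s) scheme
plain = "foobar.tumblr.com"

# "[^.]+" swallows the prefix as part of the subdomain.
loose_subdomain = LOOSE.match(forced).group(1)

# "[\w-]+" cannot consume ':', so the prefixed URL is not claimed.
strict_match = STRICT.match(forced)

# Both patterns still accept the plain URL.
plain_subdomain = STRICT.match(plain).group(1)
```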
Mike Fährmann
bd08ee2859 remove most 'yield Message.Version' statements
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
ca44111726 [flickr] update
- ensure every photo has an 'owner' (#828)
- change default directories to a more consistent schema
- create directory for each photo
2020-11-15 10:44:29 +01:00
Mike Fährmann
e6cd49e78b update extractor test results 2020-02-16 21:48:46 +01:00
Mike Fährmann
ce54b8c04c let extractors opt-out of cookie option usage
useful to avoid sending unnecessary cookies when all authentication
is done through OAuth tokens
2020-01-01 21:12:37 +01:00
Mike Fährmann
abfcb356fc [flickr] support 3k, 4k, 5k, and 6k photo sizes (closes #472) 2019-11-10 17:52:51 +01:00
Mike Fährmann
4409d00141 embed error messages in StopExtraction exceptions 2019-10-28 16:39:49 +01:00
Mike Fährmann
20fd2d8450 [flickr] skip unavailable images/videos (fixes #398) 2019-08-27 23:26:49 +02:00
Mike Fährmann
5499934ae2 [ngomik] fix extraction 2019-05-30 20:18:36 +02:00
Mike Fährmann
9890bfdf23 [flickr] improve code and metadata
- simplify pagination
- add more metadata and slightly change its structure
  - convert suitable values to int or list
  - move keys from ["photo"] to the base level
- proper video support (#246)
- rename method and variable names to better fit with other extractors
2019-05-14 22:10:50 +02:00
Mike Fährmann
d6ddb74cde update test results
- deviantart: 'index' is now an integer
- flickr: image file with lower quality
- paheal: image server name changed
- rule34: post got deleted
2019-04-12 09:59:48 +02:00
Mike Fährmann
87b0929bec Revert "[flickr] restore image quality"
This reverts commit 3f513f1056.

Both live.staticflickr and farmN.staticflickr servers now produce the
same image file with a lower overall quality than before this change on
Flickr's end.
2019-04-11 20:31:05 +02:00