Mike Fährmann
53cdfaac37
[common] add reference to 'exception' module to Extractor class
...
- remove 'exception' imports
- replace with 'self.exc'
2026-02-15 10:57:22 +01:00
Mike Fährmann
ace8c50278
[imagefap] handle '/galleries?folderid=0' URLs (#9034)
2026-02-10 10:56:30 +01:00
Mike Fährmann
ce8d61df66
[imagefap] don't return anything for empty profiles (#9034)
2026-02-10 10:28:49 +01:00
Mike Fährmann
d3c4328078
[imagefap:user] support multiple pages (#9016)
2026-02-08 11:49:11 +01:00
Mike Fährmann
09fbb3a594
[imagefap] use self.groups, remove __init__
2026-02-05 09:04:55 +01:00
Mike Fährmann
e006d26c8e
Revert "use f-strings when building 'pattern'"
...
revert d7c97d5a97.
2025-12-20 22:07:37 +01:00
Mike Fährmann
968597a302
yield 3-tuples for Message.Directory
...
adapt tuples to the same length and semantics as other messages
2025-12-05 21:39:52 +01:00
Mike Fährmann
d7c97d5a97
use f-strings when building 'pattern'
2025-10-20 21:23:11 +02:00
Mike Fährmann
a097a373a9
simplify if statements by using walrus operators (#7671)
2025-07-22 20:57:54 +02:00
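
A minimal sketch of the kind of simplification a1097-style walrus rewrites allow, not code taken from the commit itself. The function name `first_image` and the `"images"` key are hypothetical and chosen only for illustration; the assignment expression requires Python 3.8 or later.

```python
def first_image(data):
    """Return the first entry of data["images"], or None if there is none.

    Hypothetical helper, for illustration only.
    """
    # Before the rewrite this would take two statements:
    #     images = data.get("images")
    #     if images:
    # The walrus operator assigns and tests in a single expression.
    if images := data.get("images"):
        return images[0]
    return None
```

The behavior is unchanged; only the separate assignment line disappears.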
Mike Fährmann
d8ef1d693f
rename 'StopExtraction' to 'AbortExtraction'
...
for cases where StopExtraction was used to report errors
2025-07-09 21:07:28 +02:00
Mike Fährmann
9dbe33b6de
replace old %-formatted and .format(…) strings with f-strings (#7671)
...
mostly using flynt
https://github.com/ikamensh/flynt
2025-06-29 17:50:19 +02:00
Mike Fährmann
41191bb60a
'match.group(N)' -> 'match[N]' (#7671)
...
2.5x faster
2025-06-18 13:05:58 +02:00
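
As a small illustration of the two spellings named in this commit subject: both return the same group. The pattern and the '/gallery/' URL below are made up for the example, and the "2.5x faster" figure is the commit's own claim, which this snippet does not measure.

```python
import re

# Hypothetical pattern and URL, for illustration only.
pattern = re.compile(r"/gallery/(\d+)")
match = pattern.search("https://www.imagefap.com/gallery/12345")

# Old style: explicit method call.
old = match.group(1)

# New style: index access through Match.__getitem__ (Python 3.6+).
new = match[1]
```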
Mike Fährmann
e08ec7e083
update copyright notices
2025-06-13 00:03:41 +02:00
Mike Fährmann
3f48e2f820
[common] add '_extract_jsonld' method (#5272)
2025-01-12 11:07:48 +01:00
Mike Fährmann
a09cef79c6
[imagefap] fix '{num}' in default filenames for single images
...
fixes regression introduced in 304bb4bb
2024-10-10 13:40:50 +02:00
ShruggZoltan
20f670c3b6
Reduce default numbering to 4 digits of zero-padding.
2024-07-27 01:11:50 +00:00
ShruggZoltan
db2a61ba08
Numbers images obtained from imagefap by default.
2024-07-26 06:04:10 +00:00
Herp
99c53f7fa8
Fix imagefap extractor
2024-03-15 23:44:25 +01:00
Mike Fährmann
05331f9cf1
[imagefap] flake8, cleanup, tests
2024-03-07 01:29:19 +01:00
termvacycurtocs
f8b037ed40
[Imagefap] Add folder metadata
...
[Imagefap] Add "folder" metadata when downloading a folder or user profile.
No additional request is made to the server.
Use for example with the following configuration:
"parent-metadata": true
"directory":["{category}", "{uploader}", "{folder}", "{gallery_id} {title}"]
2024-03-02 22:15:45 +01:00
Mike Fährmann
d119507037
[imagefap] fix single image resolution
...
Downloading from a single image page like
https://www.imagefap.com/photo/123456789/
returned only the thumbnail URL.
2023-11-26 00:30:52 +01:00
Mike Fährmann
3ecb512722
send Referer headers by default
2023-09-19 00:02:04 +02:00
Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6
decouple extractor initialization
...
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This makes it possible, for example, to adjust config access inside a
Job, whereas previously most of it had already happened when calling
'extractor.find()'.
2023-07-25 22:16:16 +02:00
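
The split between construction and initialization described in this commit can be sketched roughly as follows. This is not gallery-dl's Extractor class: `SketchExtractor`, its `options` attribute, and the `"delay"` key are hypothetical names used only to show why deferring setup lets a caller change the configuration first.

```python
class SketchExtractor:
    """Hypothetical stand-in; not gallery-dl's Extractor class."""

    def __init__(self, url):
        # The constructor only records its input; no config is read here.
        self.url = url
        self.options = None

    def initialize(self, config):
        # The actual setup happens here, so a caller can adjust
        # 'config' between construction and initialization.
        self.options = dict(config)


config = {"delay": 1.0}                      # hypothetical option name
extractor = SketchExtractor("https://example.org/")
config["delay"] = 2.0                        # adjusted after construction
extractor.initialize(config)                 # picks up the adjusted value
```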
Mike Fährmann
a996d936d2
[imagefap] fix pagination (#3013)
2023-07-18 17:56:33 +02:00
Mike Fährmann
2dfd4a3de2
[imagefap] extract 'categories' metadata and fix empty 'tags'
2023-04-17 14:49:50 +02:00
Mike Fährmann
02ec5bb8e5
[imagefap] extract 'description' metadata (#3905)
2023-04-16 17:02:16 +02:00
Mike Fährmann
dd884b02ee
replace json.loads with direct calls to JSONDecoder.decode
2023-02-09 15:22:00 +01:00
Mike Fährmann
137a395ae0
[imagefap] fix infinite pagination loop (#3594)
2023-01-31 19:21:43 +01:00
Mike Fährmann
3c708ade8f
[imagefap] fix metadata extraction
2023-01-31 15:38:55 +01:00
Mike Fährmann
17e24eacf0
[imagefap] update 'gallery' URLs (#3595)
2023-01-31 15:33:35 +01:00
Mike Fährmann
4833ec323e
[imagefap] add 'folder' extractor (#3504)
2023-01-08 16:57:31 +01:00
Mike Fährmann
cbaeee9533
[imagefap] warn about redirects to '/human-verification' (#1140)
2023-01-07 13:04:42 +01:00
Mike Fährmann
435de1329a
[imagefap] use default delay between requests (#1140)
2023-01-07 12:59:09 +01:00
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible
2022-11-05 01:14:09 +01:00
Mike Fährmann
bc9d291c13
[imagefap] fix and improve folder extraction (#3013)
2022-10-08 15:41:39 +02:00
Mike Fährmann
55fca5fe4b
[imagefap] fix and improve gallery pagination (#3013)
2022-10-08 15:41:39 +02:00
Mike Fährmann
c6a9bab019
update extractor test results
2022-07-12 15:49:22 +02:00
Mike Fährmann
47a780942c
update extractor test results
2021-09-03 19:36:12 +02:00
Mike Fährmann
bd08ee2859
remove most 'yield Message.Version' statements
...
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
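
A rough sketch of what dropping '&' changes in practice. The commit message only shows the character sequences; that they sit inside negated character classes such as '[^/?&#]+' is an assumption here, and the '/tag/' patterns and example URL are hypothetical.

```python
import re

# Hypothetical patterns; only the character sets mirror the commit.
old = re.compile(r"/tag/([^/?&#]+)")   # '&' ends the captured segment
new = re.compile(r"/tag/([^/?#]+)")    # only '/', '?' and '#' delimit

url = "https://example.org/tag/black&white?page=2"

old_value = old.search(url)[1]   # stops at the '&'
new_value = new.search(url)[1]   # keeps the '&' and stops at the '?'
```

With the old class, a path segment containing '&' is cut short; with the new one it is captured whole, in line with the RFC 3986 delimiters quoted above.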
Mike Fährmann
2ecf1efb16
update extractor test results
...
- tumblr: remove deleted post
- jaiminisbox: replace removed manga/chapters
- smugmug: one inconsequential field got removed
2020-07-18 15:12:28 +02:00
Mike Fährmann
1afb91363c
[imagefap] generalize URL patterns and add tests (#552)
2020-01-02 14:26:18 +01:00
Xope Totec
f701e9f33a
Handle beta.imagefap.com URLs (#552)
2020-01-02 14:22:00 +01:00
Mike Fährmann
dcaa3d01bd
[imagefap] adapt to new image URL format
2019-11-30 23:48:02 +01:00
Mike Fährmann
108963d138
[imagefap] include Referer headers
2019-06-24 21:31:29 +02:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from an URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
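
The new result shape can be sketched as below. This is not the implementation of text.nameext_from_url(); `nameext_sketch` is a hypothetical helper that only reproduces the 'filename' / 'extension' split described in the commit message, under the assumption that the name is the last path segment.

```python
def nameext_sketch(url):
    """Split the last path segment of 'url' into filename and extension.

    Hypothetical helper mirroring the commit's "now" example only.
    """
    # Drop query string and fragment, then take the last path segment.
    path = url.split("?", 1)[0].split("#", 1)[0]
    name = path.rstrip("/").rpartition("/")[2]

    # Split on the last dot; without a dot there is no extension.
    filename, dot, extension = name.rpartition(".")
    if not dot:
        filename, extension = name, ""

    return {"filename": filename, "extension": extension}


result = nameext_sketch("https://example.org/path/filename.ext")
```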
Mike Fährmann
61741d7333
provide type information for Queue messages
...
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.
2019-02-12 21:32:32 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00