1270 Commits

Author SHA1 Message Date
Mike Fährmann
c70b21248d [wikiart] add extractors (#179)
for
- artists:          https://www.wikiart.org/en/thomas-cole
- artist-listings:  https://www.wikiart.org/en/artists-by-century/12
- artwork-listings: https://www.wikiart.org/en/paintings-by-media/grisaille
2019-04-02 17:34:57 +02:00
Mike Fährmann
0f02e85961 [reactor] use "/full/" URLs (closes #210)
Putting a "/full/" in image URLs potentially gives higher resolution
and better quality.
2019-03-30 22:14:57 +01:00
Mike Fährmann
17c11393f5 [weibo] allow user-ids in status URLs 2019-03-30 18:38:58 +01:00
Mike Fährmann
ec88ff1562 [flickr] relax unit test results
Images are now randomly served from the 'live.staticflickr.com' domain
instead of the "old" 'farmN.staticflickr.com' one, making it impossible
to use static 'url' and 'keyword' hashes as results.

Image quality doesn't appear to be affected by which image-server is
used. Files from 'farmN' and 'live' are the same.
2019-03-30 18:31:59 +01:00
Mike Fährmann
00d604cafb [luscious] fix SearchExtractor URL-pattern 2019-03-29 15:58:08 +01:00
Mike Fährmann
1384ebf907 [luscious] fix metadata extraction
- remove 'artist', 'language', and 'lang' fields
- replace 'section' with 'genre'
- provide 'tags' as list
- use GalleryExtractor as base class
2019-03-29 13:06:02 +01:00
Mike Fährmann
5398bfbd69 [exhentai] fix search and favorite extraction
removes basically all metadata, but that can be compensated for with the
right search query. writing "parsers" for all 4 possible views that have
been introduced in the latest changes is too much of a hassle ...
2019-03-28 16:22:02 +01:00
Leonardo Taccari
790b1336a6 [instagram] Add support for hashtags
Add support for hashtags (TagPage-s), i.e. explore/tags/<tag> URLs.

This also introduces a get_metadata() method in order to append
possible further metadata per-(sub)extractor.

Refactor and generalize _extract_profilepage() to _extract_page()
in order to be reused by _extract_profilepage() and _extract_tagpage()
simply by passing the type of page (`ProfilePage' or `TagPage') and picking up
the respective fields in shared data.
2019-03-24 14:05:34 +01:00
Mike Fährmann
a9bdd0f153 [instagram] fix syntax for Python 3.4
Python 3.4 doesn't like '**common' in dict literals.
This also makes '_ytdl_index' zero-based.
2019-03-24 11:25:42 +01:00
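The Python 3.4 note in a9bdd0f153 refers to unpacking a mapping inside a dict literal, which PEP 448 only added in Python 3.5. A minimal sketch of a 3.4-compatible alternative follows. The `merge` helper and the sample keys are hypothetical and are not taken from the commit itself:

```python
# Hypothetical shared metadata; the key names are illustrative only.
common = {"shortcode": "abc", "owner": "user"}

# Python 3.5+ only (PEP 448); a SyntaxError on Python 3.4:
#     media = {**common, "media_id": 1}


def merge(base, **extra):
    """Return a new dict with 'extra' layered over a copy of 'base'."""
    result = dict(base)    # shallow copy, so 'base' is left untouched
    result.update(extra)   # later keys override earlier ones
    return result


media = merge(common, media_id=1)
```

Copying first and then calling `update()` keeps `common` unchanged, which matters when the same base dict is reused for several media entries.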
Mike Fährmann
eacebf41e4 fix typo in README 2019-03-24 11:03:02 +01:00
Leonardo Taccari
1e38f65996 [instagram] Add support for GraphSidecar media types (#201)
* [instagram] Add support for GraphSidecar media types

Refactor _extract_postpage() to always return a list of media.

Fetch common keywords and gracefully handle the GraphSidecar media type
by extracting each single media and adding `sidecar_media_id' and
`sidecar_shortcode' keywords to indicate the parent of sidecar
children.

While here join the copyright comment lines in a single one.

Closes #178.

* [instagram] Use `yield from' instead of `for ... yield' (thanks @mikf)!

* [instagram] Adjust filename for GraphSidecar medias

Add a possible leading `media_id' of the sidecar for GraphSidecar
media.

Thanks to @mikf for the suggestion!

* [instagram] Add extra metadata for youtube-dl in GraphSidecar children

ytdl: URLs of GraphSidecar children, when consumed by youtube-dl,
redirect to the URL of their parent.  In GraphSidecar-s with
multiple GraphVideo-s this leads to downloading the same video
multiple times.

Add a `_ytdl_index' field to indicate the index in the youtube-dl
playlist corresponding to each child of the sidecar.

This will be used by the `ytdl' downloader.
2019-03-24 11:02:32 +01:00
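The `yield from' change mentioned in 1e38f65996 replaces an inner loop that re-yields every item with direct delegation to the iterable (PEP 380, available since Python 3.3). A minimal sketch, assuming a hypothetical `media_for_post` helper that returns a list of media dicts; none of these names come from the extractor code:

```python
def media_for_post(post):
    """Hypothetical helper: return a list of media dicts for one post."""
    return [{"post": post, "index": i} for i in range(2)]


def extract_loop(posts):
    """Explicit 'for ... yield' form."""
    for post in posts:
        for media in media_for_post(post):
            yield media


def extract_delegating(posts):
    """Equivalent form that delegates with 'yield from'."""
    for post in posts:
        yield from media_for_post(post)
```

Both generators produce the same sequence; the second form is simply shorter.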
Mike Fährmann
6ba67b0537 [hypnohub] add extractors (closes #196) 2019-03-23 09:50:39 +01:00
Mike Fährmann
fe27154a10 [komikcast] fix extraction
... again
2019-03-23 09:50:39 +01:00
Mike Fährmann
5ec55ec4fc [deviantart] improve URLs for non-downloadable deviations 2019-03-21 15:37:22 +01:00
Mike Fährmann
c7a6b0ed90 [deviantart] add 'metadata' option (#189) 2019-03-21 14:49:42 +01:00
Mike Fährmann
8d96a8ce4c [500px] add user-, gallery-, and image-extractors (#185) 2019-03-20 17:32:36 +01:00
Mike Fährmann
d0f88c35be [komikcast] fix extraction 2019-03-18 11:12:19 +01:00
Mike Fährmann
6277a739e4 [35photo] add user-, genre-, and image-extractors (#162) 2019-03-18 01:11:30 +01:00
Mike Fährmann
fb14f80d62 [tumblr] fix avatar URLs for non-OAuth1.0 calls (closes #193) 2019-03-17 11:07:22 +01:00
Mike Fährmann
973a720a7a [weibo] fix unit test URL patterns 2019-03-15 15:19:39 +01:00
Mike Fährmann
a2af2d2965 adjust cache maxage values 2019-03-14 22:21:49 +01:00
Mike Fährmann
f612284d24 cache cfclearance cookies 2019-03-14 16:14:29 +01:00
Mike Fährmann
591a07f20c small code changes and cleanups 2019-03-13 22:03:02 +01:00
Mike Fährmann
6f57d44ec2 [seaotterscans] remove extractor
http://seaotterscans.com/ now redirects to their MangaDex profile
2019-03-13 22:02:45 +01:00
Mike Fährmann
6dae6bee37 automatically detect and bypass cloudflare challenge pages
TODO: cache and re-apply cfclearance cookies
2019-03-10 15:31:33 +01:00
Mike Fährmann
25aaf55514 [smugmug] improve format selection (closes #183)
- use original image if available
- support video formats
- remove user info for ImageExtractor (it is no longer possible to get
  image owner information for a single image)
2019-03-10 15:20:35 +01:00
Mike Fährmann
7c1cb923a4 [myportfolio] replace unit test
the old gallery got removed
2019-03-10 15:06:16 +01:00
Mike Fährmann
fffbfd3dce [imgspice] fix extraction 2019-03-09 20:29:23 +01:00
Mike Fährmann
4ca4631bad simplify auto-disabling certificate verification
if no certificate bundle is found
2019-03-08 16:34:01 +01:00
Mike Fährmann
09d872a2b1 generalize extractor creation code 2019-03-07 22:55:26 +01:00
Mike Fährmann
8dc6be246b [shopify] add custom retry logic for 430 status codes (#175) 2019-03-07 15:31:15 +01:00
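The commit above does not describe how its retry logic works. As an illustration only, one way to retry on a non-standard 430 response is sketched below. The `fetch` callable, the linear back-off, and all parameter names are assumptions and not the extractor's actual implementation:

```python
import time


def request_with_retry(fetch, url, retries=5, delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns a status code other than 430.

    'fetch' is a hypothetical callable returning a (status, body) tuple.
    'sleep' is injectable so the waiting can be stubbed out in tests.
    """
    for attempt in range(retries):
        status, body = fetch(url)
        if status != 430:
            return status, body
        # Assumed policy: wait a little longer after each 430 response.
        sleep(delay * (attempt + 1))
    raise RuntimeError("still receiving 430 after %d attempts" % retries)
```

Passing the HTTP call in as `fetch` keeps the sketch independent of any particular HTTP library.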
Mike Fährmann
0887fb61f4 [komikcast] update test results 2019-03-07 14:55:52 +01:00
Mike Fährmann
976ccb267f [myportfolio] combine gallery and user extractors
A URL alone isn't good enough to distinguish between a gallery and a
gallery-listing, so the new extractor decides what to do based on the
page's content.
2019-03-06 19:45:01 +01:00
Mike Fährmann
efd104e45e [instagram] reject more non-user URLs (#180) 2019-03-06 10:26:01 +01:00
HRXN
56e0e92e0d [shopify] cosmetic changes in shopify.py (#181)
Glanced over the commits, randomly spotted some minor things.
2019-03-06 09:16:27 +01:00
Mike Fährmann
9c0e2f294b [shopify] add generic collection and product extractors (#175)
with fashionnova.com as a default domain
2019-03-05 22:33:37 +01:00
Mike Fährmann
26c4365baa adjust metadata types for GalleryExtractors 2019-03-02 14:53:04 +01:00
Mike Fährmann
13e0f2a78f [deviantart] add 'scraps' extractor (closes #168) 2019-03-01 14:13:34 +01:00
Mike Fährmann
3ea11f5d5e [nhentai] rewrite
- use GalleryExtractor as base class
- extract a lot more metadata (artist, tags, etc.)
2019-03-01 14:13:34 +01:00
Mike Fährmann
3595cd582f use GalleryExtractor as common base class 2019-03-01 14:13:16 +01:00
Mike Fährmann
a138d5873d [hentaifoundry] improve/fix extraction
- Sometimes an ad interfered when trying to get a download URL
- Resolving "www.hentai-foundry.com" yields an invalid(?) IPv6 address
  (2607:5300:60:ca9e:feed:dead:beef:1) and urllib3 only tries to connect
  to the IPv4 variant after a rather long wait time
2019-02-25 16:16:09 +01:00
Mike Fährmann
280531c8ff [pururin] add gallery extractor (closes #174) 2019-02-25 14:54:57 +01:00
Mike Fährmann
3159dd79d5 [seiga] use HTTPS 2019-02-21 22:51:11 +01:00
Mike Fährmann
f6734142ee [komikcast] remove 'width' and 'height' info 2019-02-19 15:12:40 +01:00
Mike Fährmann
d0059cab79 [tumblr] check for null URLs (closes #165) 2019-02-19 13:49:55 +01:00
Mike Fährmann
e687a6095e [luscious] raise exception if album is not available 2019-02-19 13:30:39 +01:00
Mike Fährmann
22d3a2fcc8 [artstation] add extractor for artwork listings (#80)
like https://www.artstation.com/artwork?sorting=latest
or   https://www.artstation.com/artwork?sorting=picks
2019-02-18 12:45:44 +01:00
Mike Fährmann
937a802b49 [dynastyscans] add extractors for images and image searches
(closes #163)
2019-02-18 12:25:52 +01:00
Mike Fährmann
b09a8184ca move TestJob into test module; test _extractor values 2019-02-17 18:18:31 +01:00
Mike Fährmann
19860655a3 [weibo] add 'user' and 'status' extractors 2019-02-17 18:18:31 +01:00