1503 Commits

Author SHA1 Message Date
Mike Fährmann
5476404a5c update and fix Cloudflare bypass 2019-03-25 22:53:36 +01:00
Leonardo Taccari
790b1336a6 [instagram] Add support for hashtags
Add support for hashtags (TagPage-s), i.e. explore/tags/<tag> URLs.

This also introduces a get_metadata() method in order to append
possible further metadata per-(sub)extractor.

Refactor and generalize _extract_profilepage() to _extract_page()
in order to be reused by _extract_profilepage() and _extract_tagpage()
simply by passing the type of page (`ProfilePage' or `TagPage') and picking up
the respective fields in shared data.
2019-03-24 14:05:34 +01:00
Mike Fährmann
114b8eecc5 [downloader;ytdl] utilize '_ytdl_index' metadata fields 2019-03-24 11:27:20 +01:00
Mike Fährmann
a9bdd0f153 [instagram] fix syntax for Python 3.4
Python 3.4 doesn't like '**common' in dict literals.
This also makes '_ytdl_index' zero-based.
2019-03-24 11:25:42 +01:00
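
For context on the fix above: unpacking with `**` inside a dict literal relies on PEP 448, which only arrived in Python 3.5. A minimal sketch of a 3.4-compatible equivalent follows. The names `common`, `media`, and their keys are hypothetical and are not taken from the extractor itself.

```python
# Hypothetical metadata shared by every media entry of a post.
common = {"shortcode": "abc123", "owner": "someuser"}

# Python 3.5+ only (PEP 448), a syntax error on 3.4:
#     media = {**common, "media_id": 42}

# Works on Python 3.4 as well: copy the shared dict and add the extra key.
media = dict(common, media_id=42)

# Equivalent two-step form, useful when extra keys are not valid identifiers.
media_alt = common.copy()
media_alt.update({"media_id": 42})
```

Both forms leave `common` untouched, which matters when the same dict is reused for several entries.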
Mike Fährmann
eacebf41e4 fix typo in README 2019-03-24 11:03:02 +01:00
Leonardo Taccari
1e38f65996 [instagram] Add support for GraphSidecar media types (#201)
* [instagram] Add support for GraphSidecar media types

Refactor _extract_postpage() to always return a list of medias.

Fetch common keywords and gracefully handle GraphSidecar media type
by extracting each single media and adding `sidecar_media_id' and
`sidecar_shortcode' keywords to indicate the parent of sidecar
children.

While here, join the copyright comment lines into a single one.

Closes #178.

* [instagram] Use `yield from' instead of `for ... yield' (thanks @mikf)!

* [instagram] Adjust filename for GraphSidecar medias

Add a possible leading `media_id' of the sidecar for GraphSidecar
media.

Thanks to @mikf for the suggestion!

* [instagram] Add extra metadata for youtube-dl in GraphSidecar children

GraphSidecar children ytdl: URLs, when consumed by youtube-dl,
redirect to the URL of their parent.  In GraphSidecar-s with
multiple GraphVideo-s this leads to downloading the same video
multiple times.

Add a `_ytdl_index' field to indicate the index of the youtube-dl
playlist corresponding to the children of the sidecar.

This will be used by the `ytdl' downloader.
2019-03-24 11:02:32 +01:00
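
The `yield from` change mentioned in this commit can be illustrated with a small sketch. The function names and the shape of the `sidecar` dicts below are hypothetical and only stand in for the extractor's real data. The point is that delegating to an inner generator yields the same values as re-yielding them in an explicit loop.

```python
def extract_children(sidecar):
    """Yield one entry per child URL of a (hypothetical) sidecar dict."""
    for child in sidecar["children"]:
        yield {"url": child}


def items_loop(sidecars):
    # Explicit form: re-yield every value produced by the inner generator.
    for sidecar in sidecars:
        for media in extract_children(sidecar):
            yield media


def items_delegating(sidecars):
    # Same values, delegating iteration to the inner generator.
    for sidecar in sidecars:
        yield from extract_children(sidecar)


data = [{"children": ["a", "b"]}, {"children": ["c"]}]
urls = [media["url"] for media in items_delegating(data)]
```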
Mike Fährmann
e7d0d98c88 improve FFmpeg arguments for --ugoira-conv 2019-03-23 09:50:39 +01:00
Mike Fährmann
6ba67b0537 [hypnohub] add extractors (closes #196) 2019-03-23 09:50:39 +01:00
Mike Fährmann
fe27154a10 [komikcast] fix extraction
... again
2019-03-23 09:50:39 +01:00
Mike Fährmann
5ec55ec4fc [deviantart] improve URLs for non-downloadable deviations 2019-03-21 15:37:22 +01:00
Mike Fährmann
c7a6b0ed90 [deviantart] add 'metadata' option (#189) 2019-03-21 14:49:42 +01:00
Mike Fährmann
8d96a8ce4c [500px] add user-, gallery-, and image-extractors (#185) 2019-03-20 17:32:36 +01:00
Mike Fährmann
d0f88c35be [komikcast] fix extraction 2019-03-18 11:12:19 +01:00
Mike Fährmann
6277a739e4 [35photo] add user-, genre-, and image-extractors (#162) 2019-03-18 01:11:30 +01:00
Mike Fährmann
fb14f80d62 [tumblr] fix avatar URLs for non-OAuth1.0 calls (closes #193) 2019-03-17 11:07:22 +01:00
Mike Fährmann
8c20443839 release version 1.8.0 2019-03-15 15:27:11 +01:00
Mike Fährmann
973a720a7a [weibo] fix unit test URL patterns 2019-03-15 15:19:39 +01:00
Mike Fährmann
a2af2d2965 adjust cache maxage values 2019-03-14 22:21:49 +01:00
Mike Fährmann
f612284d24 cache cfclearance cookies 2019-03-14 16:14:29 +01:00
Mike Fährmann
34ea0d6a10 rewrite cache module
less complexity, better performance,
but some duplicate code here and there
2019-03-14 15:55:48 +01:00
Mike Fährmann
12482553bd update links to youtube-dl 2019-03-13 22:03:02 +01:00
Mike Fährmann
591a07f20c small code changes and cleanups 2019-03-13 22:03:02 +01:00
Mike Fährmann
6f57d44ec2 [seaotterscans] remove extractor
http://seaotterscans.com/ now redirects to their MangaDex profile
2019-03-13 22:02:45 +01:00
Mike Fährmann
6dae6bee37 automatically detect and bypass cloudflare challenge pages
TODO: cache and re-apply cfclearance cookies
2019-03-10 15:31:33 +01:00
Mike Fährmann
25aaf55514 [smugmug] improve format selection (closes #183)
- use original image if available
- support video formats
- remove user info for ImageExtractor (it is no longer possible to get
  image owner information for a single image)
2019-03-10 15:20:35 +01:00
Mike Fährmann
7c1cb923a4 [myportfolio] replace unit test
the old gallery got removed
2019-03-10 15:06:16 +01:00
Mike Fährmann
fffbfd3dce [imgspice] fix extraction 2019-03-09 20:29:23 +01:00
Mike Fährmann
4ca4631bad simplify auto-disabling certificate verification
if no certificate bundle is found
2019-03-08 16:34:01 +01:00
Mike Fährmann
09d872a2b1 generalize extractor creation code 2019-03-07 22:55:26 +01:00
Mike Fährmann
8dc6be246b [shopify] add custom retry logic for 430 status codes (#175) 2019-03-07 15:31:15 +01:00
Mike Fährmann
0887fb61f4 [komikcast] update test results 2019-03-07 14:55:52 +01:00
Mike Fährmann
976ccb267f [myportfolio] combine gallery and user extractors
A URL alone isn't good enough to distinguish between a gallery and a
gallery-listing, so the new extractor decides what to do based on the
page's content.
2019-03-06 19:45:01 +01:00
Mike Fährmann
efd104e45e [instagram] reject more non-user URLs (#180) 2019-03-06 10:26:01 +01:00
HRXN
56e0e92e0d [shopify] cosmetic changes in shopify.py (#181)
Glanced over the commits, randomly spotted some minor things.
2019-03-06 09:16:27 +01:00
Mike Fährmann
23baecb29e fix 'CONVERSIONS' variable name 2019-03-05 22:50:56 +01:00
Mike Fährmann
9c0e2f294b [shopify] add generic collection and product extractors (#175)
with fashionnova.com as a default domain
2019-03-05 22:33:37 +01:00
Mike Fährmann
105097ddcf add 'S' conversion options for format string fields
Same as 's' (convert to string), but has a better, human-readable
conversion for lists.
2019-03-04 21:13:34 +01:00
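
One way such an 'S' conversion could work is sketched below using the standard library's `string.Formatter`. This is not the project's actual implementation: the class name `FieldFormatter` and the `", "` separator are assumptions made for illustration.

```python
import string


class FieldFormatter(string.Formatter):
    """Formatter with an extra, hypothetical 'S' conversion."""

    def convert_field(self, value, conversion):
        if conversion == "S":
            # Join list-like values instead of showing their repr().
            if isinstance(value, (list, tuple)):
                return ", ".join(str(item) for item in value)
            return str(value)
        # Fall back to the built-in 's', 'r' and 'a' conversions.
        return super().convert_field(value, conversion)


formatter = FieldFormatter()
plain = formatter.format("{tags!s}", tags=["cat", "dog"])
pretty = formatter.format("{tags!S}", tags=["cat", "dog"])
```

With this sketch, `{tags!s}` keeps the bracketed list form while `{tags!S}` produces a comma-separated string.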
Mike Fährmann
1578013efc remove unused default config path 2019-03-04 20:53:58 +01:00
Mike Fährmann
26c4365baa adjust metadata types for GalleryExtractors 2019-03-02 14:53:04 +01:00
Mike Fährmann
13e0f2a78f [deviantart] add 'scraps' extractor (closes #168) 2019-03-01 14:13:34 +01:00
Mike Fährmann
3ea11f5d5e [nhentai] rewrite
- use GalleryExtractor as base class
- extract a lot more metadata (artist, tags, etc.)
2019-03-01 14:13:34 +01:00
Mike Fährmann
176b7253a1 update function signature for config.load() 2019-03-01 14:13:34 +01:00
Mike Fährmann
3595cd582f use GalleryExtractor as common base class 2019-03-01 14:13:16 +01:00
Mike Fährmann
a138d5873d [hentaifoundry] improve/fix extraction
- Sometimes an ad interfered when trying to get a download URL
- Resolving "www.hentai-foundry.com" yields an invalid(?) IPv6 address
  (2607:5300:60:ca9e:feed:dead:beef:1) and urllib3 only tries to connect
  to the IPv4 variant after a rather long wait time
2019-02-25 16:16:09 +01:00
Mike Fährmann
280531c8ff [pururin] add gallery extractor (closes #174) 2019-02-25 14:54:57 +01:00
Mike Fährmann
3159dd79d5 [seiga] use HTTPS 2019-02-21 22:51:11 +01:00
Mike Fährmann
f6734142ee [komikcast] remove 'width' and 'height' info 2019-02-19 15:12:40 +01:00
Mike Fährmann
d0059cab79 [tumblr] check for null URLs (closes #165) 2019-02-19 13:49:55 +01:00
Mike Fährmann
e687a6095e [luscious] raise exception if album is not available 2019-02-19 13:30:39 +01:00
Mike Fährmann
22d3a2fcc8 [artstation] add extractor for artwork listings (#80)
like https://www.artstation.com/artwork?sorting=latest
or   https://www.artstation.com/artwork?sorting=picks
2019-02-18 12:45:44 +01:00