* [instagram] Add extractor for instagram.com user profiles and pages
The extractor scrapes `instagram.com/<user>' timelines and
`instagram.com/p/<shortcode>' by mimicking the behaviour of a web
browser and extracting the sharedData JSON of the single pages.
Please note that this mean that for user timelines we also do an
extra request to the `instagram.com/p/<shortcode>' page but this
permit to have consistent (and all) information about the media
fetched.
The MD5 logic used for X-Instagram-GIS was documented in
<https://stackoverflow.com/questions/49786980/>
* [instagram] Test for keywords, not url for GraphImage and GraphSidecar
URLs returned by instagram seems not stable so avoid testing for
them and instead test for keyword returned.
* [instagram] Improve test of InstagramProfilepageExtractor
Also check the count of media returned.
* [instagram] Several cleanup and improvements
- Change description, subcategories to generate a better description in
docs/supportedsite.rst
- Remove not needed InstagramExtractor.__init__()
- Use text.parse_int() instead of directly using int() (the former is more
robust)
- Use self.request().json() instead of using json.loads() the
self.request().text()
- Add `pattern:' to check the URLs where we do not have a stable URLs.
It seems that only the subdomain is not stable.
Thanks to @mikf!
While a filename might not be a real 'hash', or comparable to what
tumbler usually provides, it is still better than an empty string.
At least as long as "alternatives" in format strings aren't implemented.
The "default" downloader options (rate, retries, timeout, verify) are
mapped to corresponding youtube-dl options.
downloader.ytdl.logging tells the downloader to pass youtube-dl's output
to a Logger object.
downloader.ytdl.raw-options allows to pass arbitrary options to the
YoutubeDL constructor.
from: 5xx HTTP Error: Reason
to : 5xx: Reason
The "HTTP Error" part was in there to emulate Request's error messages
from response.raise_for_status(), but it reads a lot better without.
In addition to 'abort' and 'exit', it is now possible to specify
'abort:N' and 'exit:N' (where N is any integer) as value for 'skip'
to abort/exit after consecutively skipping N downloads.