Commit Graph

928 Commits

Author SHA1 Message Date
Mike Fährmann
f45c9f2141 [gfycat] test-updates and code-adjustments 2018-08-18 23:04:45 +02:00
Mike Fährmann
9b1c39032c [twitter] changes and improvements
- rename User- to TimelineExtractor
- rename 'userid' to 'user_id' to conform to the other ..._id values
- adjust archive_fmt to deal with retweets
- emulate browser behavior for API calls
2018-08-18 23:04:45 +02:00
Mike Fährmann
10365394d7 [twitter] add support for user-timelines (closes #96)
also adds a 'retweets' option to filter retweeted content
2018-08-17 20:04:11 +02:00
Mike Fährmann
d3f1eed2a6 [pinterest] improvements
- add stop condition for pin-related pins
- improve URL patterns
- make Pylint happy
2018-08-16 18:11:39 +02:00
Mike Fährmann
2801a0d997 [exhentai] skip "Content Warning" page when not logged in
(closes #97)
2018-08-16 09:17:22 +02:00
Mike Fährmann
63fa0b2006 [pinterest] add extractors for related pins
Related pins can now be accessed by adding a "#related" fragment
to the end of a Pinterest URL, for example:
- https://www.pinterest.com/pin/858146903966145189/#related
- https://www.pinterest.com/g1952849/test-/#related

There are no explicit real URLs for related pins,
using an option to enable them results in "clunky" code,
and a custom "related:<URL>" scheme doesn't feel right either.
2018-08-15 21:49:45 +02:00
Mike Fährmann
1694039de0 [komikcast] update ad-filter 2018-08-15 21:49:44 +02:00
Mike Fährmann
a74591b84b [tumblr] remove "original image" functionality
Accessing higher/original quality images on
https://s3.amazonaws.com/data.tumblr.com and http://data.tumblr.com
is no longer possible and any HTTP request results in 403 Forbidden.

A few images can still be accessed through https://a.tumblr.com [1][2],
but not as "_raw", just "_1280", and that might also be "fixed" in
the near future.

[1] https://a.tumblr.com/tumblr_kzjlfiTnfe1qz4rgho1_1280.jpg
[2] https://a.tumblr.com/ee589c6345f29d2d5935cecb49b0a705/tumblr_oztu02dIHp1wgha4yo1_1280.png
2018-08-14 11:51:17 +02:00
Mike Fährmann
38d4f43cc0 [komikcast] skip ads 2018-08-14 11:17:59 +02:00
Mike Fährmann
4313c95bc9 improve error message for OAuth2 authentication 2018-08-11 23:54:25 +02:00
Mike Fährmann
b55e39d1ee [mangadex] improve extraction
- cache manga API results
- add artist, author and date fields to chapter metadata
- remove Manga-/ChapterExtractor inheritance
- minor code simplifications and improvements
2018-08-10 16:50:07 +02:00
Mike Fährmann
b1c4c1e13c [mangadex] fix extraction 2018-08-08 18:08:26 +02:00
Mike Fährmann
3c90df6635 [piczel] add user, folder and image extractors 2018-08-08 10:53:01 +02:00
Mike Fährmann
2a9f3341a2 [behance] fix title extraction 2018-08-08 10:48:58 +02:00
Mike Fährmann
3fc2f269fa [behance] filter 'fields' list 2018-08-07 12:14:41 +02:00
Mike Fährmann
b67339155f [rule34] update test results
'metadata' tag type has been removed
2018-08-07 12:13:34 +02:00
Mike Fährmann
a86f2bfc80 [pinterest] update not-found redirects 2018-08-07 12:13:19 +02:00
Mike Fährmann
b040ca0718 [rule34] small unit test fixes 2018-08-03 17:28:47 +02:00
Mike Fährmann
b164231bca [sankaku] increase default values for 'wait-min/-max' 2018-08-03 17:06:51 +02:00
Mike Fährmann
68d6033a5d use 'retries' and 'timeout' options for regular HTTP requests 2018-08-02 16:11:54 +02:00
Mike Fährmann
f3793660ef update tests 2018-08-02 14:57:28 +02:00
Mike Fährmann
df082e923c [behance] add gallery extractor (#95) 2018-08-01 21:46:55 +02:00
Mike Fährmann
5f27cfeff6 [deviantart] remove prefer-public option
All API requests now always use a public token and only switch to
a private token for pagination results if `refresh-token` is set
and fewer deviations than requested were returned.
2018-07-26 19:43:46 +02:00
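The token switching described in 5f27cfeff6 can be pictured roughly as follows. This is only a sketch under stated assumptions, not the extractor's actual code: `fetch_page` and the `request(params, private=...)` callable are hypothetical names introduced for illustration.

```python
def fetch_page(request, params, limit, refresh_token=None):
    """Fetch one page of paginated results, preferring a public token.

    `request` is a hypothetical callable taking (params, private=bool)
    and returning a list of results; it stands in for the real API call.
    """
    # Always try with a public token first.
    results = request(params, private=False)

    # Repeat with a private token only if a refresh-token is configured
    # and fewer results than requested came back.
    if refresh_token and len(results) < limit:
        results = request(params, private=True)

    return results
```

Without a configured refresh-token the public result is returned unchanged, so the private token is never used in that case.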
Mike Fährmann
bb89a1e6d7 [mangahere] use http://
invalid SSL cert for quite some time now
2018-07-26 18:11:31 +02:00
Mike Fährmann
212130b048 [deviantart] improve public-private token switching
- rename option to `prefer-public`
- now also works for galleries with fewer than 24 items
2018-07-25 12:52:36 +02:00
Mike Fährmann
886d662582 [deviantart] add option to minimize refresh-token usage
Always trying with a public token first and repeating the API request
with a private token if deviations are missing doesn't quite work for
galleries and folders with fewer than 25 items, so it's an option and
not the default.
2018-07-24 21:44:57 +02:00
Mike Fährmann
d98e47817d [deviantart] reduce refresh-token usage
Instead of using a refresh-token-based access-token for every API
request, they are now only used for paginated results.

API requests to get a user's profile and the original download URL
now always use a public access-token.
2018-07-24 17:32:46 +02:00
Mike Fährmann
84854fcad7 [myportfolio] add user and gallery extractors (#95) 2018-07-19 18:56:45 +02:00
Mike Fährmann
c9f70e0a19 [paheal] use HTTPS 2018-07-17 21:25:03 +02:00
Mike Fährmann
ff436692bf [deviantart] add 'journals' option 2018-07-16 18:14:41 +02:00
Mike Fährmann
00032b828c [deviantart] add 'wait-min' option 2018-07-14 11:52:21 +02:00
Mike Fährmann
a6fe2bb594 [whatisthisimnotgoodwithcomputers] remove extractor 2018-07-14 09:53:16 +02:00
Mike Fährmann
0ba93650e0 [8chan] replace unit test URL
the other thread is no longer accessible
2018-07-14 09:53:16 +02:00
Mike Fährmann
269dc2bbd5 [sankaku] add 'tags' option (#94) 2018-07-14 09:53:01 +02:00
Mike Fährmann
173add6935 [nijie] fix artist_id extraction
view_popup.php pages for older images or dojins either have the
artist_id value at a different place or not at all.
2018-07-10 12:30:53 +02:00
Mike Fährmann
6996f5c118 [mangahere] fix and improve chapter extraction 2018-07-09 20:07:40 +02:00
Mike Fährmann
1d43cbbf52 [gelbooru] tag-splitting for non-api mode 2018-07-06 15:24:19 +02:00
Mike Fährmann
2eefaa99a3 [mangapark] support .net and .com mirrors 2018-07-05 14:45:05 +02:00
Mike Fährmann
c20c0a4820 [safebooru] add pool extractor 2018-07-04 12:24:57 +02:00
Mike Fährmann
f916279ae6 [rule34] add pool extractor 2018-07-04 12:24:01 +02:00
Mike Fährmann
3dbc7c5f8d [gelbooru] restore pool functionality 2018-07-04 12:21:41 +02:00
Mike Fährmann
a2c74bc6f0 [gelbooru] inherit from BooruExtractor class
Breaks pool functionality when using API calls (for now),
but reduces code clutter and enables the `tags` option.
2018-07-04 12:21:41 +02:00
Mike Fährmann
4a57509392 generalize tag-splitting option (#92)
- extend functionality to other booru sites:
  - http://behoimi.org/
  - https://konachan.com/
  - https://e621.net/
  - https://rule34.xxx/
  - https://safebooru.org/
  - https://yande.re/
2018-07-04 12:21:16 +02:00
Mike Fährmann
188e956c4e [imagefap] use HTTPS + update test results 2018-06-30 19:40:46 +02:00
Mike Fährmann
87853538b4 [yandere] add option to split tags by type (#92) 2018-06-29 19:38:53 +02:00
Mike Fährmann
a699787d01 [deviantart] update URL patterns to new format
DeviantArt changed its URL format from
https://<name>.deviantart.com/...
to
https://www.deviantart.com/<name>/...

With this change both formats will be supported.
2018-06-28 20:21:59 +02:00
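To illustrate the a699787d01 change, one pattern can accept both URL formats. This is a minimal sketch, not the pattern the extractor actually uses; `USER_PATTERN` and `extract_user` are hypothetical names, and the set of characters allowed in a user name is an assumption.

```python
import re

# Hypothetical pattern (not the extractor's own). It accepts both
#   old: https://<name>.deviantart.com/...
#   new: https://www.deviantart.com/<name>/...
# Assumption: user names consist of word characters and hyphens.
USER_PATTERN = re.compile(
    r"(?:https?://)?(?:"
    r"www\.deviantart\.com/(?P<new>[\w-]+)"          # new: name is the first path segment
    r"|(?!www\.)(?P<old>[\w-]+)\.deviantart\.com"    # old: name is the subdomain
    r")(?:/|$)"
)


def extract_user(url):
    """Return the user name from either URL format, or None if no match."""
    match = USER_PATTERN.match(url)
    if not match:
        return None
    return match.group("new") or match.group("old")
```

The negative lookahead keeps `www` from being read as a user name in the old format.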
Mike Fährmann
9e3415886c [senmanga] fix/update tests 2018-06-27 20:05:22 +02:00
Mike Fährmann
b8c97d2295 use 'extractor.request()' for more HTTP requests 2018-06-25 23:40:59 +02:00
Mike Fährmann
150a6b9064 [xvideos] fix metadata extraction 2018-06-22 16:32:04 +02:00
Mike Fährmann
7a98cc9798 [smugmug] update tests
My test account expired and all uploaded images got deleted.
2018-06-22 15:04:31 +02:00