Mike Fährmann
c6c06c41f6
[deviantart] don't add journal text to description ( #712 )
2020-06-05 21:56:12 +02:00
Mike Fährmann
4aea5138dd
[sensescans] use https://
2020-06-05 21:55:19 +02:00
Mike Fährmann
3eed5f52d7
[twitter] small metadata cleanup
...
- add 'date' field
- remove 'entities' and 'extended_entities'
- don't include 'focus_fields' from 'original_info'
2020-06-04 18:21:54 +02:00
Mike Fährmann
655c98cbef
[twitter] skip unavailable tweets
2020-06-04 14:51:25 +02:00
Mike Fährmann
41d03160ff
[deviantart] also search journals for sta.sh links ( #712 )
...
when 'extra' is enabled
2020-06-04 14:47:08 +02:00
Mike Fährmann
2132e5461a
[twitter] restore TwitPic support
2020-06-04 01:22:34 +02:00
Mike Fährmann
bd0f21478a
[twitter] login using the mobile nojs login page
2020-06-04 00:07:12 +02:00
Mike Fährmann
a10f31dde5
[twitter] rewrite; use new interface ( #740 , #806 )
...
Everything except logging in with username & password and TwitPic
embeds should be working again.
Metadata per Tweet is massively different than before (mostly raw API
responses - might need some cleaning up) and the default 'archive_fmt'
changed.
2020-06-03 20:51:29 +02:00
Mike Fährmann
3bad1579ee
update extractor test results
2020-05-31 17:42:07 +02:00
Mike Fährmann
864f4220d9
update output of 'oauth:…' ( #616 )
2020-05-31 17:41:40 +02:00
Mike Fährmann
0f459f340b
[instagram] fix and re-enable login with username&password
...
This reverts commit 3e0848a482 .
(#756 , #771 , #797 , #803 )
https://github.com/althonos/InstaLooter/issues/287#issuecomment-630456522
2020-05-31 00:29:09 +02:00
Mike Fährmann
3e0848a482
[instagram] disable login with username&password ( #756 )
2020-05-29 23:29:40 +02:00
Mike Fährmann
a32aea41e1
[instagram] update 'query_hash' values
2020-05-29 23:11:42 +02:00
Mike Fährmann
2bff8dd465
[hentainexus] fix flake8 issues ( #787 )
2020-05-28 22:45:08 +02:00
Mike Fährmann
a63682a9c0
[instagram] simplify code & complete tests ( #743 )
2020-05-28 22:31:01 +02:00
墨焓
a4e3d40672
hentainexus.py minor fix ( #787 )
...
* rectify code of `join_title`, some minor fix.
* + hentainexus self.data
* fixed: call staticmethod join_title with data
2020-05-28 21:59:26 +02:00
Vrihub
62b65e59d0
Add instagram metadata: post_pageurl, post_tags ( #743 )
...
* Add instagram metadata: post_pageurl, post_tags
Add the following metadata for instagram:
- post_pageurl: json string with url of the post page
- post_tags: json array with instagram tags extracted from the post description
* Oops: rename post_tags to tags for --write-tags
This way, --write-tags will pick up the post tags.
* Rename to post_url, improve regex
* Add post_url and tags to tests
* Remove duplicate tags and sort them
* Bugfix: don't create empty tag lists
* Metadata: add location
* Metadata: add tagged_users for each media
* Move self._find_tags() to base class
* Make flake happy
2020-05-28 21:58:24 +02:00
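Note on 62b65e59d0: the tag handling described in that commit (extract tags from the post description, remove duplicates, sort them, and skip empty lists) could look roughly like the sketch below. The regular expression and the function names `find_tags` and `add_tags` are assumptions for illustration, not the code from the commit.

```python
import re


def find_tags(description):
    """Return the sorted, de-duplicated hashtags found in 'description'."""
    # Assumed pattern: a '#' followed by word characters.
    return sorted(set(re.findall(r"#\w+", description)))


def add_tags(metadata, description):
    """Store tags under 'tags', but only when at least one was found."""
    tags = find_tags(description)
    if tags:  # don't create empty tag lists
        metadata["tags"] = tags
    return metadata
```

Storing the result under the key 'tags' mirrors the rename mentioned in the commit body, so that --write-tags picks the values up.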
Mike Fährmann
275cceeb6a
[redgifs] fix extraction ( #724 )
...
… and prepare for more potential extractors
2020-05-28 02:18:42 +02:00
Mike Fährmann
45baa13615
update extractor test results
...
- don't run Instagram tests on Travis anymore
- replace Twitter test because timeline was made private
- update Hiperdex domain to '.com' (again ...)
2020-05-28 02:18:06 +02:00
Mike Fährmann
dfcf2a2c91
write OAuth token to cache by default ( #616 )
2020-05-25 22:35:45 +02:00
Mike Fährmann
15c3d29062
move dump_response() into a separate function ( #737 )
2020-05-25 22:21:58 +02:00
Mike Fährmann
a363da4b43
include redirects and headers in --write-pages dumps ( #737 )
2020-05-25 22:21:57 +02:00
Mike Fährmann
6bcdb264e0
[imgur] treat 't/unmuted' URLs as galleries
2020-05-25 22:21:57 +02:00
Mike Fährmann
b6cee3e45b
[imgur] fix extraction of animated images without 'mp4' entry
2020-05-25 22:21:57 +02:00
Leonardo Taccari
bcac31b7c7
[webtoons] make archive_fmt unique ( #779 )
...
close #778
2020-05-25 21:23:54 +02:00
Mike Fährmann
e19f665a44
[danbooru] change default for 'ugoira' to 'false'
...
Downloading the pre-rendered versions should be a better default
than .zip files with individual frames.
2020-05-20 19:57:28 +02:00
Mike Fährmann
3201fe3521
add global SENTINEL object
2020-05-19 22:32:53 +02:00
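Note on 3201fe3521: a global sentinel object is commonly used to tell "no value given" apart from a legitimate `None`. The following is a generic sketch of that pattern; the `lookup` helper is hypothetical and only the name SENTINEL comes from the commit subject.

```python
SENTINEL = object()  # unique marker that cannot collide with real values


def lookup(config, key, default=SENTINEL):
    """Return config[key]; fall back to 'default' only if one was given."""
    value = config.get(key, SENTINEL)
    if value is SENTINEL:
        if default is SENTINEL:
            raise KeyError(key)
        return default
    return value
```

Because identity (`is`) is used for the comparison, a stored `None` is returned as a normal value instead of being mistaken for a missing key.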
Mike Fährmann
c8787647ed
add global WINDOWS bool
2020-05-19 22:32:53 +02:00
Mike Fährmann
6294e2c540
add 'text.ensure_http_scheme()'
2020-05-19 22:32:53 +02:00
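Note on 6294e2c540: a helper with this name presumably prepends a scheme to URLs that lack one, which is what allows input URLs without an explicit scheme. The sketch below shows one way to do that; the default scheme and the exact stripping behaviour are assumptions, not a copy of the project's implementation.

```python
def ensure_http_scheme(url, scheme="https://"):
    """Prepend 'scheme' to 'url' if it has no http(s) scheme yet."""
    if url and not url.startswith(("https://", "http://")):
        # Drop leading '/' and ':' so that '//host/path' is handled too.
        return scheme + url.lstrip("/:")
    return url
```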
Mike Fährmann
0378d079a5
[webtoons] fixes and simplifications ( #593 , #761 )
...
- fix episode listings for French comics
- allow input URLs without explicit scheme
- add 'lang'/'language' metadata
- use str.format() instead of '+' to assemble URLs
2020-05-18 20:20:03 +02:00
Mike Fährmann
ab11b1c896
[imagechest] simplify code ( #750 )
2020-05-18 19:11:26 +02:00
Mike Fährmann
846d3a2466
[sexcom] replace 404ed test
2020-05-18 19:04:51 +02:00
Mike Fährmann
9b4635917f
[gelbooru] simplify and fix pool extraction
...
use 'pool:<pool id>' as search tag to get pool posts
2020-05-18 19:04:51 +02:00
Leonardo Taccari
39cd389679
[webtoons] Add a new extractor for webtoons.com ( #761 )
...
The webtoons extractor can extract a single episode or an entire comic (all
episodes) from webtoons.com.
All the logic of the extractors should be trivial except for a couple
of kludges needed:
- `ageGatePass' cookie is always set to avoid possible redirect and stop of
extraction, especially in the comic extractor
- The image URLs returned by the episode extractor cannot be fetched
directly; the `Referer:' HTTP header needs to be passed to fetch them
Close #593 .
2020-05-18 19:04:20 +02:00
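Note on 39cd389679: the two kludges listed in that commit (always sending the 'ageGatePass' cookie and passing a 'Referer' header when fetching images) can be illustrated with the standard library alone. This is a sketch, not the extractor's code: the cookie value "true" and both function names are assumptions, and the requests are only built here, not sent.

```python
from urllib.request import Request

AGE_GATE_COOKIE = "ageGatePass=true"  # cookie value is an assumption


def page_request(url):
    """Build a page request that always carries the age-gate cookie."""
    return Request(url, headers={"Cookie": AGE_GATE_COOKIE})


def image_request(image_url, episode_url):
    """Build an image request that names the episode page as referrer."""
    return Request(image_url, headers={"Referer": episode_url})
```

Either object could then be passed to `urllib.request.urlopen()` to perform the actual fetch.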
Bepis
7b5711ee04
[imagechest] Add new extractor for ImageChest ( #750 )
...
* [imagechest] Add new extractor for ImageChest
* [imagechest] Fix flake8 compliance issues
2020-05-18 19:02:56 +02:00
Mike Fährmann
a1e739b96c
reuse connection adapters from parent extractors
2020-05-12 23:52:01 +02:00
Mike Fährmann
f8f95e68a7
improve '--write-pages' ( #737 )
...
- move code into its own function
- add enumeration index to filenames
- dump responses regardless of status code
2020-05-12 20:40:25 +02:00
Mike Fährmann
09cc9dbec0
prevent flake8 errors from comments looking like type annotations
2020-05-12 20:08:05 +02:00
Mike Fährmann
2d6724180b
[hiperdex] update domain to hiperdex.info
2020-05-12 17:00:51 +02:00
Vrihub
4cc761c730
Implement --write-pages option ( #736 )
...
* Implement --write-pages option
* Fix long lines
* Fix file mode to binary
* Fix pattern for Windows compatibility
2020-05-12 14:25:21 +02:00
Mike Fährmann
f557cac074
[redgifs] add image extractor ( #724 )
2020-05-10 00:31:42 +02:00
Mike Fährmann
65b1cb7acd
[deviantart] use private access tokens for Journals ( fixes #738 )
2020-05-08 21:45:01 +02:00
Mike Fährmann
0bf0146bfe
[reddit] don't send OAuth headers for file downloads ( fixes #729 )
2020-05-08 21:42:52 +02:00
Mike Fährmann
d6a480682f
update test results
2020-05-02 21:13:00 +02:00
Leonardo Taccari
b47cfc5ac9
[speakerdeck] Add a new extractor for speakerdeck.com ( #726 )
2020-05-01 22:32:22 +02:00
Mike Fährmann
90491ab606
[artstation] improve embed extraction ( #720 )
2020-04-30 21:25:03 +02:00
Mike Fährmann
999efec5cc
[deviantart] limit API wait times to 2**9=512 seconds ( #721 )
2020-04-30 21:16:09 +02:00
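Note on 999efec5cc: capping wait times at 2**9 = 512 seconds suggests an exponential backoff whose exponent stops growing at 9. A minimal sketch of such a cap follows; the function name `backoff_delay` and its parameters are hypothetical and not taken from the commit.

```python
def backoff_delay(attempt, max_exponent=9):
    """Return the wait time in seconds for the given retry attempt.

    The delay doubles with each attempt and is capped at 2**max_exponent.
    """
    return 2 ** min(attempt, max_exponent)


# Delays for attempts 0..11; the last three entries stay at the cap.
delays = [backoff_delay(n) for n in range(12)]
```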
Mike Fährmann
504de79d8b
[vsco] fix extraction
2020-04-30 21:12:06 +02:00
Mike Fährmann
5e2974d699
[weibo] add 'videos' option
2020-04-30 00:00:30 +02:00
Mike Fährmann
9f638c2e01
[twitter] add 'replies' option ( closes #705 )
2020-04-29 23:20:06 +02:00