Mike Fährmann
036a40943a
[twitter] don't cache results of 'user_by_screen_name()'
...
A 'keyarg=1' argument to the memcache decorator would have worked as
well, but keeping the user object in memory isn't useful for the vast
majority of use cases and only wastes space.
(closes #817)
2020-06-10 20:58:42 +02:00
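
The trade-off described in this commit can be sketched roughly as follows. This is a minimal illustration, not gallery-dl's actual cache code: the `memoize` decorator, its `keyarg` parameter, and the `Api` class are hypothetical stand-ins, assuming that `keyarg` selects which positional argument serves as the cache key.

```python
import functools


def memoize(keyarg=None):
    """Hypothetical in-memory cache decorator (not gallery-dl's implementation).

    keyarg=None -> the cache key is the tuple of all positional arguments
    keyarg=N    -> the cache key is only the positional argument at index N
    """
    def decorator(func):
        cache = {}

        @functools.wraps(func)
        def wrapper(*args):
            key = args if keyarg is None else args[keyarg]
            if key not in cache:
                cache[key] = func(*args)
            return cache[key]

        wrapper.cache = cache  # exposed so the stored objects can be inspected
        return wrapper
    return decorator


class Api:
    """Hypothetical API client used only for this illustration."""

    def __init__(self):
        self.calls = 0

    @memoize(keyarg=1)  # key on 'screen_name' (index 1), ignoring 'self'
    def user_by_screen_name(self, screen_name):
        self.calls += 1
        return {"screen_name": screen_name}
```

In this sketch, every cached user object stays referenced by `cache` for as long as the process runs, which is the memory cost the commit message refers to.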
Mike Fährmann
4442dfe7b8
[twitter] add 'reply_to' metadata to replies
2020-06-09 21:48:04 +02:00
Mike Fährmann
83b7bd0413
[nhentai] fix extraction (closes #819)
2020-06-09 21:27:07 +02:00
Mike Fährmann
d769bb4b80
[twitter] improve pagination
2020-06-07 15:23:45 +02:00
Mike Fährmann
5bc1097f9d
[twitter] metadata cleanup #2
...
- remove useless clutter by creating new tweet-data dicts instead of
reusing the original Tweet objects
- rename fields to how they were named before
('id_str' -> 'tweet_id', etc.)
- only include 'author' if it would differ from 'user'
- restore 'archive_fmt'
2020-06-07 02:25:29 +02:00
Mike Fährmann
c6c06c41f6
[deviantart] don't add journal text to description (#712)
2020-06-05 21:56:12 +02:00
Mike Fährmann
4aea5138dd
[sensescans] use https://
2020-06-05 21:55:19 +02:00
Mike Fährmann
3eed5f52d7
[twitter] small metadata cleanup
...
- add 'date' field
- remove 'entities' and 'extended_entities'
- don't include 'focus_fields' from 'original_info'
2020-06-04 18:21:54 +02:00
Mike Fährmann
655c98cbef
[twitter] skip unavailable tweets
2020-06-04 14:51:25 +02:00
Mike Fährmann
41d03160ff
[deviantart] also search journals for sta.sh links (#712)
...
when 'extra' is enabled
2020-06-04 14:47:08 +02:00
Mike Fährmann
2132e5461a
[twitter] restore TwitPic support
2020-06-04 01:22:34 +02:00
Mike Fährmann
bd0f21478a
[twitter] login using the mobile nojs login page
2020-06-04 00:07:12 +02:00
Mike Fährmann
a10f31dde5
[twitter] rewrite; use new interface (#740, #806)
...
Everything except logging in with username & password and TwitPic
embeds should be working again.
Metadata per Tweet is massively different than before (mostly raw API
responses - might need some cleaning up) and the default 'archive_fmt'
changed.
2020-06-03 20:51:29 +02:00
Mike Fährmann
3bad1579ee
update extractor test results
2020-05-31 17:42:07 +02:00
Mike Fährmann
864f4220d9
update output of 'oauth:…' (#616)
2020-05-31 17:41:40 +02:00
Mike Fährmann
0f459f340b
[instagram] fix and re-enable login with username&password
...
This reverts commit 3e0848a482.
(#756, #771, #797, #803)
https://github.com/althonos/InstaLooter/issues/287#issuecomment-630456522
2020-05-31 00:29:09 +02:00
Mike Fährmann
3e0848a482
[instagram] disable login with username&password (#756)
2020-05-29 23:29:40 +02:00
Mike Fährmann
a32aea41e1
[instagram] update 'query_hash' values
2020-05-29 23:11:42 +02:00
Mike Fährmann
2bff8dd465
[hentainexus] fix flake8 issues (#787)
2020-05-28 22:45:08 +02:00
Mike Fährmann
a63682a9c0
[instagram] simplify code & complete tests (#743)
2020-05-28 22:31:01 +02:00
墨焓
a4e3d40672
hentainexus.py minor fix (#787)
...
* rectify code of `join_title`, some minor fixes
* add hentainexus self.data
* fixed: call staticmethod join_title with data
2020-05-28 21:59:26 +02:00
Vrihub
62b65e59d0
Add instagram metadata: post_pageurl, post_tags (#743)
...
* Add instagram metadata: post_pageurl, post_tags
Add the following metadata for instagram:
- post_pageurl: json string with url of the post page
- post_tags: json array with instagram tags extracted from the post description
* Oops: rename post_tags to tags for --write-tags
This way, --write-tags will pick up the post tags.
* Rename to post_url, improve regex
* Add post_url and tags to tests
* Remove duplicate tags and sort them
* Bugfix: don't create empty tag lists
* Metadata: add location
* Metadata: add tagged_users for each media
* Move self._find_tags() to base class
* Make flake happy
2020-05-28 21:58:24 +02:00
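
The tag handling listed in this commit (extract tags from the post description, remove duplicates, sort them, and skip empty lists) might look roughly like the sketch below. The function names and the regular expression are assumptions made for illustration and are not taken from the actual extractor.

```python
import re


def find_tags(description):
    """Return a sorted, duplicate-free list of '#tag' strings (hypothetical helper)."""
    # set() drops duplicates, sorted() gives a stable order
    return sorted(set(re.findall(r"#\w+", description)))


def add_tags(metadata, description):
    """Store tags under 'tags', but only if at least one tag was found."""
    tags = find_tags(description)
    if tags:  # don't create empty tag lists
        metadata["tags"] = tags
    return metadata
```

Using the key `tags` mirrors the rename mentioned above, so that an option like --write-tags can pick the list up.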
Mike Fährmann
275cceeb6a
[redgifs] fix extraction (#724)
...
… and prepare for more potential extractors
2020-05-28 02:18:42 +02:00
Mike Fährmann
45baa13615
update extractor test results
...
- don't run Instagram tests on Travis anymore
- replace Twitter test because timeline was made private
- update Hiperdex domain to '.com' (again ...)
2020-05-28 02:18:06 +02:00
Mike Fährmann
dfcf2a2c91
write OAuth token to cache by default (#616)
2020-05-25 22:35:45 +02:00
Mike Fährmann
15c3d29062
move dump_response() into a separate function (#737)
2020-05-25 22:21:58 +02:00
Mike Fährmann
a363da4b43
include redirects and headers in --write-pages dumps (#737)
2020-05-25 22:21:57 +02:00
Mike Fährmann
6bcdb264e0
[imgur] treat 't/unmuted' URLs as galleries
2020-05-25 22:21:57 +02:00
Mike Fährmann
b6cee3e45b
[imgur] fix extraction of animated images without 'mp4' entry
2020-05-25 22:21:57 +02:00
Leonardo Taccari
bcac31b7c7
[webtoons] make archive_fmt unique (#779)
...
close #778
2020-05-25 21:23:54 +02:00
Mike Fährmann
e19f665a44
[danbooru] change default for 'ugoira' to 'false'
...
Downloading the pre-rendered versions should be a better default
than .zip files with individual frames.
2020-05-20 19:57:28 +02:00
Mike Fährmann
3201fe3521
add global SENTINEL object
2020-05-19 22:32:53 +02:00
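
A global sentinel object is commonly used to tell "no value given" apart from a legitimate `None`. The sketch below shows that general pattern only; the `lookup` function is hypothetical and does not reflect how gallery-dl uses its SENTINEL.

```python
# A unique object that no caller can pass in by accident
SENTINEL = object()


def lookup(config, key, default=SENTINEL):
    """Hypothetical helper: return config[key], or 'default' if one was given."""
    value = config.get(key, SENTINEL)
    if value is SENTINEL:
        # key is missing; 'None' is a valid default, so compare by identity
        if default is SENTINEL:
            raise KeyError(key)
        return default
    return value
```

Because the comparison uses `is`, a stored value of `None` is returned as-is instead of being mistaken for a missing key.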
Mike Fährmann
c8787647ed
add global WINDOWS bool
2020-05-19 22:32:53 +02:00
Mike Fährmann
6294e2c540
add 'text.ensure_http_scheme()'
2020-05-19 22:32:53 +02:00
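
A helper with this name could plausibly behave as in the sketch below. The signature, the default scheme, and the stripping of leading '/' and ':' characters are assumptions for illustration, not a statement of what 'text.ensure_http_scheme()' actually does.

```python
def ensure_http_scheme(url, scheme="https://"):
    """Prepend 'scheme' to 'url' unless it already starts with http:// or https://"""
    if url and not url.startswith(("https://", "http://")):
        # handle protocol-relative input such as '//example.org/path'
        return scheme + url.lstrip("/:")
    return url
```

Under these assumptions, `ensure_http_scheme("example.org/a")` yields `"https://example.org/a"`, while a URL that already has an http(s) scheme is returned unchanged.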
Mike Fährmann
0378d079a5
[webtoons] fixes and simplifications (#593, #761)
...
- fix episode listings for French comics
- allow input URLs without explicit scheme
- add 'lang'/'language' metadata
- use str.format() instead of '+' to assemble URLs
2020-05-18 20:20:03 +02:00
Mike Fährmann
ab11b1c896
[imagechest] simplify code (#750)
2020-05-18 19:11:26 +02:00
Mike Fährmann
846d3a2466
[sexcom] replace 404ed test
2020-05-18 19:04:51 +02:00
Mike Fährmann
9b4635917f
[gelbooru] simplify and fix pool extraction
...
use 'pool:<pool id>' as search tag to get pool posts
2020-05-18 19:04:51 +02:00
Leonardo Taccari
39cd389679
[webtoons] Add a new extractor for webtoons.com (#761)
...
The webtoons extractor can extract a single episode or an entire comic
(all episodes) from webtoons.com.
All the logic of the extractors should be trivial except for a couple
of kludges needed:
- The `ageGatePass' cookie is always set to avoid a possible redirect that
would stop extraction, especially in the comic extractor
- The image URLs returned by the episode extractor cannot be fetched
directly; the `Referer:' HTTP header needs to be passed to fetch them
Close #593 .
2020-05-18 19:04:20 +02:00
Bepis
7b5711ee04
[imagechest] Add new extractor for ImageChest (#750)
...
* [imagechest] Add new extractor for ImageChest
* [imagechest] Fix flake8 compliance issues
2020-05-18 19:02:56 +02:00
Mike Fährmann
a1e739b96c
reuse connection adapters from parent extractors
2020-05-12 23:52:01 +02:00
Mike Fährmann
f8f95e68a7
improve '--write-pages' (#737)
...
- move code into its own function
- add enumeration index to filenames
- dump responses regardless of status code
2020-05-12 20:40:25 +02:00
Mike Fährmann
09cc9dbec0
prevent flake8 errors from comments looking like type annotations
2020-05-12 20:08:05 +02:00
Mike Fährmann
2d6724180b
[hiperdex] update domain to hiperdex.info
2020-05-12 17:00:51 +02:00
Vrihub
4cc761c730
Implement --write-pages option (#736)
...
* Implement --write-pages option
* Fix long lines
* Fix file mode to binary
* Fix pattern for Windows compatibility
2020-05-12 14:25:21 +02:00
Mike Fährmann
f557cac074
[redgifs] add image extractor (#724)
2020-05-10 00:31:42 +02:00
Mike Fährmann
65b1cb7acd
[deviantart] use private access tokens for Journals (fixes #738)
2020-05-08 21:45:01 +02:00
Mike Fährmann
0bf0146bfe
[reddit] don't send OAuth headers for file downloads (fixes #729)
2020-05-08 21:42:52 +02:00
Mike Fährmann
d6a480682f
update test results
2020-05-02 21:13:00 +02:00
Leonardo Taccari
b47cfc5ac9
[speakerdeck] Add a new extractor for speakerdeck.com (#726)
2020-05-01 22:32:22 +02:00