[blogger] Fix lh*.googleusercontent.com forward slash bug, add support for lh*-**.googleusercontent.com
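Some image URLs use the "lh(number)-(locale).googleusercontent.com" host format, so I added support for those. The "lh(number).googleusercontent.com" format was also broken: its alternative ended in a forward slash and the group is followed by another one, so the regex only matched URLs with two consecutive slashes after the host.
Examples of hosts that now match: lh7.googleusercontent.com, lh7-us.googleusercontent.com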
@@ -37,7 +37,8 @@ class BloggerExtractor(BaseExtractor):
 findall_image = re.compile(
     r'src="(https?://(?:'
     r'blogger\.googleusercontent\.com/img|'
-    r'lh\d+\.googleusercontent\.com/|'
+    r'lh\d+\.googleusercontent\.com|'
+    r'lh\d+-\w+\.googleusercontent\.com|'
     r'\d+\.bp\.blogspot\.com)/[^"]+)').findall
 findall_video = re.compile(
     r'src="(https?://www\.blogger\.com/video\.g\?token=[^"]+)').findall
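The effect of the change can be checked in isolation. The following is a minimal sketch, not part of the commit: the two patterns are copied from the diff, while the names `OLD_IMAGE`, `NEW_IMAGE`, and `SAMPLE_HTML`, and the URL paths inside the sample, are invented for illustration.

```python
import re

# Pattern before the change: the lh* alternative ends in "/", and the
# group is followed by another "/", so a match needs "//" after the host.
OLD_IMAGE = re.compile(
    r'src="(https?://(?:'
    r'blogger\.googleusercontent\.com/img|'
    r'lh\d+\.googleusercontent\.com/|'
    r'\d+\.bp\.blogspot\.com)/[^"]+)')

# Pattern after the change, as it appears in the diff.
NEW_IMAGE = re.compile(
    r'src="(https?://(?:'
    r'blogger\.googleusercontent\.com/img|'
    r'lh\d+\.googleusercontent\.com|'
    r'lh\d+-\w+\.googleusercontent\.com|'
    r'\d+\.bp\.blogspot\.com)/[^"]+)')

# Hypothetical post HTML; the paths are made up for this example.
SAMPLE_HTML = (
    '<img src="https://lh7.googleusercontent.com/abc123=s1600">'
    '<img src="https://lh7-us.googleusercontent.com/def456=s1600">'
    '<img src="https://1.bp.blogspot.com/-xyz/s1600/photo.jpg">'
)

# With a single capturing group, findall returns the captured URLs.
old_matches = OLD_IMAGE.findall(SAMPLE_HTML)
new_matches = NEW_IMAGE.findall(SAMPLE_HTML)

print("old:", old_matches)
print("new:", new_matches)
```

With this sample, the old pattern finds only the blogspot URL, while the new one also picks up both googleusercontent hosts. Note that `\w+` does not match a hyphen, so a suffix containing one (if such hosts exist) would still be skipped.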