[blogger] Fix lh*.googleusercontent.com forward slash bug, add support for lh*-**.googleusercontent.com
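Some image URLs use the host format "lh(number)-(locale).googleusercontent.com", so this adds support for them. URLs with the plain "lh(number).googleusercontent.com" host were also not matched: that alternative in the regex ended with a forward slash and the group is followed by another one, so the pattern required two consecutive slashes after the host. Examples: lh7.googleusercontent.com, lh7-us.googleusercontent.com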
@@ -37,7 +37,8 @@ class BloggerExtractor(BaseExtractor):
 findall_image = re.compile(
     r'src="(https?://(?:'
     r'blogger\.googleusercontent\.com/img|'
-    r'lh\d+\.googleusercontent\.com/|'
+    r'lh\d+\.googleusercontent\.com|'
+    r'lh\d+-\w+\.googleusercontent\.com|'
     r'\d+\.bp\.blogspot\.com)/[^"]+)').findall
 findall_video = re.compile(
     r'src="(https?://www\.blogger\.com/video\.g\?token=[^"]+)').findall
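The effect of the change can be checked with a quick standalone sketch. The two patterns are copied from the diff; the names `OLD`, `NEW`, and `SAMPLE`, and the image paths in the sample markup, are made up for illustration and do not come from the extractor.

```python
import re

# Pattern before this commit: the lh alternative ends in "/", and the
# group itself is followed by "/", so "//" was required after the host.
OLD = re.compile(
    r'src="(https?://(?:'
    r'blogger\.googleusercontent\.com/img|'
    r'lh\d+\.googleusercontent\.com/|'
    r'\d+\.bp\.blogspot\.com)/[^"]+)').findall

# Pattern after this commit: trailing slash dropped, and a second
# alternative added for hosts such as lh7-us.googleusercontent.com.
NEW = re.compile(
    r'src="(https?://(?:'
    r'blogger\.googleusercontent\.com/img|'
    r'lh\d+\.googleusercontent\.com|'
    r'lh\d+-\w+\.googleusercontent\.com|'
    r'\d+\.bp\.blogspot\.com)/[^"]+)').findall

# Hypothetical markup; the paths after the host are invented.
SAMPLE = (
    '<img src="https://lh7.googleusercontent.com/abc123=s640">'
    '<img src="https://lh7-us.googleusercontent.com/def456=s640">'
)

print(OLD(SAMPLE))  # neither URL is found
print(NEW(SAMPLE))  # both URLs are found
```

Since the pattern has a single capturing group, `findall` returns the full URL for each match rather than a tuple.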