Scraping Bilibili (B站) videos with python + requests and saving them locally



Main text

import os
import datetime


import requests
import re
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor


def download_video(url):
    # file_path = 'django3+drf+Vue3前后端分离/'
    headers = {
        "Cookie": "buvid3=C6ED37CC-DC0F-D1B0-BA82-231C0731E3C971724infoc; b_nut=1698137871; _uuid=xxxx-9398-E7CA-10B95-xxxx; buvid_fp=xxxx; buvid4=xxxx-93C3-xxxx-xxxx-0F1D34771D4274275-023102416-aL0NYo%xxxx%3D%3D; header_theme_version=CLOSE; DedeUserID=345707270; DedeUserID__ckMd5=7506c67cb7588c20; enable_web_push=ENABLE; iflogin_when_web_push=1; CURRENT_FNVAL=4048; rpdid=|(kYRk|Ruuk)0J'uYm)~JRmml; home_feed_column=5; PVID=1; FEED_LIVE_VERSION=V8; browser_resolution=1920-908; SESSDATA=0aff21e1%2C1729848907%2Ca2f88%2A42CjDHEfsdfE5mZ9GMKVTmTqG3aIO7dew8YUpjK9-z7OXOdBOYjXPi4FVQgJEVacJ0UQkSVk4xTGRnLTEzOHF3TDktYlhEa2JDS3ZFV0FfYjlHZ3ctdzhlWlVDZmhpUFZsMEJCSTZtQkxUU1FiRC1IV1pMenVFV1JxcVhCc2sxNEtCemgyY1dtQVZBIIEC; bili_jct=768662980741f061aedc30f722129d8b; sid=7tqiav60; bp_t_offset_345707270=925256601212813351; b_lsid=DBC104B55_18F27B3DA65; share_source_origin=COPY; bsource=share_source_copy_link; hit-dyn-v2=1; bili_ticket=eyJhbGciOiJIUzI1NiIsImtpZCI6InMwMyIsInR5cCI6IkpXVCJ9.eyJleHAiOjE3MTQ2MTczMzMsImlhdCI6MTcxNDM1ODA3MywicGx0IjotMX0.qfWz2oLOuJvDWHCM6Cgwl0SEVjpN6LkOreX8ApoYD4k; bili_ticket_expires=1714617273",
        "Origin": "https://www.bilibili.com",
        "Referer": "https://www.bilibili.com/video/BV1ZR4y1U7Qz?p=2",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    }

    response = requests.get(url, headers=headers)
    html = response.text
    # print(response.text)
    # Use the collection title as the directory name for saving the videos
    file_path = re.findall(r'data-title="(.*?)" title', html)[0].replace(' ', '')
    print(file_path)
    # Create the directory if it does not already exist
    if not os.path.exists(file_path):
        os.makedirs(file_path)
        print(f"Directory '{file_path}' created successfully.")
    else:
        print(f"Directory '{file_path}' already exists.")
    file_path = file_path + '/'

    # Use each episode's title as the file name
    title = re.findall(r'<title data-vue-meta="true">(.*?)_哔哩哔哩_bilibili</title>', html)[0].replace(' ', '')
    print(title)

    # Extract the playinfo JSON embedded in the page
    video_info = re.findall(r'<script>window.__playinfo__=(.*?)</script>', html)[0]
    print(video_info)
    json_data = json.loads(video_info)

    # Pull the video and audio URLs out of the playinfo (Bilibili serves video and audio as separate streams)
    video_url = json_data['data']['dash']['video'][0]['baseUrl']
    audio_url = json_data['data']['dash']['audio'][0]['baseUrl']
    print(video_url)
    print(audio_url)

    # Download the video and audio streams and save them as .avi and .mp3 files
    video_content = requests.get(video_url, headers=headers).content
    audio_content = requests.get(audio_url, headers=headers).content
    with open(file_path + title + '.avi', 'wb') as video:
        video.write(video_content)
    with open(file_path + title + '.mp3', 'wb') as audio:
        audio.write(audio_content)

    # Merge the video and audio files into a single .mp4 with ffmpeg, then delete the intermediates
    cmd = f"ffmpeg -i {file_path}{title}.avi -i {file_path}{title}.mp3 -c:v copy -c:a aac -strict experimental {file_path}{title}.mp4"
    subprocess.run(cmd, shell=True)
    os.remove(f'{file_path}{title}.avi')
    os.remove(f'{file_path}{title}.mp3')


# Crawl the episodes with a thread pool (much faster than downloading them one by one)
def main(bvid, start, end):
    urls = [f'https://www.bilibili.com/video/{bvid}/?p={i}' for i in range(start, end + 1)]
    print(urls)
    with ThreadPoolExecutor(max_workers=10) as executor:
        executor.map(download_video, urls)


if __name__ == '__main__':
    # To crawl a different collection, just change bvid; it can be read straight from the video URL,
    # e.g. https://www.bilibili.com/video/BV1Rs4y127j8/?spm_id_from=333.999.0.0&vd_source=6cdcd08f45ddc987f3f46f8ee8f80b9e
    bvid = 'BV1Sz4y1o7E8'
    starttime = datetime.datetime.now()
    print(starttime)

    # start and end are the first and last episode numbers to crawl;
    # for a 20-episode collection, start is 1 and end is 20
    start = 1
    end = 56
    main(bvid, start, end)

    # Print the total crawl time
    endtime = datetime.datetime.now()
    print(endtime)
    result_time = endtime - starttime
    print(result_time)
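
One fragile spot in the script above: the ffmpeg command is assembled as an f-string and run with shell=True, so an episode title that still contains quotes, brackets, or other shell metacharacters after the .replace(' ', '') call can break the merge step (and characters such as '/' or '?' can be invalid in file names before that). Below is a minimal, hedged sketch of a more defensive merge step; the sanitize_name helper and merge_av function are hypothetical names, not part of the original script, and the same sanitized name would also have to be used when the .avi/.mp3 files are first written.

```python
import re
import subprocess


def sanitize_name(name: str) -> str:
    # Hypothetical helper (not in the original script): replace characters that are
    # invalid in file names or awkward to pass through a shell with underscores.
    return re.sub(r'[\\/:*?"<>|\'\s]+', '_', name)


def merge_av(file_path: str, title: str) -> None:
    safe = sanitize_name(title)
    video_file = f"{file_path}{safe}.avi"
    audio_file = f"{file_path}{safe}.mp3"
    output_file = f"{file_path}{safe}.mp4"
    # Passing the arguments as a list avoids shell quoting entirely, and check=True
    # raises if ffmpeg fails instead of silently continuing to the os.remove() calls.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_file, "-i", audio_file,
         "-c:v", "copy", "-c:a", "aac", output_file],
        check=True,
    )
```

This keeps the behaviour of the original merge command while removing the dependency on shell parsing of the file names.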

