Proximal Policy Optimization (PPO) for LLMs Explained Intuitively

Video tags: proximal policy optimization, ppo, reinforcement learning, reasoning models, LLM, machine learning, artificial intelligence
Description:

In this video, I break down Proximal Policy Optimization (PPO) from first principles, without assuming prior knowledge of Reinforcement Learning. By the end, you'll understand the core RL building blocks that led to PPO, including:
  • Policy Gradient
  • Actor-Critic Models
  • The Value Function
  • The Generalized Advantage Estimate

In the LLM world, PPO was used to train reasoning models like OpenAI's o1/o3, and presumably Claude 3.7, Grok 3, etc. It's the backbone of Reinforcement Learning from Human Feedback (RLHF), which helps align AI models with human preferences, and of Reinforcement Learning with Verifiable Rewards (RLVR), which gives LLMs reasoning abilities.

Papers:
  • PPO paper: https://arxiv.org/pdf/1707.06347
  • GAE paper: https://arxiv.org/pdf/1506.02438
  • TRPO paper: https://arxiv.org/pdf/1502.05477

Well-written blog posts:
  • https://danieltakeshi.github.io/2017/04/02/notes-on-the-generalized-advantage-estimation-paper/
  • https://huggingface.co/blog/NormalUhr/rlhf-pipeline
  • https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/

Implementations:
  • (Original) OpenAI Baselines: https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/ppo2
  • Hugging Face: https://github.com/huggingface/trl/blob/main/trl/trainer/ppo_trainer.py
  • Hugging Face docs: https://huggingface.co/docs/trl/main/en/ppo_trainer

Mother of all RL books (Sutton & Barto): http://incompleteideas.net/book/RLbook2020.pdf

Chapters:
  • 00:00 Intro
  • 01:21 RL for LLMs
  • 05:53 Policy Gradient
  • 09:23 The Value Function
  • 12:14 Generalized Advantage Estimate
  • 17:17 End-to-end Training Algorithm
  • 18:23 Importance Sampling
  • 20:02 PPO Clipping
  • 21:36 Outro

Special thanks to Anish Tondwalkar for discussing some of these concepts with me.

Note: At 21:10, A_t should have been inside the min. Thanks @t.w.7065 for catching this.
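
For readers who want something concrete next to the "Generalized Advantage Estimate" and "PPO Clipping" chapters, here is a minimal sketch in plain Python. It is not taken from the video or from any of the linked implementations: the function names `gae_advantages` and `ppo_clip_objective`, the default values for `gamma`, `lam` and `eps`, and the assumption that the value after the last step is 0 are all illustrative choices. It follows the formulas in the GAE and PPO papers linked above and, in line with the correction about 21:10, keeps the advantage A_t inside the min.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimate for one finished episode.

    rewards[t] is the reward at step t and values[t] is the critic's
    estimate V(s_t). Assumption: the value after the final step is 0.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the discounted sum of later steps.
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate for one action (one token, in the LLM setting).

    ratio is pi_new(a|s) / pi_old(a|s), the importance-sampling weight.
    Returns min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A), so the
    advantage appears in both arguments of the min.
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Toy episode: a single reward at the end, critic predicts 0 everywhere.
    print(gae_advantages([0.0, 0.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0))
    # A ratio above 1 + eps earns no extra credit when the advantage is positive.
    print(ppo_clip_objective(1.5, 1.0))
```

Keeping A_t inside the min is what makes the objective pessimistic: with a negative advantage the unclipped term is the smaller one, so pushing the ratio further in a harmful direction is still penalised in full.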
