%3Aformat(webp)%2Fi.ytimg.com%252Fvi_webp%252F8jtAzxUwDj0%252Fmaxresdefault.webp&w=1139&q=75)
Description:
In this video, I break down Proximal Policy Optimization (PPO) from first principles, without assuming prior knowledge of Reinforcement Learning. By the end, you’ll understand the core RL building blocks that led to PPO, including: 🔵 Policy Gradient 🔵 Actor-Critic Models 🔵 The Value Function 🔵 The Generalized Advantage Estimate In the LLM world, PPO was used to train reasoning models like OpenAI's o1/o3, and presumably Claude 3.7, Grok 3, etc. It’s the backbone of Reinforcement Learning with Human Feedback (RLHF) -- which helps align AI models with human preferences and Reinforcement Learning with Verifiable Rewards (RLVR), which gives LLMs reasoning abilities. Papers: - PPO paper: https://arxiv.org/pdf/1707.06347 - GAE paper: https://arxiv.org/pdf/1506.02438 - TRPO paper: https://arxiv.org/pdf/1502.05477 Well-written blogposts: - https://danieltakeshi.github.io/2017/04/02/notes-on-the-generalized-advantage-estimation-paper/ - https://huggingface.co/blog/NormalUhr/rlhf-pipeline - https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/ Implementations: - (Original) OpenAI Baseslines: https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/ppo2 - Hugging Face: https://github.com/huggingface/trl/blob/main/trl/trainer/ppo_trainer.py - Hugging Face docs: https://huggingface.co/docs/trl/main/en/ppo_trainer Mother of all RL books (Barto & Sutton): http://incompleteideas.net/book/RLbook2020.pdf 00:00 Intro 01:21 RL for LLMs 05:53 Policy Gradient 09:23 The Value Function 12:14 Generalized Advantage Estimate 17:17 End-to-end Training Algorithm 18:23 Importance Sampling 20:02 PPO Clipping 21:36 Outro Special thanks to Anish Tondwalkar for discussing some of these concepts with me. Note: At 21:10, A_t should have been inside the min. Thanks @t.w.7065 for catching this.
Mediafile available in formats
http://univideos.ru/ website is the best way to download a video or a separate audio track if you want to do without installing programs and extensions.
The UDL Helper extension is a convenient button that is seamlessly integrated into YouTube, Instagram and OK.ru sites for fast content download.
UDL Client program (for Windows) is the most powerful solution that supports more than 900 websites, social networks and video hosting sites, as well as any video quality that is available in the source.
UDL Lite is a really convenient way to access a website from your mobile device. With its help, you can easily download videos directly to your smartphone.
The best quality formats are FullHD (1080p), 2K (1440p), 4K (2160p) and 8K (4320p). The higher the resolution of your screen, the higher the video quality should be. However, there are other factors to consider: download speed, amount of free space, and device performance during playback.
The browser/computer should not freeze completely! If this happens, please report it with a link to the video. Sometimes videos cannot be downloaded directly in a suitable format, so we have added the ability to convert the file to the desired format. In some cases, this process may actively use computer resources.
You can download a video to your smartphone using the website or the PWA application UDL Lite. It is also possible to send a download link via QR code using the UDL Helper extension.
The most convenient way is to use the UDL Client program, which supports converting video to MP3 format. In some cases, MP3 can also be downloaded through the UDL Helper extension.
This feature is available in the UDL Helper extension. Make sure that "Show the video snapshot button" is checked in the settings. A camera icon should appear in the lower right corner of the player to the left of the "Settings" icon. When you click on it, the current frame from the video will be saved to your computer in JPEG format.
For this purpose you need VLC-player, which can be downloaded for free from the official website https://www.videolan.org/vlc/.
How to play streaming video through VLC player:
To download streaming video via VLC player, you need to convert it:
Warning: this download method no longer works with most YouTube videos.
It costs nothing. Our services are absolutely free for all users. There are no PRO subscriptions, no restrictions on the number or maximum length of downloaded videos.