Download "Transcribe Text via SFSpeechRecognizer (Lesson 05)"

Download this video with UDL Client
  • Video mp4 HD+ with sound
  • Mp3 in the best quality
  • Any size files
Video tags
swift, swiftui, combine, xcode, ios, app, development, programming, code, apple, academy, tutorial, build, iphone
Subtitles
00:00:00
In today's lesson we will be using SFSpeechRecognizer, the Speech framework API, to allow voice input for our local AI chat app. This topic is a bit more complex to implement, as we won't really be living in SwiftUI land for this lesson. So I've already prepared all the code, and I will walk you through it during the lesson. Parts of the code are also linked in the video description for your own reference, as I won't go through every single detail today. Basically, there are three things we need to do. First of all, we of course need to implement a transcription service, which I've called SpeechToTextService in this project. Then we need to add this little dictation button down here, which is just an SF Symbol button. And lastly, we need to pull everything together and use the speech recognizer in our existing business logic. So, let's get right started.
00:00:55
In our UI, we already have our safe-area inset at the bottom, with our text field and our send button, so these two things down here. And then we also added a second button with a microphone. While we are recording (this basically just flips an isRecording Boolean) it shows one SF Symbol; let me quickly turn that on. So this is the SF Symbol while we are recording, and this is the SF Symbol when we stop recording. Of course, microphone stuff doesn't work in the simulator, so you'll have to run this on a real device. And before I forget about it, we'll of course need the Privacy - Microphone Usage Description string (NSMicrophoneUsageDescription) in our Info.plist file in order to get the hardware access needed for this feature. So now that we have our little microphone button in there, we either call startRecording or stopRecording, depending on what the current state is.
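As a rough sketch, the button might look something like the following (the view name and bindings are mine, not the lesson's exact code). Note that speech recognition authorization additionally needs the Privacy - Speech Recognition Usage Description key (NSSpeechRecognitionUsageDescription) in Info.plist:

    import SwiftUI

    // Hypothetical sketch of the dictation toggle described above.
    struct DictationButton: View {
        @Binding var isRecording: Bool
        let start: () -> Void
        let stop: () -> Void

        var body: some View {
            Button {
                // Flip between starting and stopping based on the current state.
                if isRecording { stop() } else { start() }
            } label: {
                // One SF Symbol while recording, another while idle.
                Image(systemName: isRecording ? "stop.circle.fill" : "mic.fill")
            }
            .accessibilityLabel(isRecording ? "Stop dictation" : "Start dictation")
        }
    }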
00:01:53
I've moved all of the new variables and both of the functions to the bottom of the file just for our reference, but I would advise you to keep all of the state variables, and also the SpeechToTextService, at the top of the file so you have a nice structure. Basically, we have our isRecording flag. Then we have a potential error message to show. We also have a transcription task, and we need to store it here, but we'll get into that in just a second. And we also have our SpeechToTextService, which we'll also go through in a minute. Then we have our startRecording function and our stopRecording function.

00:02:29
Let's start by looking at the startRecording function. What does it actually do? Because we're actually streaming audio here. First, we make sure that we're not recording already, so this button can't be pressed twice. This is already somewhat enforced by the UI, or by the button itself, but it's just to make sure nothing bad happens here. Then, of course, we set isRecording to true. We create our transcription task. This is just a Swift concurrency Task, but we're storing it in the variable so it doesn't get cancelled when this function returns, because we are streaming audio: this task potentially lasts for quite a long time, until we cancel it manually in our stopRecording function. In there, we await authorization to use the microphone and speech-to-text. Then we create a stream (we'll look into what this transcribe function does in a minute). We have a for try await loop, so we are streaming in the partial results from the microphone and assigning each new value to self.input. And self.input is just the variable that our text field writes to, and the variable holding the message that we send to our large language model. So basically, we take the partial result and assign it to our input; that way it also automatically gets displayed in the text field. And if there's any error, we assign it to our error message, and you can show that on your screen whichever way you'd like.
00:03:55
Since we're already here, let's also briefly look into the stopRecording function before we go into how the transcription with SFSpeechRecognizer actually works. We make sure that we are actually recording, then we set isRecording to false. We cancel and nil out our transcription task, making sure that no new audio gets streamed into our text field variable. And then we also call stopTranscribing on our SpeechToTextService to clean up everything over there.
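Put together, the state and the two functions described so far might look roughly like this sketch. It is a hypothetical reconstruction: SpeechToTextService and its authorize()/transcribe()/stopTranscribing() methods are assumed from the walkthrough, and the real file is linked in the description:

    import SwiftUI

    struct ChatInputView: View {
        // Keep the state and the service at the top of the file, as advised above.
        @State private var input = ""
        @State private var isRecording = false
        @State private var errorMessage: String?
        @State private var transcriptionTask: Task<Void, Never>?
        @State private var speechToTextService = SpeechToTextService()

        var body: some View {
            HStack {
                TextField("Message", text: $input)
                DictationButton(isRecording: $isRecording,
                                start: startRecording,
                                stop: stopRecording)
            }
        }

        private func startRecording() {
            guard !isRecording else { return }  // guard against double taps
            isRecording = true
            // Store the task so it outlives this function call; it keeps
            // streaming until stopRecording() cancels it.
            transcriptionTask = Task {
                do {
                    try await speechToTextService.authorize()
                    for try await partial in speechToTextService.transcribe() {
                        input = partial  // partial results show up in the text field
                    }
                } catch {
                    errorMessage = error.localizedDescription
                }
            }
        }

        private func stopRecording() {
            guard isRecording else { return }
            isRecording = false
            transcriptionTask?.cancel()  // stop streaming into `input`
            transcriptionTask = nil
            speechToTextService.stopTranscribing()
        }
    }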
00:04:24
So, let's have a look at our SpeechToTextService. This file is linked via a GitHub gist in the video description, so you can check it out and download it over there, or reference it for your own implementation. At the top, we have two wrappers for some old completion-handler-based APIs of SFSpeechRecognizer and AVAudioSession. I won't go into detail; you can look these up in the gist. These are just concurrency wrappers to make our API a bit nicer to use.
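For reference, such wrappers typically bridge the completion-handler APIs into async/await with checked continuations, along these lines (a sketch of the general pattern, not necessarily the gist's exact code):

    import AVFoundation
    import Speech

    extension SFSpeechRecognizer {
        // Bridge the callback-based authorization API into async/await.
        static func hasAuthorizationToRecognize() async -> Bool {
            await withCheckedContinuation { continuation in
                SFSpeechRecognizer.requestAuthorization { status in
                    continuation.resume(returning: status == .authorized)
                }
            }
        }
    }

    extension AVAudioSession {
        // Same idea for the microphone permission prompt.
        func hasPermissionToRecord() async -> Bool {
            await withCheckedContinuation { continuation in
                requestRecordPermission { authorized in
                    continuation.resume(returning: authorized)
                }
            }
        }
    }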
00:04:52
As I said, I won't go through this line by line, because it is quite complex, but I will try to explain the logic behind how it works. If you want to dig into it yourself, you're free to check out the file, or the documentation that I've linked at the top of the file. As you can see, there is quite a bit of code in here. We have an authorize function. We have a transcribe function, which is probably the most interesting one. And we have a stopTranscribing function, reset, and prepareEngine. Let's look into transcribe, because this is the only function that actually gets called in our UI to do the transcription.

00:05:22
If you remember from our UI implementation, we have a stream that we get from the SpeechToTextService's transcribe function, so it has to return an AsyncThrowingStream, which lets us use the for try await syntax. This is how you create one of these: you get a continuation, and you can yield values into the continuation to add a new entry to the stream. This is very similar to how Combine was used back in the day.
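The basic shape is something like this minimal sketch; the body of the task is filled in by the engine and recognition code we look at next:

    import Speech

    func transcribe() -> AsyncThrowingStream<String, Error> {
        AsyncThrowingStream { continuation in
            let task = Task {
                // Set up the audio engine and recognition task (see below),
                // then call continuation.yield(partialText) as results arrive
                // and continuation.finish() once the result is final.
            }
            continuation.onTermination = { _ in
                task.cancel()  // tear down if the consumer stops listening
            }
        }
    }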
00:05:59
In there we create a task, because this is an async throwing stream, and we have some safety checks that our speech recognizer has been created and that it is available on the device. And then comes the interesting part. As you can see, it's actually not that much code; there's just quite a bit of overhead and some error handling in here. The key thing is that to do the speech transcription, we need an audio engine and a speech recognition request, and we create these in the prepareEngine function, just to keep this code a bit more organized.
00:06:33
So, let's have a look at the prepareEngine function. You can already see it returns an AVAudioEngine and an SFSpeechAudioBufferRecognitionRequest, quite a mouthful. It's basically just a bunch of setup code for an audio session: we make sure to set the category to record, with the measurement mode and the duckOthers option, so that the audio is clean and usable for our use case here. Then we create our audio engine. We do a bit more setup here, but that's not too interesting, I believe. The second thing we return, aside from our engine, is our recognition request, or speech recognition request, and we create that over here. It's important that we set shouldReportPartialResults to true, because that's what we want to do: we want to stream in the audio. We also want to add punctuation, because the user might speak multiple sentences and use periods or question marks, for example. And we also tell the recognition request that this is in fact a dictation; this just helps it internally to be a bit more accurate. If we want to, we can set requiresOnDeviceRecognition to true to only use the on-device models, but in our use case it doesn't really matter. All right, then we do some more setup, we prepare our engine, then we start it, and we return our engine and our SFSpeechAudioBufferRecognitionRequest.
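As a reference point, the standard SFSpeechRecognizer setup matching this description looks roughly like the following sketch, based on Apple's documented APIs (the gist may differ in detail):

    import AVFoundation
    import Speech

    func prepareEngine() throws -> (AVAudioEngine, SFSpeechAudioBufferRecognitionRequest) {
        let audioEngine = AVAudioEngine()

        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true  // stream partial text as it is recognized
        request.addsPunctuation = true             // periods, question marks, etc.
        request.taskHint = .dictation              // hint for better dictation accuracy
        // request.requiresOnDeviceRecognition = true  // optional: on-device models only

        // Configure the shared audio session for clean microphone input.
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)

        // Feed microphone buffers from the input node into the request.
        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            request.append(buffer)
        }

        audioEngine.prepare()
        try audioEngine.start()
        return (audioEngine, request)
    }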
00:07:59
So once we have called prepareEngine here, we are now locally storing our audio engine and our request, and then we're creating our real recognition task. Once again, there's a lot of state checking and error handling up here, but it's actually pretty simple. With our recognition task, we get a result and an optional error. If we have an error, of course, we show some error state and we reset the speech-to-text service; that's not really important right now, and you can of course change the implementation for your own app. What's interesting is that we now use the continuation I mentioned beforehand from our async throwing stream: we iterate over these results, and we yield the best transcription as a formatted string. This works because we set shouldReportPartialResults to true. And just as a reminder, we get this transcription within the closure of our recognition task. This result object is actually of type SFSpeechRecognitionResult, and it doesn't only have a bestTranscription; it also has an isFinal property. So if the transcription is final, we finish our continuation, so the for loop will stop, and we also reset our SpeechToTextService to be clean again for the next dictation.
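That wiring might look something like this sketch; the function and parameter names are mine, and it just shows how the recognition callback feeds the stream's continuation:

    import Speech

    func startRecognition(
        recognizer: SFSpeechRecognizer,
        request: SFSpeechAudioBufferRecognitionRequest,
        continuation: AsyncThrowingStream<String, Error>.Continuation
    ) -> SFSpeechRecognitionTask {
        recognizer.recognitionTask(with: request) { result, error in
            if let result {
                // Yield each (partial or final) transcription into the stream.
                continuation.yield(result.bestTranscription.formattedString)
                if result.isFinal {
                    continuation.finish()  // ends the `for try await` loop in the UI
                }
            }
            if let error {
                continuation.finish(throwing: error)  // surfaces as errorMessage in the UI
            }
        }
    }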
00:09:26
Then, of course, we also have a stopTranscribing function and a reset function. You can look into these yourself if you're interested. Once again, this file is linked as a GitHub gist in the video description.
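For completeness, a teardown along these lines is typical (again a hypothetical sketch, not the gist's exact code):

    import AVFoundation
    import Speech

    final class SpeechToTextService {
        private var audioEngine: AVAudioEngine?
        private var request: SFSpeechAudioBufferRecognitionRequest?
        private var recognitionTask: SFSpeechRecognitionTask?

        func stopTranscribing() {
            reset()
        }

        // Tear everything down so the next dictation starts clean.
        func reset() {
            recognitionTask?.cancel()
            audioEngine?.stop()
            audioEngine?.inputNode.removeTap(onBus: 0)
            audioEngine = nil
            request = nil
            recognitionTask = nil
        }
    }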
00:09:38
And there we have it: we have successfully added dictation to our local AI chat app using only on-device, first-party Apple frameworks. So this is actually super easy. SFSpeechRecognizer also works pretty accurately, as far as my tests and the community feedback I've heard are concerned. This is a big improvement over the APIs we had in the early days. Of course, our setup targets iOS 26 and up, but that's totally fine for our use case, as we're using Foundation Models, which is only available in iOS 26 and up anyway.

Description:

This lesson walks through an example of integrating SFSpeechRecognizer from Apple's Speech framework into an iOS 26 local AI chat app.

Example Code: https://gist.github.com/chFlorian/0bc373278b11cff5ea547d112a5b5ac6
Join this channel to get access to perks: https://www.youtube.com/channel/UCYt_AtiKPyda44NYzwABvQQ/join

🚀 LaunchBuddy: https://apple.co/3iFcjjW
📚 Try CWC+: https://bit.ly/cwc_flo
🔭 Astro for ASO: https://flowritesco.de/astro
☕️ Buy me a coffee: https://ko-fi.com/flowritescode
👋 Links: https://flowritesco.de
🛠 Forge: https://apple.co/3riG8MQ

Affiliate Links ❤
📕 SwiftUI & Combine Books: https://www.bigmountainstudio.com/a/tpgmp
🔬 Get Reports about your apps: https://appfigures.com/r/5by3g
📊 Privacy focused analytics: https://dashboard.telemetrydeck.com/registration/organization?referralCode=27AOWO4R1TTEJBST
💻 The most powerful mac app for developers: https://devutils.app/?ref=flo
☕️ Support me: https://ko-fi.com/flowritescode

If you have any video suggestions, please feel free to let me know in a comment. Get in contact via Twitter: https://twitter.com/FloWritesCode
