This is the blog section. It has two categories: News and Releases.
Files in these directories will be listed in reverse chronological order.
I’m working on my English skills by describing the scenes in photos in one-minute speeches.
Today, I tried creating an AI video using a script I wrote.
I entered my script for the second photo in Engoo Lesson 8 as a prompt in Luma Dream Machine.
I’m reposting my script here:
In this arresting photo, a young woman holds a mixed vanilla-strawberry ice cream in a waffle cone.
She looks stylish as she strikes a playful pose with a slight lift of her left shoulder.
Her oversized sunglasses with pink frames and pink-tinged mirrored lenses vividly reflect the scene before her.
A chunky pink chain necklace and a pink vest add charm to her overall look.
With her ash-blonde hair, she exudes a saucy aura that complements her pink-loving personality.
The blurry background suggests hills and fields with sunlight peeking through from behind.
Incidentally, after taking this picture, she probably intended to enjoy the tempting ice cream before it melted.
(109 words)
Here is the AI-generated video it produced:

For reference, here’s the original image I used to create the script.
The generated video was a bit different from what I envisioned, but I was amazed at how quickly it captured the essence of what I wanted.
In my previous post, I explained how to extract the narration audio as an MP3 file.
Now, I’ll explain how I create my own speech videos by mimicking the narration.
I use the following free software:
| Software | Description |
|---|---|
| Audipo (only in Japanese) | An audio player app for smartphones that allows speed adjustments and repeat settings. |
| VLC media player | A media player for desktop PCs that lets you adjust playback speed and set repeat settings. It can also play MP3 audio files without video. |
| iPhone Camera App | For recording my speech. |
| YouTube | For uploading and sharing videos. |
After extracting the narration audio, I use Audipo or VLC media player to adjust the playback speed and practice my speech.
I keep practicing until I can accurately replicate the narration.

The image above shows my script’s narration audio playing at 90% speed in the Audipo app.
I also insert yellow triangle markers to separate the sentences.
When I come across unfamiliar expressions or passages that are too fast, I use the A-B repeat function to loop that section at a slower speed and practice it until it feels natural and I can say it accurately.
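Audipo and VLC handle A-B repeat interactively. If you would rather keep a separate practice clip of just the difficult part, here is a minimal Python sketch, assuming FFmpeg is installed and on your PATH, that builds an FFmpeg command to cut an A-B section out of the narration MP3 and slow it to 90% speed. `atrim`, `asetpts` and `atempo` are standard FFmpeg audio filters; the function name `ab_practice_cmd`, the script name `ab_clip.py`, the file names and the timestamps are hypothetical.

```python
import subprocess
import sys


def ab_practice_cmd(src: str, start: float, end: float, speed: float, dst: str) -> list[str]:
    """Build an FFmpeg command that keeps only the A-B section [start, end)
    (in seconds) of src and re-encodes it at `speed` (0.9 = 90%)."""
    if end <= start:
        raise ValueError("B must come after A")
    # atempo accepts 0.5-2.0 in every FFmpeg version (newer builds allow more).
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    audio_filter = (
        f"atrim=start={start}:end={end},"  # keep only the A-B section
        "asetpts=PTS-STARTPTS,"            # restart the clip's timestamps at zero
        f"atempo={speed}"                  # change tempo without changing pitch
    )
    # -y overwrites dst without asking; -vn ignores any embedded cover image.
    return ["ffmpeg", "-y", "-i", src, "-vn", "-af", audio_filter, dst]


if __name__ == "__main__" and len(sys.argv) == 6:
    # e.g. python ab_clip.py narration.mp3 5.0 12.5 0.9 section.mp3
    src, a, b, speed, dst = sys.argv[1:]
    subprocess.run(ab_practice_cmd(src, float(a), float(b), float(speed), dst), check=True)
```

After running it, `section.mp3` can be put on repeat in VLC. `atempo` is used here rather than lowering the sample rate because it slows the speech without making the voice sound deeper.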
Finally, I record my speech on my iPhone without relying on the script or narration audio.
I then upload the video to YouTube and set it to ‘unlisted’ so only people with the URL can view it.
I prefer not to broadcast my videos widely.
This training method might take some time, but thoroughly practicing and recording these videos is a great way to track my progress.
I believe that with this effort, I will eventually be able to speak English spontaneously and confidently in my own words.
In the previous post, I discussed how to create a speech draft.
Now, I’ll explain how to create a narration audio file based on the prepared speech script.
I use the following software:
| Software | Description |
|---|---|
| NaturalReader online | Reads text aloud in a natural-sounding voice. |
| OBS Studio | Records the desktop screen, including audio. |
| FFmpeg | Extracts MP3 audio from video files. |
| Audacity | A digital audio editor. |

NaturalReader online’s free tier is more than enough for speeches of up to one minute; in my experience, the free allowance covers roughly 20,000 characters.
However, downloading audio files directly from NaturalReader is a paid feature, so I use a workaround to get the audio instead.

While NaturalReader reads the text, I use OBS Studio to capture the desktop screen, saving it as an MP4 video file.
The image above shows my script being read by NaturalReader online, with the screen captured using OBS Studio.
After that, I use FFmpeg to extract an MP3 audio file from the MP4 video.
Here’s the command I use to extract the MP3:
```shell
# -i: the MP4 recorded with OBS Studio; -vn: drop the video stream and keep only the audio
ffmpeg -i narration.mp4 -vn narration.mp3
```
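If you record several narrations in one session, the same `-vn` command can be run over a whole folder. This is a minimal Python sketch, assuming FFmpeg is on your PATH and your build includes an MP3 encoder such as libmp3lame (most do); the function names, the script name `extract_mp3.py` and the `recordings` folder are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def extract_cmd(video: Path) -> list[str]:
    """Same call as the command above: -vn drops the video stream, and the .mp3
    extension on the output makes FFmpeg encode the audio as MP3."""
    # -y overwrites an existing MP3 with the same name without asking.
    return ["ffmpeg", "-y", "-i", str(video), "-vn", str(video.with_suffix(".mp3"))]


def extract_all(folder: Path) -> int:
    """Run extract_cmd on every .mp4 in folder and return how many were converted."""
    videos = sorted(folder.glob("*.mp4"))
    for video in videos:
        subprocess.run(extract_cmd(video), check=True)  # stop on the first failure
    return len(videos)


if __name__ == "__main__" and len(sys.argv) == 2:
    # e.g. python extract_mp3.py recordings
    print(extract_all(Path(sys.argv[1])), "file(s) converted")
```

Each MP3 is written next to its MP4 with the same base name, so `narration.mp4` becomes `narration.mp3`.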

When recording screen captures with OBS Studio, it’s common to end up with silent sections at the beginning and end of the recording.
To fix this, I use Audacity, an audio editing program, to trim the unnecessary silent parts (highlighted in red in the image) and export the narration audio file.
The image above shows my script’s narration audio being edited in Audacity.
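Audacity makes this trim easy to check by eye. For those who prefer a script, here is a minimal Python sketch of the same idea using only the standard library: it drops leading and trailing frames whose loudest sample is at or below a threshold. It assumes the audio has first been converted to 16-bit PCM WAV (for example with `ffmpeg -i narration.mp3 narration.wav`), and the threshold of 500 (out of 32,767) is an arbitrary assumption to tune by ear; the function and script names are hypothetical.

```python
import array
import sys
import wave


def frame_peak(samples: array.array, channels: int, i: int) -> int:
    """Largest absolute sample value across all channels of frame i."""
    return max(abs(s) for s in samples[i * channels:(i + 1) * channels])


def silence_bounds(samples: array.array, channels: int, threshold: int):
    """Return (start, end) frame indices (end exclusive) of the audio between the
    leading and trailing silence, or None if every frame is at or below threshold."""
    n_frames = len(samples) // channels
    start = 0
    while start < n_frames and frame_peak(samples, channels, start) <= threshold:
        start += 1
    if start == n_frames:
        return None
    end = n_frames - 1
    while frame_peak(samples, channels, end) <= threshold:  # stops at start at the latest
        end -= 1
    return start, end + 1


def trim_wav(src: str, dst: str, threshold: int = 500):
    """Copy a 16-bit PCM WAV file, dropping silent frames at both ends."""
    with wave.open(src, "rb") as w:
        params = w.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        samples = array.array("h")
        samples.frombytes(w.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV samples are stored little-endian
    bounds = silence_bounds(samples, params.nchannels, threshold)
    if bounds is None:
        raise ValueError("the recording is entirely silent")
    start, end = bounds
    kept = samples[start * params.nchannels:end * params.nchannels]
    if sys.byteorder == "big":
        kept.byteswap()
    with wave.open(dst, "wb") as out:
        out.setparams(params)  # same channels, rate and width; frame count is fixed on write
        out.writeframes(kept.tobytes())
    return start, end


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python trim_silence.py narration.wav narration_trimmed.wav
    print(trim_wav(sys.argv[1], sys.argv[2]))
```

Unlike trimming by hand, a fixed threshold can also cut a very quiet first or last syllable, so it is worth listening to the result once.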
The following is a sample audio narration of the script for Exercise 1 from Engoo’s Photo Description Lesson 1.
This is how I turn a speech script into a narration audio file that sounds close to natural speech.
I usually draft one-minute speech scripts based on Engoo’s describing pictures materials.
Those scripts can be found on my website under the section entitled “Scripts for Engoo describing pictures.”
When preparing a script for a photo description speech, I start by writing out the details in Japanese to make sure I capture all the nuances of the image.
I describe the clothing and actions of the people in the photo, or the state of the objects.
I also include comments expressing my own opinions and thoughts based on the photo to complete the script.

Once the script is written in Japanese, I use ChatGPT to translate it into English.
To have ChatGPT translate it, I use the following prompt:
```text
"write the photo description script in Japanese here"
translate the script within the quotes above into English as a native speaker would say
```
However, if I leave the script as-is after translation, it tends to lack the natural flow of English.
That’s why I have conversations like the one below with ChatGPT to refine it and make it sound more natural.
I’ve also noticed that the same prompt can produce a different translation each time, so I pick the wording I find most suitable and compile the final script from it.
To make the script sound even more natural, I use the following prompt:
```text
"write the translated script here"
change the script within the quotes above more naturally to make it easier to hear
```
If the script gets too long for a one-minute speech, I use the following prompt:
```text
"write the script here"
condense the script within the quotes to less than 120 words
```
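To check whether a script already fits before asking ChatGPT to condense it, a rough word counter helps, similar to the word count I note under each script. This minimal Python sketch assumes a "word" is a run of letters or digits, optionally joined by an apostrophe or hyphen, so "she’s" and "ash-blonde" each count as one; the names `count_words`, `needs_condensing` and `word_count.py` are hypothetical.

```python
import re
import sys

# Assumption: letters/digits, optionally joined by ' ’ or - inside a word.
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*")
WORD_LIMIT = 120  # the limit used in the "condense" prompt above


def count_words(text: str) -> int:
    """Count words in an English script using WORD_RE."""
    return len(WORD_RE.findall(text))


def needs_condensing(text: str, limit: int = WORD_LIMIT) -> bool:
    """True if the script is not under `limit` words."""
    return count_words(text) >= limit


if __name__ == "__main__" and len(sys.argv) == 2:
    # e.g. python word_count.py script.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        words = count_words(f.read())
    print(f"{words} words", "- condense it" if words >= WORD_LIMIT else "- OK")
```

Word processors split words slightly differently, so near the limit the result is best treated as an estimate.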
To create a headline for the article, I provide the following prompt:
```text
"write the script here"
shorten the above sentence into a headline-like format
```
If the script becomes too long and needs to be split into two one-minute speech scripts, I’ll provide the following prompt:
```text
"write the script here"
divide the script within the quotes into two parts
```
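Since the same few instructions come up again and again, they can be kept as templates. This is a small Python sketch, assuming you paste the result into ChatGPT by hand; `PROMPTS` and `build_prompt` are hypothetical names, while the instruction texts are copied from the prompts in this post.

```python
# Instruction texts copied from this post; the keys are illustrative labels.
PROMPTS = {
    "translate": "translate the script within the quotes above into English as a native speaker would say",
    "naturalize": "change the script within the quotes above more naturally to make it easier to hear",
    "condense": "condense the script within the quotes to less than 120 words",
    "headline": "shorten the above sentence into a headline-like format",
    "split": "divide the script within the quotes into two parts",
}


def build_prompt(kind: str, script: str) -> str:
    """Wrap the script in double quotes and put the chosen instruction on the next line."""
    if kind not in PROMPTS:
        raise ValueError(f"unknown prompt kind: {kind!r} (choose from {sorted(PROMPTS)})")
    return f'"{script.strip()}"\n{PROMPTS[kind]}'


if __name__ == "__main__":
    print(build_prompt("headline", "She looks stylish as she strikes a playful pose."))
```

Keeping the wording fixed also makes it easier to compare the varying answers ChatGPT gives to the same prompt.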
By experimenting like this, I compile scripts with more natural English expressions.
I’ve launched a website where I document my English learning journey!
I mainly utilize Engoo’s resources, which are freely available for personal use and are tailored for English learners.
In particular, Engoo’s describing pictures materials contain a wide variety of photos, and I use them to improve my English.

My primary goal right now is to become proficient at spontaneously describing what’s depicted in those images in English.
I understand that mastering this skill overnight is unrealistic.
I used to read news articles and English learning materials every day for about a year, but I realized I was just going through the motions without absorbing much content.
It became clear that I wouldn’t learn to speak in my own words anytime soon.
So, I decided to change my approach.
Now, I’m fully committed to thinking and speaking in English.
I’ve been dividing and condensing the material into short speeches of under one minute, memorizing each script, and then recording myself delivering the speech without reading it.
Through this practice, I’ve found myself expressing the content much more actively in my own words, rather than simply reading something written by someone else.
I want to break the habit of thinking in Japanese and start thinking and speaking in English, with proper grammar.
Currently, I feel like I’m finally starting to speak some English.
When I listen to my recorded speeches, I can tell my pronunciation has improved, and even native speakers have mentioned that it’s getting better.
In preparation for recording these short English speeches, I’ve been thoroughly rehearsing the script, using AI-generated voice narration to mimic a natural delivery until I can confidently recite the speech.
Thanks to this approach, I’ve been able to establish an environment where I can create scripts, extract narration audio, and practice speeches using useful free websites and software.
I hope to share these tips with others who might find them beneficial.
I’d be delighted if you checked out my website.
Thank you very much for sticking around till the end!