Kokoro Irodoru Progress Log

A different kind of article

This article is a small journal I wrote while making my entry for Re:Sample’s #RSVideoJam1.

It’s a little more experimental than usual. I have not revised or rewritten any entry; these are the raw thoughts I found worth writing down as I was having them. As such, even though I do share some information about the way I work, it might not be written in a way that’s easily digestible or easy to learn from. Nevertheless, I hope you’ll enjoy reading through them.

Progress log

2025-08-16

After much agonizing, I’ve decided to pick Kokoro Odoru and use GA as the source. GA isn’t really the “origins” part of my submission: even though it’s a really old manga/anime, I only discovered it fairly recently. The “origins” aspect comes from Kokoro Odoru instead, as it was one of the popular BGMs back in 2017 when I started discovering Otomads.

I’ll do my video in the style of all the other Kokoro Odoru videos at the time, that is, a linear retelling of the story. I’ll also keep it simple like they did, meaning just a few audio tracks at best, with a focus on the sentence mixing.

This is what I’m starting with:

[Image]

Now I have to figure out how to best manage my media. Because I’ll have to go through the whole anime, it’s tempting to just drag and drop the episodes into Reaper and get cutting. The problem with this approach is that I’ll have to remove the background music at some point. I can run uvr5-cli locally, but doing so on 13 whole 20-minute episodes is not realistic.

Problem 2: working with compressed videos in Reaper can get pretty expensive.

Problem 3: I must avoid breaking the link between video sources and audio sources as much as possible. I will eventually render all the visuals from Reaper, so at the end of the project, it’ll be better if I can just select all my audio-only media and replace it with the video files. So it won’t do to only clip audio files from the full episodes: I’ve suffered many times in the past because I had a badly labelled .wav file and had to remember where exactly it came from in the anime.

With all that taken into account, I think the best way to handle media management in this situation is to import the whole episodes into Reaper and then, whenever I’m about to use a scene, render the video in place BEFORE I start cutting and rearranging the syllables. This will allow me to:

Apply vocal isolation on the specific clips when needed;
Reduce the load by using audio-only renders;
Easily switch to the video clips once I’m done.

I’ll be fairly greedy with the clip length, because my voice isolation software crashes on files that are too short, and because I don’t want to go back and rerender longer sections every single time.

The audio work and the scene-hunting will be done in two different Reaper files. As for rendering both the .wav and .mp4 at the same time, Reaper actually allows you to have a secondary render if needed.

First let’s find the BPM of the song. 113.

I’ve also symlinked my GA anime folder inside of the project folder. In the end it won’t matter much since all my clips will be local to the project, but it’s a habit I want to develop. That way, every path will be relative.
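The symlink step could look something like the sketch below. The function name and the example paths are placeholders for illustration, not the actual folders used for this project.

```bash
#!/bin/bash
# Link an external source folder into a project folder so that everything
# can be reached from the project root.
# Usage: link_source_folder <source-dir> <project-dir> <link-name>
# Pass an absolute source path: a relative one would be resolved from the
# folder that contains the link, not from the current directory.
link_source_folder() {
    src=$1
    project=$2
    name=$3

    if [ ! -d "$src" ]; then
        echo "source folder not found: $src" >&2
        return 1
    fi

    mkdir -p "$project" || return 1
    # -s: symbolic link, -f: replace an existing link,
    # -n: do not descend into an existing link that points to a folder.
    ln -sfn "$src" "$project/$name"
}

# Example with placeholder paths:
# link_source_folder "$HOME/Videos/GA" "$HOME/Projects/kokoro-irodoru" episodes
```

Once the link exists, the episodes show up as a regular subfolder of the project in the file browser.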

Now all that’s left is to start gathering scenes.

[Image]

2025-08-18

Made some more progress today. When doing sentencing on Kokoro Odoru, I try to avoid having more than 4 syllables per beat, or else it starts to sound pretty clumsy. From the other videos I’ve seen on this BGM, it’s preferable to overstretch or repeat syllables rather than go too fast. I struggled a bit at the start because Kisaragi’s monologue was very hard to fit in there. On top of that, I also need to be aware of the rhythm of the song: the drums are one thing, but the guitar and the melody have an inherent rhythm of their own… There are specific beats where I must ensure that a word/sentence is over.
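To put numbers on that limit: at 113 BPM a beat lasts 60 / 113 seconds, so 4 syllables per beat leaves roughly 133 ms per syllable. A small sketch of that arithmetic follows; the helper names are made up for this example.

```bash
#!/bin/bash
# Timing helpers for a song at a given tempo.

# Length of one beat in seconds: 60 / BPM.
beat_seconds() {
    awk -v bpm="$1" 'BEGIN { printf "%.3f\n", 60 / bpm }'
}

# Shortest syllable length in milliseconds when a beat holds at most N syllables.
min_syllable_ms() {
    awk -v bpm="$1" -v n="$2" 'BEGIN { printf "%.0f\n", 60000 / bpm / n }'
}

beat_seconds 113        # length of one beat at 113 BPM
min_syllable_ms 113 4   # syllable length at the 4-per-beat ceiling
```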

I’ll try to set up daily goals, like 10-20s of sentencing. That should help me finish this video quicker.

Also, I just thought of the coolest title for this video yesterday: ココロイロドル　(心彩る -> coloring heart).

2025-08-19

Bit tired but I promised myself I’d still put in some work.

[Image]

I made some decent progress, 8 measures this session. There’s a bunch of muted attempts all over the place that didn’t fit this section of the song, but that I might be able to use later. What’s a bit concerning is that I’m only at episode 1 so far. It’s clear that I’ll have to skip a bunch of scenes if I want to fit the whole anime in the duration of the song. Because it’s a 4koma adaptation, there are a lot of self-contained skits that won’t make it in entirely. Right now I’m trying to keep scenes going for at least two measures, but during the chorus, I might be able to do a more rapid-fire sentence mixing and switch scenes more often.

2025-08-20

I once again didn’t start working until 11-or-so-PM, starting to see a pattern…

I tried to have two different sentencing tracks panned to each side, but it just comes out as a garbled mess when I try to mix complete meaningful sentences. This kind of style is better suited for rhythmic sentence mixing.

Managed to do another 20 seconds, but this still feels like too little progress for a 4min long video.

[Image]

There are even more “discarded” ideas this time.

I have a pretty good idea of what scene to use tomorrow, the bgm chills a bit so it would be a good spot to put one of the first “emotional scenes” (i.e. Kisaragi at the museum wondering about her capabilities).

2025-08-21

You would NEVER guess what time it is.

Anyways, I said I would use the museum scene, but I’ll probably put the playing tag scene before that. Either I keep the punchline, or I transition directly from the tag scene, where Kisaragi “can’t catch up”, into the museum scene, which is about the same theme, but figuratively.

Tried it and the two scenes go pretty well together back to back. I’m also glad I know just enough Japanese to be able to trim out some fat while still preserving the meaning of sentences, otherwise it would be much harder to get everything to fit. You’ll notice that “getting things to fit” is a really common issue in this project.

I could continue, I’m not out of ideas just yet, but it’s getting late and I’ll need to be in extra good shape for my work tomorrow and the day after. So I’ll go to bed.

“Too bad dummy, should’ve started earlier” - I say to myself.

2025-08-22

Didn’t put in any work today, colleagues called me after work to go grab a drink and I went, it was a nice evening. This isn’t really related to my part or anything, I’m just posting it here to say that… I don’t know, life happens I guess?

2025-08-24

God damn it Masa!

Her voice is so quiet, it’s even lower than the background music, I’m really scared of what will happen when I run voice isolation on it!

2025-08-27

Once again took a few days’ break from this… Did manage to get to the second chorus this session.

2025-08-29

Deadlines are approaching for two other projects, so I really need to at least finish the sentencing this weekend.

2025-08-30

Now taking care of the “whistling” section, I’ve decided to dedicate the entire Yaminabe chapter to it.

I’ve also noticed something: when repeating a syllable beforehand to fill blank space, too wild of a variation in pitch within the syllable will make the repetition sound worse (like a pitch slide or an unsteady voice). Trimming such syllables can reduce the amount of “information/variation” and make them more bearable when repeated.

Having recently watched Jojo from part 1 to 5, I can’t fathom how Nero does it. You’d think 4 minutes is a lot for an Otomad, but even with an anime that’s only 13 episodes long, I have to make a ton of decisions and compromises on what to keep and what to throw away. So the fact that Kokoro Jojoru can cram 39 episodes’ worth into a single video is insane…

[Image]

I have finally reached (what I plan to be) the lyrical section. I’ve been thinking of customizing the lyrics for this part ever since I started the project, so I already have a few ideas. Here’s what I’ll be going with:

ENJOY (Noda)
デッサンはまだ続ける (Tomokane)
IT’S JOY (Noda)
絵描たい胸の鼓動 (Namiko)
ココロイロドルる鉛筆だけで (Kisaragi & Miyabi)

I think it would be better to have the kinda “rough”-sounding syllables that Nanka’s video has by using syllables from the anime, but I really don’t feel like going over the whole thing again to hunt for syllables… so fuck it, I’ll use the character songs instead, even if they won’t have the exact emotions I’m looking for.

2025-08-31

I’m all pumped up today, I have the whole day to work on this and even woke up a bit earlier than usual.

So I’ve made a change of plans: I won’t just finish the sentencing, I’ll try to get the whole audio finished today! I’ve already run voice isolation on all the character songs so I can get started straight away.

Never mind, this sounds so bad, it’s abysmal dogshit. I’ll just bite the bullet and do the lyrics the way I originally intended.

(For anyone wondering, I actually find the second one more desirable).

The decision to symlink the episodes in my project folder has been extremely helpful. Not so much for the relative paths (since the episode files never make it to my main .rpp in the first place), but rather because it’s extremely convenient to have everything in the same location: switching to the tree view rather than the icon view means I can access both the episodes and the clips I make from the root of the project.

I’m finally “done” with the sentencing.

Here’s what the audio currently sounds like.

Now I have to run all the clips I rendered through UVR while I cook myself lunch.

for f in *.wav; do isolate "$f"; done

Which calls the following bash script for each file:

#!/bin/bash

# Require at least one argument: the audio file to process.
if [ "$#" -lt 1 ]; then
    echo "usage: isolate [extra audio-separator options...] <audio-file>" >&2
    exit 1
fi

# The last parameter is the input file; everything before it is passed
# through untouched as extra options. Using an array (instead of gluing the
# options into one string) keeps arguments that contain spaces intact.
input_file="${!#}"
extra_args=("${@:1:$#-1}")

audio-separator \
    --model_file_dir ~/Documents/uvr-models \
    --single_stem=vocals \
    --output_dir uvr \
    --output_format=wav \
    --mdxc_batch_size=4 \
    "${extra_args[@]}" \
    "$input_file"

I have some great news. UVR is able to pick up on Miyabi’s voice even when it’s quieter than the BGM.

For those wondering how I’m handling the media replacement, I have all my stuff in a folder, and the isolated version in an “uvr” subfolder. I then use Reaper’s project bay to replace all my sources.
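Before swapping sources, it can help to check that every clip actually has an isolated counterpart. The sketch below is one way to do that for the folder layout described above; the function name is made up, and it assumes the isolated file either keeps the clip’s name or starts with it followed by an underscore.

```bash
#!/bin/bash
# List the .wav clips in a folder that have no isolated counterpart in its
# "uvr" subfolder.
# ASSUMPTION: the isolated file is named either <clip>.wav or
# <clip>_<anything>.wav (e.g. clip01.wav -> uvr/clip01_something.wav).
list_missing_isolated() {
    dir=$1
    for clip in "$dir"/*.wav; do
        [ -e "$clip" ] || continue          # the folder has no .wav files
        base=$(basename "$clip" .wav)
        found=no
        for candidate in "$dir/uvr/$base.wav" "$dir/uvr/${base}_"*.wav; do
            if [ -e "$candidate" ]; then
                found=yes
                break
            fi
        done
        [ "$found" = yes ] || echo "$clip"
    done
}

# Example: list_missing_isolated .
```

An empty output means every clip in the folder has a match in "uvr".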

Dragging the new source on top of a media item also gives you the option to “replace all instances”. However, it sometimes breaks stuff like rate and stretch markers, so I can’t use that.

Sometimes, I actually WANT the background sounds to remain (like when I’m using an sfx from the scene within the sentencing). In that case, I have to listen for places where an sfx is missing and re-replace that specific instance with the original, non-isolated version.

[Image]

Now that I’ve replaced all my sources with voice-isolated ones, I can select every sentencing item and normalize them.

It means I’ll go over everything one more time, because even though it’s normalized, some voices will “feel” louder than others at the same volume, and some normalized consonants will need to be toned down. Loud characters like Tomokane have lots of dynamics in their voice, so when they’re normalized using the peaks, their voicelines will sound much quieter than Miyabi’s.
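The reason peak normalization behaves this way is that two clips can share the same peak while having very different average (RMS) levels. The sketch below illustrates that on made-up sample values; it is only a toy calculation, not something taken from the project.

```bash
#!/bin/bash
# Print the peak level and the RMS level (both in dB relative to full scale)
# of a list of sample values between -1 and 1.
levels_db() {
    printf '%s\n' "$@" | awk '
        function abs(x) { return x < 0 ? -x : x }
        function db(x)  { return 20 * log(x) / log(10) }
        {
            if (abs($1) > peak) peak = abs($1)   # track the loudest sample
            sum += $1 * $1                        # accumulate energy for the RMS
            n++
        }
        END { printf "peak %.1f dB, rms %.1f dB\n", db(peak), db(sqrt(sum / n)) }
    '
}

# Two made-up clips, both already peak-normalized to 1.0:
levels_db 1.0 -0.1 0.1 -0.1    # spiky, dynamic voice
levels_db 1.0 -0.8 0.8 -0.8    # steady voice
```

Both clips report the same peak, but the spiky one has a much lower RMS level, which is why it ends up sounding quieter.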

… And I’m now realizing that this will make it a pain to adjust, so I will rearrange my item placement to have one track per character, that way it’ll be easier to adjust them uniformly.

[Image]

Much better.

I was able to avoid rogue consonants (like an “s” sound that becomes way too loud) because most consonant items have the previous or following vowel attached to them, so the peaks will stop at those instead.

I think I’m done with the audio… Now I have to do that whole media replacement thing again but this time with the MP4 clips. I plan to render the first visual pass from Reaper, and then add all the text and other stuff from Resolve.

Now that this is done, I can run a very convenient script by Xraym that expands items to fill some gaps. This is useful, as there are a ton of very short gaps in my sentencing that would appear as flickering in the final output.

This won’t be a very intense video visually, so I’ll try to do as much as I can in Reaper, including some fades/zooms/whatever.

There are some video processor effects which are applied to the composite (like the transform effect), and other effects that are only applied to the track they’re on (as is the case for the “fades affect opacity” effect). So I either have to duplicate said effects, or move the items to the parent track. I’ll go with option 2 for now.

[Image]

I’m not saying that Reaper is the best tool for making visuals, it’s not. It performs poorly since it’s bottlenecked by ffmpeg for playback, and you can’t make anything too complex… What I will say, though, is that in my current situation, with all the workflow decisions I made and the way I organized my project file, Reaper is the best tool for making these specific visuals.
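To illustrate the idea of gap filling on plain numbers (this is not Xraym’s script, just a sketch), the function below stretches each item up to the next one when the gap between them is small. The function name, the item times, and the 0.05 s threshold are all made up for this example.

```bash
#!/bin/bash
# Read "start end" pairs (in seconds, sorted by start) on stdin and extend each
# item up to the next one whenever the gap between them is shorter than
# max_gap. Larger gaps are left alone.
fill_short_gaps() {
    awk -v max_gap="$1" '
        NR > 1 {
            gap = $1 - prev_end
            if (gap > 0 && gap < max_gap) prev_end = $1   # stretch previous item
            printf "%.3f %.3f\n", prev_start, prev_end
        }
        { prev_start = $1; prev_end = $2 }
        END { if (NR > 0) printf "%.3f %.3f\n", prev_start, prev_end }
    '
}

# Three items: a 20 ms gap after the first one, a 1 s gap after the second.
printf '%s\n' "0.000 0.480" "0.500 1.000" "2.000 2.500" | fill_short_gaps 0.05
```

With these values, only the first item is extended (to 0.500), since the second gap is well above the threshold.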

Working with the video processor is a very special workflow. Since every track acts kind of like an “adjustment layer”, I’m able to make slide transitions from and to all directions with just a single track.

I can put a video in my “slide transition” track only for the duration I need, and then have all other items in a different track once the transition is over. On top of that, automation envelopes are super handy!

I got a lot more done than expected, the whole Reaper side of the visuals is basically done.

2025-09-01

I got started right after work. Most of the video is already done; all that’s left is the chorus scene and the character name callouts at the end.

It’s rendering now…

I’m exporting the PNG sequence and it’s super heavy because it’s a 4-minute song. Once I have that, I’ll immediately do the final 2k encode via ffmpeg and then back it up on gDrive.

I’ll try to show some restraint and not post it right after it’s rendered. That way I have the time to format this blog post and also do the English subtitle track, so that I can upload all of them at once.
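For reference, an encode from a PNG sequence could look roughly like the command printed by the sketch below. The frame naming pattern, the 30 fps default, the 2560x1440 size for “2k”, the quality settings, and all file names are assumptions for illustration, not the settings actually used.

```bash
#!/bin/bash
# Print an ffmpeg command that turns a PNG sequence plus a WAV file into an
# H.264 .mp4. Nothing is executed here: review the command, then run it.
# ASSUMPTIONS: frames are named 00001.png, 00002.png, ... (%05d.png),
# 30 fps unless a 4th argument is given, and 2560x1440 as the "2k" size.
# Paths are wrapped in single quotes, so they must not contain one themselves.
print_encode_command() {
    frames_dir=$1
    audio=$2
    output=$3
    fps=${4:-30}

    printf '%s\n' "ffmpeg -framerate $fps -i '$frames_dir/%05d.png' -i '$audio' \\
  -vf scale=2560:1440:flags=lanczos -c:v libx264 -crf 18 -pix_fmt yuv420p \\
  -c:a aac -b:a 320k -shortest '$output'"
}

print_encode_command frames audio.wav kokoro_irodoru.mp4
```

The -framerate option has to come before the image sequence input so that it applies to the frames rather than to the output.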

All of the video renders pretty fast except for the part that has motion blur.

2025-09-02

With the video done, I’ll start work on the subtitles. I’ve reinstalled Aegisub for this. Picking it back up was faster than expected; the only issue I encountered was that it was playing audio with the ALSA player instead of PulseAudio, causing it to be super sped up.

[Image]

I’ve been wondering about those purple lines, turns out they represent key frames when you import h264/other delivery formats in the software.

That’s crazy useful!

I’ll mostly be copying the actual subs from the anime, except for when I needed to cut out some parts of a sentence or recontextualize scenes.

It’s really intuitive too. It didn’t take me long at all to realize that blackened words meant there were typos, and that adding them to the dictionary was as simple as a right click. Aegisub feels like one of those pieces of software that have been polished to perfection over the course of their long, long life.

The subtitles are all written, took about an hour and a half. Now to style them for each character.

I’m pretty sure that YouTube supports styled subtitles since so many videos have them, but it seems like an srt file is not enough. Is it something that’s limited by subscriber count?

Seems like I need to give it .ytt files, and the converter doesn’t work straight away. I’ll install Mono, and if even that fails, I’ll ask someone with Windows to convert it.

After that, I just need to finish this blog article.

Majormilk was able to convert it for me. We went back and forth with a bunch of corrections to improve readability, and I ended up choosing boxes instead of colored strokes.

There is not much else to say, I hope you will enjoy Kokoro Irodoru. With this project out of the way, I can go back to working on two other projects that are also nearing their own deadline.

Thanks for reading, and see you next time.