Regex to clean up a Google Hangouts transcript

Posted on Mar 23, 2020

Intro

I’ve had a couple of remote movie nights with friends recently where we watched a stream and used Google Hangouts to chat as it was going on. I wish there was a way to automatically save the transcript, but I’ve just ended up copying and pasting the text into VS Code and cleaning it up there.

What it does

The pasted transcript is in the format:

[Name]
[Message]
[Time sent]
[Name]
[Message]
[Time sent]

This is a little clunky. We can do a find/replace in VS Code (or another editor that supports regex) to produce the following format:

[Name] - [Time sent]
[Message]

[Name] - [Time sent]
[Message]

Steps

First, I did a simple find/replace (with regex mode turned on, so that \n matches a line break) to replace You, the label on my own messages, with my name:

  • Find: \nYou\n
  • Replace with: \nTyler Wengerd\n
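
If you would rather script this step than do it in the editor, here is a minimal Python sketch of the same substitution. The function name replace_you is made up for illustration. It anchors on line boundaries instead of \n, which is a small departure from the find/replace above: it also catches a You on the very first line of the file. Like the editor version, it would rewrite a message that consists of the single word You.

```python
import re


def replace_you(transcript: str, my_name: str) -> str:
    """Replace every line that consists solely of 'You' with my_name.

    Hypothetical helper for illustration; not part of any library.
    """
    # re.MULTILINE makes ^ and $ match at each line boundary, so a
    # 'You' on the first line of the transcript is replaced as well.
    # A function is used as the replacement so that my_name is
    # inserted literally, with no backslash processing.
    return re.sub(r"^You$", lambda match: my_name, transcript, flags=re.MULTILINE)
```

Calling replace_you(text, "Tyler Wengerd") on the pasted transcript should leave lines such as "You there?" alone, since only lines that are exactly You match.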

Next, let’s create the big regex. Replace each placeholder (Name One, Name Two, etc.) with a participant’s name exactly as it appears in the transcript, and separate the names with |. There were 4 people in the hangout in the example below.

Find:

(Name One|Name Two|Name Three|Tyler Wengerd)\n(.*)\n(\d{1,2}:\d\d [AP]M)

Replace with:

$1 - $3\n$2\n
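
The same find/replace can be sketched in Python, which is handy if you have several transcripts to clean up. This is a rough equivalent under a few assumptions: the function name reformat_transcript is hypothetical, the participant names are passed in as a list, and Python writes the capture groups as \1, \2, \3 where VS Code uses $1, $2, $3.

```python
import re
from typing import Sequence


def reformat_transcript(transcript: str, names: Sequence[str]) -> str:
    """Turn Name / Message / Time triples into 'Name - Time' + Message.

    Hypothetical helper for illustration; it mirrors the editor regex.
    """
    # Build the (Name One|Name Two|...) alternation, escaping each name
    # in case it contains characters that are special in a regex.
    name_alternation = "|".join(re.escape(name) for name in names)
    pattern = re.compile(
        rf"({name_alternation})\n(.*)\n(\d{{1,2}}:\d\d [AP]M)"
    )
    # Group 1 = name, group 2 = message, group 3 = time sent.
    # The trailing \n, plus the line break already following the
    # time in the input, produces the blank line between entries.
    return pattern.sub(r"\1 - \3\n\2\n", transcript)
```

One limitation, shared with the editor version: (.*) does not match across line breaks, so a message that spans more than one line will not match and is left in its original three-part layout.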

That’s it!