Building a Talking App | AI/ML Series

Welcome to another practical AWS tutorial. This is the written version of the following YouTube video. I recommend watching the video before reading the blog, and then using this post as the source to copy the code from while you build the app yourself.

AWS services used in the App

  • Amazon Polly
  • Amazon S3
  • AWS IAM
  • AWS Lambda

Creating a Serverless Project/Service

Install the Serverless Framework with npm and create a new Node.js project/service called backend.

npm install serverless -g
serverless create --template aws-nodejs --path backend

Now replace the serverless.yml file with the following code, which creates a Lambda function called “speak”.

service: talking-backend

provider:
  name: aws
  runtime: nodejs8.10
  region: us-east-1
  role: arn:aws:iam::<account-id>:role/talking-app-role

functions:
  speak:
    handler: handler.speak
    events:
      - http:
          path: speak
          method: post
          cors: true

The “speak” Lambda function will send the text payload to Amazon Polly and return the voice file from the S3 bucket.
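
For reference, here is the shape of the request the frontend will POST to this endpoint and the response the function sends back. The field names match the handler code shown later in this post; the values are only illustrative.

Request body:

{ "text": "Hello from my talking app!", "voice": "Joanna" }

Response body:

{ "bucket": "my-talking-app", "key": "<uuid>.mp3", "url": "<signed URL for the mp3 file>" }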

Creating an S3 Bucket

We need an S3 bucket to store all the voice clips returned by Amazon Polly. Use the AWS console to create a bucket with a unique name. In my case the S3 bucket name is “my-talking-app”.
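
If you prefer the command line, you can create the same bucket with the AWS CLI instead of the console (use your own unique bucket name):

aws s3 mb s3://my-talking-app --region us-east-1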

Create an IAM Role

The Serverless Framework project we created defines a Lambda function that interacts with the Amazon Polly and Amazon S3 services (we will see the code later in the blog). In order to communicate with these services, our Lambda function must be assigned an IAM role that has permission to talk to S3 and Polly. So create an IAM role with a name of your choice, e.g. “talking-app-role”, with the following IAM policy.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "polly:*",
                "s3:PutAccountPublicAccessBlock",
                "s3:GetAccountPublicAccessBlock",
                "s3:ListAllMyBuckets",
                "s3:HeadBucket"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::my-talking-app",
                "arn:aws:s3:::my-talking-app/*"
            ]
        }
    ]
}
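
When creating the role in the IAM console, choose Lambda as the trusted service so that our function is allowed to assume the role; the resulting trust relationship should look roughly like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "lambda.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}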

Copy the ARN of the IAM role and add it under the provider section of the serverless.yml file.

provider:
  name: aws
  runtime: nodejs8.10
  region: us-east-1
  role: arn:aws:iam::885121665536:role/talking-app-role

“Speak” Lambda Function

The “speak” Lambda function does three main tasks.

  1. Call the Amazon Polly synthesizeSpeech API and get the audio stream (mp3 format) for the text that the user entered
  2. Save the audio stream in the S3 bucket
  3. Get a signed URL for the saved mp3 file in S3 and send it back to the frontend application

First of all, let’s install the required npm modules inside the backend folder.

npm install aws-sdk 
npm install uuid
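
A quick note on uuid: newer major versions of the package dropped the require('uuid/v1') deep import used in the handler below. If that import fails with the version npm installed for you, pinning version 3.x keeps the import style used in this tutorial:

npm install uuid@3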

The Amazon Polly synthesizeSpeech API requires the text input and a voice id to convert the text into speech. Here, we pass the voice selected in the frontend (for example, “Joanna”) along with the text the user entered.

let AWS = require("aws-sdk");
let polly = new AWS.Polly();
let s3 = new AWS.S3();
const uuidv1 = require('uuid/v1');

module.exports.speak = (event, context, callback) => {
  let data = JSON.parse(event.body);
  const pollyParams = {
    OutputFormat: "mp3",
    Text: data.text,
    VoiceId: data.voice
  };

  // 1. Getting the audio stream for the text that the user entered
  polly.synthesizeSpeech(pollyParams)
    .on("success", function (response) {
      let data = response.data;
      let audioStream = data.AudioStream;
      let key = uuidv1();
      let s3BucketName = 'my-talking-app';

      // 2. Saving the audio stream to S3
      let params = {
        Bucket: s3BucketName,
        Key: key + '.mp3',
        Body: audioStream
      };
      s3.putObject(params)
        .on("success", function (response) {
          console.log("S3 Put Success!");
        })
        .on("complete", function () {
          console.log("S3 Put Complete!");
          let s3params = {
            Bucket: s3BucketName,
            Key: key + '.mp3',
          };

          // 3. Getting a signed URL for the saved mp3 file
          let url = s3.getSignedUrl("getObject", s3params);

          // Sending the result back to the user
          let result = {
            bucket: s3BucketName,
            key: key + '.mp3',
            url: url
          };
          callback(null, {
            statusCode: 200,
            headers: {
              "Access-Control-Allow-Origin": "*"
            },
            body: JSON.stringify(result)
          });
        })
        .on("error", function (response) {
          console.log(response);
        })
        .send();
    })
    .on("error", function (err) {
      callback(null, {
        statusCode: 500,
        headers: {
          "Access-Control-Allow-Origin": "*"
        },
        body: JSON.stringify(err)
      });
    })
    .send();
};
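
One thing to keep in mind: getSignedUrl returns a time-limited URL (with the default settings it expires after roughly 15 minutes). If you want the link to stay playable for longer, you can pass an Expires value in seconds along with the bucket and key, for example:

let s3params = {
  Bucket: s3BucketName,
  Key: key + '.mp3',
  Expires: 3600 // keep the signed URL valid for one hour
};
let url = s3.getSignedUrl("getObject", s3params);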

Now, deploy the backend API and the Lambda function.

sls deploy
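
When the deployment finishes, Serverless prints the POST endpoint for the speak function. You can sanity-check the API with curl before building the frontend (the URL below is a placeholder; use the one printed by sls deploy):

curl -X POST -H "Content-Type: application/json" \
  -d '{"text": "Hello from my talking app", "voice": "Joanna"}' \
  https://<api-id>.execute-api.us-east-1.amazonaws.com/dev/speak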

Frontend Angular App

In order to test our backend we need a frontend that makes the speak request with the text the user enters. So let’s create an Angular application.

ng new client
? Would you like to add Angular routing? No
? Which stylesheet format would you like to use? SCSS

Let’s create an Angular service that talks to our Amazon Polly backend.

ng g s API

Add the following code to the api.service.ts file. It creates a speak function that calls our Lambda function with the voice the user selected and the text they entered.

import { Injectable } from '@angular/core';
import { HttpClient } from '@angular/common/http';

@Injectable({
  providedIn: 'root'
})
export class APIService {

  ENDPOINT = '<YOUR_ENDPOINT_HERE>';

  constructor(private http: HttpClient) {}

  speak(data) {
    return this.http.post(this.ENDPOINT, data);
  }
}
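
Replace <YOUR_ENDPOINT_HERE> with the POST endpoint that sls deploy printed for the speak function (it ends with /speak).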

Let’s use the main AppComponent to render the UI for the “Talking App”. Go to app.component.html and replace the file with the following HTML code. It adds a basic text area, a dropdown to select the preferred voice, and a Speak Now button.

<div style="margin: auto; padding: 10px; text-align: center;">
  <h2>My Talking App</h2>
  <div>
    <textarea #userInput style="font-size: 15px; padding: 10px;" cols="60" rows="10"></textarea>
  </div>
  <div>
    <select [(ngModel)]="selectedVoice">
      <option *ngFor="let voice of voices" [ngValue]="voice">{{voice}}</option>
    </select>
  </div>
  <div style="margin-top: 10px">
    <button style="font-size: 15px;" (click)="speakNow(userInput.value)">Speak Now</button>
  </div>
</div>

Go to the app.component.ts file and add the corresponding handler functions for the view by replacing its content with the following code.

import { Component } from '@angular/core';
import { APIService } from './api.service';

@Component({
  selector: 'app-root',
  templateUrl: './app.component.html',
  styleUrls: ['./app.component.scss']
})
export class AppComponent {
  voices = ["Matthew", "Joanna", "Ivy", "Justin"];
  selectedVoice = "Matthew";

  constructor(private api: APIService) {}

  playAudio(url) {
    let audio = new Audio();
    audio.src = url;
    audio.load();
    audio.play();
  }

  speakNow(input) {
    let data = {
      text: input,
      voice: this.selectedVoice
    };
    this.api.speak(data).subscribe((result: any) => {
      this.playAudio(result.url);
    });
  }
}

Since we are using ngModel in app.component.html, we need to import the FormsModule in the app.module.ts file (HttpClientModule is also needed there because our APIService uses HttpClient). Go to the app.module.ts file and replace its content with the following.

import { BrowserModule } from '@angular/platform-browser';
import { NgModule } from '@angular/core';
import { HttpClientModule } from '@angular/common/http';
import { AppComponent } from './app.component';
import { FormsModule } from '@angular/forms';

@NgModule({
  declarations: [
    AppComponent
  ],
  imports: [
    FormsModule,
    BrowserModule,
    HttpClientModule
  ],
  providers: [],
  bootstrap: [AppComponent]
})
export class AppModule { }

Running the Application

Now that our backend and the frontend are ready, let’s play with our app.

Go to the client directory and run the Angular app locally.

ng serve

Type some text in the text area and select a voice from the dropdown. When you click Speak Now, the app should speak the text aloud!

Cheers!
