Speech-to-Text (Batch)

Getting Started

This guide walks you through transcribing pre-recorded audio with the Reverie API. There are two scenarios to try: transcribing a remote file and transcribing a local file.

Before you start, follow the steps in Get your API Credentials to obtain your API key and app ID.

Install Dependencies

npm i @reverieit/reverie-client

Transcribe Audio from a Remote Stream

To transcribe pre-recorded audio with Reverie's API, create a client with your credentials and call transcribeAudio:

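// Package name must match the one installed above
const ReverieClient = require("@reverieit/reverie-client");

// Credentials from Get your API Credentials
const reverieClient = new ReverieClient({
  apiKey: "YOUR-API-KEY",
  appId: "YOUR-APP-ID",
});

// Placeholders: replace with your own values
const file = "YOUR-AUDIO-FILE"; // the audio to transcribe (check the SDK docs for the accepted type)
const lang = "en";              // a code from Supported Languages
const subtitles = true;         // assumed boolean: also return SRT subtitles

// CommonJS modules do not support top-level await, so wrap the call in an async function
async function main() {
  const response = await reverieClient.transcribeAudio({
    audioFile: file,
    language: lang,
    subtitles: subtitles,
  });
  console.log("Response from API:", response);
}

main().catch((err) => console.error("Transcription failed:", err));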
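Hardcoding credentials is convenient for a first test, but you may prefer to keep them out of source control. A minimal sketch that reads them from environment variables instead; the variable names REVERIE_API_KEY and REVERIE_APP_ID and the helper loadReverieCredentials are hypothetical, not part of the SDK:

```javascript
// Hypothetical helper: the environment variable names are an assumption,
// pick whatever your deployment uses.
function loadReverieCredentials(env = process.env) {
  const apiKey = env.REVERIE_API_KEY;
  const appId = env.REVERIE_APP_ID;
  if (!apiKey || !appId) {
    throw new Error("Set REVERIE_API_KEY and REVERIE_APP_ID before running.");
  }
  // Same shape as the options object passed to new ReverieClient(...)
  return { apiKey, appId };
}

// Usage (requires the SDK to be installed):
// const ReverieClient = require("@reverieit/reverie-client");
// const reverieClient = new ReverieClient(loadReverieCredentials());
```

Failing early with a clear message avoids sending a request with an undefined key.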

Results

To see the results, run your application from the terminal. The transcript is printed to your shell.

# Run your application using the file you created in the previous step
# Example:
npm start

Analyzing the Response

{
  "job_id": "e21f356d-cbf9-4d62-a960-1e9da1805d19",
  "code": "000",
  "message": "Transcript ready.",
  "result": {
    "transcript": "Hello. Welcome to Reverie.",
    "original_transcript": "HELLO. WELCOME TO REVERIE.",
    "channel_number": 1,
    "words": [
      [
        {
          "conf": 0.991683,
          "end": 0.21,
          "start": 0.09,
          "word": "HELLO"
        },
        {
          "conf": 1.0,
          "end": 0.6,
          "start": 0.21,
          "word": "WELCOME"
        },
        {
          "conf": 0.99723,
          "end": 0.72,
          "start": 0.6,
          "word": "TO"
        },
        {
          "conf": 1.0,
          "end": 1.320315,
          "start": 0.72,
          "word": "REVERIE"
        }
      ]
    ],
    "subtitles": "1\n00:00:00,090 --> 00:00:06,900\nHELLO. WELCOME TO REVERIE.\n\n"
  }
}
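To work with word timings programmatically, you can walk result.words. In the sample, the words sit inside an outer list; the sketch below assumes it is safe to concatenate the inner lists into one sequence. extractWords and its minConf threshold are illustrative choices, not SDK functions:

```javascript
// Flattens result.words (a list of lists, as in the sample) into one array
// and flags words whose confidence falls below a threshold.
function extractWords(response, minConf = 0.9) {
  const groups = response?.result?.words ?? [];
  return groups.flat().map(({ word, start, end, conf }) => ({
    word,
    start, // seconds
    end,   // seconds
    conf,
    lowConfidence: conf < minConf,
  }));
}

// Trimmed copy of the sample response above
const sample = {
  code: "000",
  result: {
    transcript: "Hello. Welcome to Reverie.",
    words: [[
      { conf: 0.991683, end: 0.21, start: 0.09, word: "HELLO" },
      { conf: 1.0, end: 0.6, start: 0.21, word: "WELCOME" },
      { conf: 0.99723, end: 0.72, start: 0.6, word: "TO" },
      { conf: 1.0, end: 1.320315, start: 0.72, word: "REVERIE" },
    ]],
  },
};

// Print one line per word with its time span and confidence
for (const w of extractWords(sample)) {
  console.log(`${w.start.toFixed(2)}-${w.end.toFixed(2)}s ${w.word} (${w.conf})`);
}
```

Raising minConf (for example to 0.995) flags more words for review; the right threshold depends on your audio and use case.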

In this response we see:

job_id: A unique identifier that the API assigns to each request.
code: A status code that identifies the outcome of the request; see API Messages for the full list.
message: A brief description of the response.
result: An object containing the transcription: transcript (formatted text), original_transcript (the unformatted, upper-case transcript), channel_number, words (each recognized word with its start and end time in seconds and a confidence score), and subtitles (SRT-formatted captions).
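The subtitles field uses the SubRip (SRT) layout: a cue number, a HH:MM:SS,mmm --> HH:MM:SS,mmm timing line, the caption text, and a blank line. A minimal parser sketch; parseSrt and srtTimeToSeconds are hypothetical helpers, and they assume well-formed cues:

```javascript
// Converts an SRT timestamp "HH:MM:SS,mmm" to seconds.
function srtTimeToSeconds(t) {
  const [hms, ms] = t.trim().split(",");
  const [h, m, s] = hms.split(":").map(Number);
  return h * 3600 + m * 60 + s + Number(ms) / 1000;
}

// Parses an SRT string (the format of result.subtitles) into cue objects.
function parseSrt(srt) {
  return srt
    .split(/\r?\n\r?\n/)            // cues are separated by blank lines
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((block) => {
      const [index, timing, ...textLines] = block.split(/\r?\n/);
      const [start, end] = timing.split("-->").map(srtTimeToSeconds);
      return { index: Number(index), start, end, text: textLines.join("\n") };
    });
}

// The subtitles value from the sample response
const subtitles = "1\n00:00:00,090 --> 00:00:06,900\nHELLO. WELCOME TO REVERIE.\n\n";
console.log(parseSrt(subtitles));
```

If you only need captions for a video player, you can usually write the subtitles string to a .srt file as-is.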

Key Features

Real-time Transcription

Transcribe pre-recorded audio into text with high accuracy in real time, even from lower-quality inputs.

Personalized Speech Model

Customize recognition for domain-specific terms to boost accuracy of unique words or phrases.

Noise Resistance

Decode moderately noisy audio from various environments without extra noise cancellation.

Content Filtering

Filter out inappropriate content with an obscenity detector for clean text output.

Cloud-based Deployment

Scalable and accessible from anywhere for dynamic, distributed teams.

On-premise Deployment

Secure and customizable to integrate with your existing infrastructure.

Sample Code

Python

Access Python SDK samples for real-time speech transcription on GitHub

JavaScript

Explore JavaScript SDK samples for speech-to-text streaming on GitHub

GoLang

Find GoLang SDK samples for speech transcription on GitHub

FAQs

What is the accuracy of real-time transcription?

The solution delivers high accuracy, even with lower-quality audio, thanks to:
- Robust speech decoding technology
- Built-in noise resistance for reliable performance

Can I customize the speech model for my industry?

Yes, the Personalized Speech Model feature lets you:
- Tailor recognition to domain-specific terms
- Boost accuracy for unique words or phrases specific to your use case

Does it work in noisy environments?

Yes. The Noise Resistance feature provides:
- Decoding of moderately noisy audio without extra noise cancellation
- Consistent performance across diverse environments

How do I deploy the solution?

It supports flexible deployment options:
- Cloud-based: Scalable and accessible anywhere
- On-premise: Secure and tailored to your infrastructure

Which domains are supported?

Specialized models cover a wide range of domains, including BFSI, healthcare, and e-commerce. See the Supported Domains section for details.

Supported Languages

The Speech-to-Text solution supports transcription in multiple languages, tailored for diverse regional and linguistic needs:

hi - Hindi
bn - Bengali
gu - Gujarati
kn - Kannada
ml - Malayalam
mr - Marathi
pa - Punjabi
ta - Tamil
te - Telugu
en - Indian English
as - Assamese
or - Odia
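If your application accepts a language from user input, a quick check against this list can catch typos before a request is sent. A sketch; assertSupportedLanguage is a hypothetical helper, and the table simply mirrors the list above, so update it if the list changes:

```javascript
// Language codes from the Supported Languages list.
const SUPPORTED_LANGUAGES = {
  hi: "Hindi", bn: "Bengali", gu: "Gujarati", kn: "Kannada",
  ml: "Malayalam", mr: "Marathi", pa: "Punjabi", ta: "Tamil",
  te: "Telugu", en: "Indian English", as: "Assamese", or: "Odia",
};

// Throws on an unsupported code; returns the code unchanged otherwise.
// hasOwnProperty avoids matching inherited names such as "toString".
function assertSupportedLanguage(code) {
  if (!Object.prototype.hasOwnProperty.call(SUPPORTED_LANGUAGES, code)) {
    throw new Error(
      `Unsupported language "${code}". Use one of: ${Object.keys(SUPPORTED_LANGUAGES).join(", ")}`
    );
  }
  return code;
}

// e.g. pass language: assertSupportedLanguage("ta") to transcribeAudio
console.log(SUPPORTED_LANGUAGES[assertSupportedLanguage("ta")]);
```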

Supported Audio Formats

The Speech-to-Text solution supports various audio formats for flexible integration:

Audio Format	Description
16k_int16	Default format: Signed 16-bit, 16 kHz sampling rate, in WAV format
16k_uint8	Unsigned 8-bit, 16 kHz sampling rate, in WAV format
8k_int16	Signed 16-bit, 8 kHz sampling rate, in WAV format
8k_uint8	Unsigned 8-bit, 8 kHz sampling rate, in WAV format
opus_16k	Opus-encoded audio frames, 16 kHz sampling rate
opus_8k	Opus-encoded audio frames, 8 kHz sampling rate
ogg_opus	Opus-encoded audio frames in an Ogg container
16k_ulaw	μ-law audio frames, 16 kHz sampling rate
8k_ulaw	μ-law audio frames, 8 kHz sampling rate
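For the uncompressed formats, the format name is enough to estimate the audio data rate: sample rate × bytes per sample × channels. The sketch below assumes mono audio (the table does not state a channel count) and ignores the WAV header. μ-law uses 8 bits per sample; the Opus formats are compressed, so no fixed rate is computed for them. AUDIO_FORMATS and bytesPerSecond are illustrative names, not SDK identifiers:

```javascript
// Properties read off the Supported Audio Formats table.
// bytesPerSample is set only for uncompressed formats.
const AUDIO_FORMATS = {
  "16k_int16": { sampleRate: 16000, bytesPerSample: 2 },
  "16k_uint8": { sampleRate: 16000, bytesPerSample: 1 },
  "8k_int16": { sampleRate: 8000, bytesPerSample: 2 },
  "8k_uint8": { sampleRate: 8000, bytesPerSample: 1 },
  opus_16k: { sampleRate: 16000 },
  opus_8k: { sampleRate: 8000 },
  ogg_opus: { sampleRate: null }, // sampling rate not given in the table
  "16k_ulaw": { sampleRate: 16000, bytesPerSample: 1 }, // μ-law: 8 bits per sample
  "8k_ulaw": { sampleRate: 8000, bytesPerSample: 1 },
};

// Bytes per second of mono audio data, or null for compressed formats.
function bytesPerSecond(format) {
  if (!Object.prototype.hasOwnProperty.call(AUDIO_FORMATS, format)) {
    throw new Error(`Unknown audio format: ${format}`);
  }
  const f = AUDIO_FORMATS[format];
  return f.bytesPerSample ? f.sampleRate * f.bytesPerSample : null;
}

// Default format: 16,000 samples/s × 2 bytes = 32,000 bytes per second
console.log(bytesPerSecond("16k_int16"));
```

At that rate, one minute of default-format mono audio is about 1.9 MB of sample data, which can help when estimating upload sizes.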

API Messages

Code	Message
000	Transcript ready
001	Invalid JOB ID
002	Invalid JOB ID
003	Your request is in the queue and will be processed shortly
004	Your request is being processed
005	Job failed. Please contact the developers
999	Unknown error
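Codes 003 and 004 mean the job is still pending rather than failed, so client code typically needs to tell these apart from real errors. A minimal classification sketch; classifyResponse is hypothetical, and it treats code as a string because the sample response returns "000" with its leading zeros. How to re-query a pending job is not covered on this page, so the sketch does not poll:

```javascript
// Codes and messages from the API Messages table.
const API_MESSAGES = {
  "000": "Transcript ready",
  "001": "Invalid JOB ID",
  "002": "Invalid JOB ID",
  "003": "Your request is in the queue and will be processed shortly",
  "004": "Your request is being processed",
  "005": "Job failed. Please contact the developers",
  "999": "Unknown error",
};

function messageFor(code) {
  return Object.prototype.hasOwnProperty.call(API_MESSAGES, code)
    ? API_MESSAGES[code]
    : `Unrecognized code ${code}`;
}

// Returns { status: "done" | "pending" | "error", message }.
function classifyResponse(response) {
  const code = String(response.code);
  if (code === "000") return { status: "done", message: messageFor(code) };
  if (code === "003" || code === "004") {
    return { status: "pending", message: messageFor(code) };
  }
  return { status: "error", message: messageFor(code) };
}

console.log(classifyResponse({ code: "004" }));
```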
