Here is a short description of the project assigned to us.
WEB CRAWLER
- Crawl open web data such as Wikipedia, news articles and social media.
- Extract the text from each article.
- Clean the text and split it into sentences.
- Shortlist sentences based on criteria such as the number of unique words per sentence.
- Create a final set of sentences and calculate statistics such as the number of phonemes and unique words.
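The cleaning, sentence-splitting, shortlisting and statistics steps could look roughly like the following Python sketch. It is not the project's implementation. Crawling and article extraction are not shown. The regular expressions, the `MIN_UNIQUE_WORDS` and `MAX_WORDS` thresholds and all function names are hypothetical. The splitter is a naive punctuation-based one that also treats the Devanagari danda (।) as a sentence end, on the assumption that some of the text is in an Indic script, as the ITRANS section suggests. Phoneme counts are left out because they would need a pronunciation lexicon or a grapheme-to-phoneme tool.

```python
import re
import string
from collections import Counter

# Hypothetical shortlisting thresholds; the project description does not fix them.
MIN_UNIQUE_WORDS = 5
MAX_WORDS = 20
# ASCII punctuation plus the Devanagari danda and double danda (assumed Indic text).
PUNCT = string.punctuation + "।॥"


def clean_text(raw: str) -> str:
    """Remove common crawl noise and collapse whitespace."""
    text = re.sub(r"\[\d+\]", "", raw)        # Wikipedia-style citation markers such as [12]
    text = re.sub(r"<[^>]+>", " ", text)      # stray HTML tags left over from extraction
    return re.sub(r"\s+", " ", text).strip()  # runs of whitespace become a single space


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!', '?' or the danda when whitespace follows."""
    parts = re.split(r"(?<=[.!?।])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def words(sentence: str) -> list[str]:
    """Whitespace tokenisation with surrounding punctuation stripped and lowercased."""
    tokens = (t.strip(PUNCT).lower() for t in sentence.split())
    return [t for t in tokens if t]


def shortlist(sentences: list[str]) -> list[str]:
    """Keep sentences with enough unique words that are not overly long."""
    return [
        s for s in sentences
        if len(set(words(s))) >= MIN_UNIQUE_WORDS and len(words(s)) <= MAX_WORDS
    ]


def statistics(sentences: list[str]) -> dict:
    """Corpus-level counts for the final sentence set (phonemes not included)."""
    counts = Counter(w for s in sentences for w in words(s))
    return {
        "sentences": len(sentences),
        "tokens": sum(counts.values()),
        "unique_words": len(counts),
    }
```

Chaining the functions as `statistics(shortlist(split_sentences(clean_text(raw))))` gives the counts for one article; in practice the shortlist would be accumulated across all crawled articles before computing the final statistics.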
VOICE RECORDER
A recording tool needs to be developed for Windows, Mac and Web/Mobile platforms. The task consists of:
- Take in an input file with a list of sentences to be recorded
- Display one sentence at a time. The user should be able to record the sentence, stop, play back and re-record each recording. Once satisfied with a recording, the user should be able to move on to the next sentence.
- The recordings should be verified using signal processing algorithms.
- The save option should be enabled only after the algorithm approves the speech signal.
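The description does not say which signal processing checks approve a recording. One simple possibility, sketched below under that caveat, is to reject takes that are too short, heavily clipped, or mostly silent, and to enable the save option only when no check fails. The samples are assumed to be mono floats in [-1.0, 1.0] at 16 kHz, and every threshold and name (`verify_recording`, `CLIP_LEVEL` and so on) is a hypothetical placeholder that would need tuning.

```python
import math

# Hypothetical thresholds; the project description does not specify the algorithm.
CLIP_LEVEL = 0.99        # |sample| at or above this counts as clipped
MAX_CLIP_RATIO = 0.01    # reject if more than 1% of samples are clipped
FRAME_SIZE = 400         # 25 ms frames, assuming a 16 kHz sample rate
SPEECH_RMS = 0.02        # a frame with RMS above this is treated as speech
MIN_SPEECH_RATIO = 0.3   # at least 30% of frames must contain speech


def frame_rms(samples, size):
    """Yield the RMS energy of each complete, non-overlapping frame."""
    for start in range(0, len(samples) - size + 1, size):
        frame = samples[start:start + size]
        yield math.sqrt(sum(x * x for x in frame) / size)


def verify_recording(samples):
    """Return (approved, reasons) for mono samples normalised to [-1.0, 1.0]."""
    if len(samples) < FRAME_SIZE:
        return False, ["recording too short"]
    reasons = []
    clipped = sum(1 for x in samples if abs(x) >= CLIP_LEVEL)
    if clipped / len(samples) > MAX_CLIP_RATIO:
        reasons.append("too much clipping")
    rms_values = list(frame_rms(samples, FRAME_SIZE))
    speech_frames = sum(1 for r in rms_values if r > SPEECH_RMS)
    if speech_frames / len(rms_values) < MIN_SPEECH_RATIO:
        reasons.append("too little speech energy")
    return not reasons, reasons
```

In the recorder UI, the save button would be enabled only when `verify_recording` returns `True`; the returned reasons could be shown to the user as a prompt to re-record. A production check would likely add more robust voice activity detection or signal-to-noise estimation.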
ITRANS
A UI tool that prepares text to match the audio files. The task consists of:
- The tool requires four input parameters: a) ITRANS script file, b) Unicode script file, c) audio files.
- Rename and normalize the audio files.
- Upload the modified audio files to the server as a tar archive.
- Generate individual ITRANS and Unicode files for each audio file.
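The renaming, per-file ITRANS/Unicode generation and tar packaging steps might be sketched as below. The sketch assumes that the ITRANS and Unicode script files hold one utterance per line, in the same order as the audio files sorted by file name, and that the audio is WAV. The `utt_0001` naming scheme, the `prepare_batch` function and the archive name are hypothetical. Audio level or format normalization and the upload itself are not shown, since the description gives no details about them.

```python
import shutil
import tarfile
from pathlib import Path


def read_lines(path):
    """Non-empty lines of a UTF-8 script file, one utterance per line (assumed)."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def prepare_batch(itrans_script, unicode_script, audio_dir, out_dir, prefix="utt"):
    """Pair line i of each script with the i-th audio file, sorted by file name."""
    itrans_lines = read_lines(itrans_script)
    unicode_lines = read_lines(unicode_script)
    audio_files = sorted(Path(audio_dir).glob("*.wav"))
    if not (len(itrans_lines) == len(unicode_lines) == len(audio_files)):
        raise ValueError("scripts and audio files must have the same number of entries")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    renamed = []
    for i, (audio, itr, uni) in enumerate(
        zip(audio_files, itrans_lines, unicode_lines), start=1
    ):
        stem = f"{prefix}_{i:04d}"                # hypothetical naming scheme
        target = out / f"{stem}.wav"
        shutil.copy2(audio, target)               # copy, so the originals stay untouched
        (out / f"{stem}.itrans").write_text(itr + "\n", encoding="utf-8")
        (out / f"{stem}.txt").write_text(uni + "\n", encoding="utf-8")
        renamed.append(target)

    archive = out / "audio_batch.tar"
    with tarfile.open(archive, "w") as tar:       # plain, uncompressed tar
        for path in renamed:
            tar.add(path, arcname=path.name)
    return archive
```

Checking that the three inputs have the same length before writing anything avoids producing a partially renamed batch whose transcripts are misaligned with the audio.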