Intonation Sentiment Analysis On Speech Signal Processing: A Deep Learning Approach


Deep learning networks, the Matlab package, and machine learning methods for processing speech signals form the basis for recognizing speech patterns. Although linguists pay more attention to studying intonation itself, the effectiveness of their work presupposes high-quality programs that can display the spectrum of intonation. Matlab platforms make it possible to distinguish speech signals properly and to represent the emotional spectrum of a speech pattern. Together, deep learning networks and machine learning methods provide an effective technology for processing speech.

1. Linguistic Intonation

Matlab is oriented toward programming with mathematical models. Serving as both a programming language and an applied software package, Matlab has proved itself as a workable platform across operating systems, including Windows and Linux. It has also proved effective for processing audio signals (McLoughlin 2009). In a Matlab-based workflow, sound waves are converted into electrical signals, so the information can be stored and exchanged on any device. The reverse process converts the electrical signal back into sound. When a speech pattern is passed to a machine, the program identifies the sound waves as a language pattern, and speech synthesis becomes possible. The first machines that reproduced human speech confused verb forms, so the sentence “She saw me” sounded like “She see me saw” (Gold et al. 2011). Nowadays, hardware relies on answer-back or text-to-speech systems, which either type out the text or display it on an equalizer.

Still, the issue that remains to be considered is the transduction and representation of linguistic intonation. The Polish linguist G. Demenko (n.d.) distinguishes automatic machine speech from human speech. Automatic learning serves as the basis of speech recognition. B. Gold, N. Morgan, and D. Ellis (2011) claim that the first machines that copied speech appeared in 1940 and did not operate automatically.

When distinguishing an intonation pattern, the devices rely on automatic learning. The development of hardware and software is made possible by applying learning mechanisms (Demenko n.d.). Automatic learning is applicable to the recognition of both human and artificial speech. In their study, T. Hastie, R. Tibshirani, and J. Friedman (2009) treat unsupervised learning as a more advanced level of hardware and software development. The scholars prefer statistical methods based on estimating probability density as the way to acquire the ‘learning skills’. I. McLoughlin's demonstration of how a microphone works shows analogue-to-digital (ADC) and digital-to-analogue (DAC) converters placed one after another to illustrate the basic principle of sound transmission. Modern software can recognize and synthesize speech patterns, display intonation on equalizers, and transmit music. These applications underline the importance of further software improvements for practical use.

2. Text Sentiment Analysis Using R

The programming language R provides another basis for analyzing a text sample. It can manage documents and perform various manipulations on different operating systems. Text sentiment, like the study of speech patterns, is a relatively new subject for R.

The R sentiment package distinguishes opinions, emotions, and other mental acts expressed in a text. M. Gathu (2015) shows the breadth of practical applications of sentiment analysis: it can help predict market fluctuations based on advertisements and characterize news, blogs, and social networks. Polarity detection, the key goal of R sentiment analysis, labels the information given by the speaker as positive, negative, or neutral on the basis of supervised learning. The work stages include pre-processing and text classification. The software determines word frequencies and assigns the whole text to either the positive or the negative class.
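The two stages named above, pre-processing and classification by word frequency, can be illustrated with a minimal Python sketch. It is not the R package's implementation: it uses a fixed word list instead of supervised learning, and the lexicons and function names are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons, invented for this illustration only.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def tokenize(text):
    """Pre-processing: lowercase the text and keep alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def polarity(text):
    """Classify a text as positive, negative, or neutral from word counts."""
    counts = Counter(tokenize(text))
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

A supervised approach would learn the word weights from labeled examples rather than reading them from a hand-written list.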

A library unites a collection of specific tasks and a system of databases. For R sentiment analysis, the string-handling library and the xlsx library are the most commonly preferred. With the R programming language, it is possible to process text and store it in a database, which forms the library. String manipulation relies on four main objects for storage: data frame, matrix, vector, and factor (Sanchez 2013). The library works from an algorithm and uses a support vector machine and a naïve Bayes classifier. The vector information is collected into data frames, and the matrix system processes the information, categorizes it using different factors, and organizes it by category.

Another type of library is xlsx. Whereas the string manipulation library described above uses one technique and its own file extension (Abedin 2015), a file handled by the xlsx library is changed and processed according to that library's algorithm, and the files named in the given commands receive the same extension.

3. Speech Signals Processing Using Matlab

Unlike the tools for textual information, Matlab includes programs that process sound. When describing the Matlab toolbox MIRtoolbox, O. Lartillot and P. Toiviainen (2007) compare Matlab-based software for sound recognition with Marsyas, which is written in C++.

A ‘feature’ represents the sound within a vector or a sample (McLoughlin 2009). Complete features are not instantly noticeable, but they usually appear in the center of the sound frame. Sound pattern recognition starts with feature extraction. The extraction process begins by subdividing the sound into homogeneous segments according to the chosen features (Lartillot & Toiviainen 2007). In overlapping, each segment contains components of the previous one, which produces a chain of homogeneous features (McLoughlin 2009).
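The idea of overlapping segments can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name, the frame length, and the hop size are hypothetical and do not come from the cited sources.

```python
def frame_signal(samples, frame_len, hop):
    """Split samples into frames of frame_len that start every hop samples."""
    if not 0 < hop <= frame_len:
        raise ValueError("hop must be between 1 and frame_len")
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames

# Ten samples, frames of four samples, each starting two samples after the last.
signal = list(range(10))
frames = frame_signal(signal, frame_len=4, hop=2)
```

With a hop smaller than the frame length, the tail of each frame reappears at the head of the next one, which is the chain of overlapping segments the paragraph describes.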

Cepstrum signal processing describes the spectrum in terms of its repeated (periodic) features. It is significant for most speech recognition systems and is often compared with linear predictive coding (Oppenheim & Schafer 2004). In Matlab, the cepstrum can be plotted to picture the waves and the spectrum. The rceps function performs the cepstrum processing for both graphic and audio information patterns. In addition to the real cepstrum, it returns a minimum-phase reconstruction of the input sequence.
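As a rough Python sketch of the real cepstrum (the inverse transform of the log magnitude spectrum), the following uses a naive discrete Fourier transform so that it needs no external library. It does not reproduce Matlab's rceps: the minimum-phase reconstruction is omitted, and the small floor that guards against the logarithm of zero is an assumption of this sketch.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, or its inverse."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def real_cepstrum(x, floor=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    log_mag = [math.log(max(abs(c), floor)) for c in dft(x)]
    return [c.real for c in dft(log_mag, inverse=True)]

# A scaled unit impulse has a flat spectrum, so only the first coefficient is non-zero.
impulse_cepstrum = real_cepstrum([2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```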

To extract the sound, it is necessary to choose audio signals and musical compositions that meet certain requirements. Energy, spectrum, perceptual, and temporal features, together with the number of zero crossings and an autoregressive model, represent the musical specimen, whereas the audio signal should not depend on the speaker, the volume, or external noise.
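Two of the features listed here, energy and the number of zero crossings, are simple enough to sketch directly in Python. The function names are hypothetical, and the normalization by frame length is one common convention assumed for this example.

```python
def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

alternating = [1.0, -1.0, 1.0, -1.0]   # changes sign at every step
steady = [1.0, 1.0, 1.0, 1.0]          # never changes sign
```

Both frames have the same energy, but only the alternating one crosses zero, which is why the two features carry different information about the signal.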

When analyzing a speech pattern, one of the most effective methods of feature extraction is perceptual linear predictive (PLP) analysis (Hermansky 1990). It approximates the vocal tract transfer function with an all-pole model that has a certain number of resonances. In the PLP method, the speech passes through the following stages: (1) critical-band analysis, (2) equal-loudness pre-emphasis, (3) intensity-loudness conversion, (4) inverse discrete Fourier transform, and (5) solution for the autoregressive coefficients. Through these stages, the pattern is transformed into an all-pole model. The computational requirements of PLP are comparable to those of LP analysis. The methodology is effective in decoding words and messages under different conditions and handles wider and more complicated speech signals.
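Stage (5), solving for the autoregressive coefficients, is commonly done with the Levinson-Durbin recursion on an autocorrelation sequence. The Python sketch below shows that recursion only. It assumes the autocorrelation values are already available from the earlier stages, and the function name and sign convention (a predictor polynomial with a leading 1) are choices made for this illustration.

```python
def levinson_durbin(r, order):
    """Solve for all-pole coefficients from autocorrelation values r[0..order].

    Returns (a, err), where a[0] == 1 and the model is
    x[n] + a[1]*x[n-1] + ... + a[order]*x[n-order] = e[n],
    and err is the remaining prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

# Autocorrelation of a first-order process x[n] = 0.5*x[n-1] + e[n], up to scale.
coeffs, residual = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this input the recursion recovers the single pole at 0.5 and sets the second coefficient to zero, as expected for a first-order process.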

Unlike the PLP method alone, RASTA methodology, which can be applied with or without PLP, can process signals with and without distortions. It readily distinguishes different speech patterns from one another. Hence, recognizing phonemes or syllables makes sound recognition more precise and distinct. Unlike cepstrum processing, RASTA affects the speech in a more complex way: it makes the current output depend on past values and enhances the spectral transitions. RASTA applies band-pass filtering to the time trajectories of the speech parameters. The method can work with different speech parameters and effects. When dealing with convolutional noise, the optimal compressive static nonlinearity is the logarithmic function, and the expansive static nonlinearity is its exact inverse (Hermansky 1994).
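The band-pass filtering of a time trajectory can be sketched as a small recursive filter in Python. The coefficients below are the ones commonly quoted for the RASTA filter and should be treated as an assumption here, as should the function name. The sketch is causal and ignores the time advance used in some formulations.

```python
def rasta_filter(trajectory, pole=0.98):
    """Band-pass filter one log-spectral trajectory (one band over time)."""
    b = [0.2, 0.1, 0.0, -0.1, -0.2]   # numerator taps sum to zero, so DC is blocked
    out = []
    for n in range(len(trajectory)):
        acc = sum(b[k] * trajectory[n - k] for k in range(len(b)) if n - k >= 0)
        if n > 0:
            acc += pole * out[n - 1]  # the output depends on its own past
        out.append(acc)
    return out

# A constant offset in the log domain models a fixed convolutional distortion.
constant_offset = [1.0] * 600
filtered = rasta_filter(constant_offset)
```

Because a fixed channel distortion becomes an additive constant after the logarithm, the filter's response to it decays toward zero, while faster changes in the trajectory pass through.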

Wavelet signal decomposition in Matlab includes three main stages: passing the signal through the filter, coefficient deconvolution, and coefficient decomposition. The second stage centers on the deconvolution of the filter coefficients, and the third on downsampling (Hong-tu & Jing 2010). The technique is rather quick and does not distort the signals. Matlab is well suited to wavelet decomposition because its functions correspond to each of these processing stages.
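A one-level decomposition with the Haar wavelet gives a compact illustration of filtering followed by downsampling. The Haar wavelet is chosen here only because it is the simplest case, not because the cited source uses it, and the function names are hypothetical.

```python
import math

def haar_step(signal):
    """One decomposition level: low-pass and high-pass filtering, then downsampling by two."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Rebuild the original samples from approximation and detail coefficients."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out

approx, detail = haar_step([4.0, 2.0, 5.0, 5.0])
rebuilt = haar_inverse(approx, detail)
```

The inverse step returns the original samples up to rounding, which matches the statement that the decomposition does not distort the signal.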

Feature reduction in Matlab reduces the size of the segmented sound so that the information file can be transmitted properly. The reduction process is vital for lowering the loudness of the sounds and decoding them into quieter sound waves (McLoughlin 2009). The Matlab platform makes it possible to automate the process and to use storage space optimally.

Feature selection takes place during the extraction process. Following the given algorithm, the filters identify similar features of the sound and either produce a graphical representation of the waves or pass the result on to another destination. Storing the data in a matrix makes it possible to select homogeneous features and identify the necessary files. Specifying the normalized cutoff frequency of digital Butterworth filters allows the data to be processed quickly and adequately.
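The normalized cutoff frequency is the cutoff expressed as a fraction of the Nyquist frequency (half the sampling rate). The Python sketch below computes it and derives a first-order low-pass Butterworth filter through the bilinear transform. The first-order case, the function names, and the example frequencies are assumptions made to keep the illustration short.

```python
import math

def normalized_cutoff(cutoff_hz, sample_rate_hz):
    """Cutoff as a fraction of the Nyquist frequency, so 0 < Wn < 1."""
    wn = cutoff_hz / (sample_rate_hz / 2)
    if not 0 < wn < 1:
        raise ValueError("cutoff must lie between 0 and the Nyquist frequency")
    return wn

def butter1_lowpass(wn):
    """First-order Butterworth low-pass coefficients via the bilinear transform."""
    k = math.tan(math.pi * wn / 2)   # pre-warped cutoff
    b = [k / (1 + k), k / (1 + k)]
    a = [1.0, (k - 1) / (k + 1)]
    return b, a

def apply_first_order(b, a, x):
    """Run the difference equation y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]."""
    y = []
    prev_x = prev_y = 0.0
    for sample in x:
        out = b[0] * sample + b[1] * prev_x - a[1] * prev_y
        y.append(out)
        prev_x, prev_y = sample, out
    return y

wn = normalized_cutoff(2000, 8000)   # 2 kHz cutoff at 8 kHz sampling
b, a = butter1_lowpass(wn)
step_response = apply_first_order(b, a, [1.0] * 20)
```

A low-pass filter should pass a constant input unchanged once it settles, so the step response approaching 1 is a quick sanity check on the coefficients.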

4. Deep Learning Networks

The deep belief network (DBN) is of practical importance for the development of deep learning. The network builds a hierarchical system of functions that organizes the data processing algorithm. "Probabilistic max pooling" helps implement convolutional DBNs at a large scale while remaining probabilistically sound. The lowest level of the system detects edges in images. The next level combines the edges into parts. At the final stage, the parts are assembled into models. DBNs are useful for classifying textual and audio files as well as for simple object recognition. When presenting how a DBN works, the lecturer compares it to the way an individual perceives a visual image. The visual analyzer, the human eye, recognizes an object by means of special sensitive cells and conducts the impulse to a specific brain centre. In general, perceptive processes involve both hemispheres of the brain and the cortex (Ng 2009).

A restricted Boltzmann machine (RBM) model can be used to build a DBN from data matrices by stacking and combining layers. Although an RBM alone cannot form the top level of a DBN, training RBMs with Contrastive Divergence makes it possible to create the intermediate layers of the DBN and thereby increase the capacity of the network (Bengio 2009).
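A single step of Contrastive Divergence (often written CD-1) can be sketched in plain Python. This is a toy illustration, not the procedure of the cited authors: the class name, the layer sizes, the learning rate, and the initial weight scale are all assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyRBM:
    """Toy binary RBM trained with one step of contrastive divergence."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Small random initial weights, zero biases.
        self.w = [[self.rng.gauss(0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        """Positive phase on the data, negative phase on a one-step reconstruction."""
        h0_prob = self.hidden_probs(v0)
        h0 = self.sample(h0_prob)
        v1 = self.visible_probs(h0)          # reconstruction, kept as probabilities
        h1_prob = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0_prob)):
                self.w[i][j] += lr * (v0[i] * h0_prob[j] - v1[i] * h1_prob[j])
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0_prob)):
            self.b_hid[j] += lr * (h0_prob[j] - h1_prob[j])
        return sum((x - y) ** 2 for x, y in zip(v0, v1))   # reconstruction error

data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
rbm = TinyRBM(n_visible=4, n_hidden=2)
first_error = rbm.cd1_update(data[0])
for _ in range(200):
    for v in data:
        rbm.cd1_update(v)
```

To build a deeper network, the hidden probabilities of a trained RBM would serve as the training data for the next RBM in the stack.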

The conditional restricted Boltzmann machine (CRBM), previously used in motion capture (Mohamed & Hinton 2010), has proved itself as a device for capturing sound patterns. In combination with a DBN, it provides undistorted sounds. The Boltzmann machine is a type of artificial neural network.

Contrastive divergence learning procedures are the preferred way to train RBMs (Hinton 2010). Using them requires some practical experience. The principal things to get right are the numerical settings. Choosing the initial values of the weights, the number of hidden units, and the mini-batch size helps ensure high-quality information processing and makes it possible to verify the RBM's behavior.

Having analyzed RBM and DBN applications, the scholars assign each training case its own RBM with the visible units that are present for that case. A workable version, however, has to deal with missing values. As G. Hinton (2010) claims, each such RBM belongs to a family of different models with shared weights. Each RBM in the family can perform correct inference for its hidden states, but the tying of the weights means that they may not be ideal for any particular RBM (Hinton 2010).

5. Machine Learning Methods Comparison

The support vector machine (SVM) simplifies the algorithm for transmitting and recognizing information (Tong & Koller 2002). In addition to the text classification also typical of RBMs, SVMs can recognize handwritten text. Linear separability of the data is a necessary condition for the version space used by the SVM to exist. Linear separability can be achieved in the following ways: (1) a high-dimensional feature space; (2) modification of the kernel. As a result, the data in the new induced feature space become linearly separable (Tong & Koller 2002).
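How a higher-dimensional feature space makes data linearly separable can be shown with a small Python example. It does not train an SVM. It only demonstrates the induced feature space on XOR-style data, and the feature map, the weights, and the function names are hypothetical choices for illustration.

```python
def feature_map(x1, x2):
    """Map a 2-D input to 3-D by adding the product term x1 * x2."""
    return (x1, x2, x1 * x2)

def linear_decision(z, weights, bias):
    """Sign of a linear function in the induced feature space."""
    score = sum(w * v for w, v in zip(weights, z)) + bias
    return 1 if score > 0 else -1

# XOR-style data with inputs in {-1, +1}: the label is +1 when the inputs differ.
# No straight line in the original 2-D plane separates these two classes.
points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [-1, 1, 1, -1]

# In the 3-D space, a plane that looks only at the product term separates them.
weights, bias = (0.0, 0.0, -1.0), 0.0
predictions = [linear_decision(feature_map(*p), weights, bias) for p in points]
```

A kernel achieves the same effect implicitly by computing inner products in the induced space without building the mapped vectors.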

Among the methods of applying SVMs, Simple and Ratio are the principal ones. Analysts regard the Simple method as a rougher and more unstable approximation that runs faster than the Ratio method. Experiments with the Hybrid method indicate that it can combine the benefits of the Ratio and Simple methods, giving better technical and practical characteristics (Tong & Koller 2002).

The naïve Bayes classifier is a probabilistic classifier that is simpler than the SVM (Ting, Ip, & Tsang 2011). In most cases, it provides statistical information on the occurrence of a certain parameter or feature within the processed data.

Naïve Bayes can learn from categorized text and provide statistical information through supervised learning. Text categorization proceeds by comparing the processed information, that is, the word list and the occurrence of different lexemes, against the given parameters. Consequently, naïve Bayes classifiers assign new documents to the appropriate categories according to their probabilities. They are well suited to simplified text segmentation procedures but have a more limited range of applications than SVMs (Ting, Ip, & Tsang 2011).
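A minimal multinomial naïve Bayes text classifier in Python shows how word occurrences and class probabilities combine. The class name and the tiny training set are invented for illustration, and add-one (Laplace) smoothing is an assumption of this sketch rather than something stated in the cited study.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesText:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)          # log prior
            for word in doc.lower().split():
                if word in self.vocab:                     # skip unseen words
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training documents, invented for this example.
train_docs = ["cheap pills buy now", "buy cheap watches",
              "meeting agenda attached", "project meeting tomorrow"]
train_labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayesText().fit(train_docs, train_labels)
```

Calling model.predict on a new document returns the category with the highest posterior score under the independence assumption.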

K-nearest neighbours (KNN) is a text categorization tool alongside the previously described naïve Bayes and SVM (Jiang, Pang, Wu, & Kuang 2012). All of them help manage and organize the surging volume of text data. KNN and SVM perform much better than other classifiers.


Text and speech processing draws on a variety of hardware and programming languages. Matlab and R provide the basis for applying different algorithms to the transmission, analysis, and synthesis of text or speech patterns. Differences in purpose and in the form of representation determine which methodology is used.

However, KNN is a sample-based learning method. It uses all the training documents to predict the label of a test document, which makes the text similarity computation expensive. That is why real-world applications need more extensive software to operate. An improved KNN algorithm for text categorization that combines a one-pass clustering algorithm with KNN could change the prospects and the ways of its use.
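The sample-based nature of KNN is visible in a short Python sketch: every prediction compares the query with all training documents. The bag-of-words representation, cosine similarity, function names, and example documents are assumptions chosen for illustration, and the clustering-based improvement mentioned above is not implemented here.

```python
import math
from collections import Counter

def bag_of_words(text):
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def knn_predict(train, query, k=3):
    """Label a query by majority vote among its k most similar training documents."""
    q = bag_of_words(query)
    # Every training document is compared with the query at prediction time.
    ranked = sorted(train, key=lambda item: cosine(bag_of_words(item[0]), q),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled documents, invented for this example.
train = [
    ("stocks fell on weak earnings", "finance"),
    ("bank raises interest rates", "finance"),
    ("market rally lifts stocks", "finance"),
    ("team wins the final match", "sport"),
    ("striker scores in the match", "sport"),
    ("coach praises the team", "sport"),
]
```

Because the cost of each prediction grows with the size of the training set, grouping the training documents into clusters first is one way to reduce the number of comparisons.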
