Many language processing tasks involve classification. In this project I apply the Bidirectional LSTM to text categorization, the task is assigning a label or category to an entire text or document. The accuracy finally stood at 92 percent.
Most cases of classification in language processing are done via supervised machine learning. In supervised learning, we have a data set of input observations. The goal of the algorithm is to learn how to map from a new observation to a correct output.
Our goal is to learn a classifier that is capable of mapping from a new document d to its correct class c ∈ C.
We use a text document as if it were a bag-of-words, that is, an unordered set of words with their position ignored, keeping only their frequency in the document.