Monday, May 4, 2015

Introduction To Natural Language Processing

Natural Language Processing (NLP) is the study of how to make machines communicate with human beings in natural language. Developing a natural language system draws on knowledge from many areas, including linguistics, artificial intelligence, cognitive science, and psychology. Because the field is vast, most people work on one portion of it, such as text segmentation, tagging, information extraction, or machine translation.
Analysis In Natural Language Processing
Natural language analysis proceeds in stages: tokenization, lexical analysis, syntactic analysis, semantic analysis, and pragmatic analysis. In the first stage, called tokenization, the given text is split into words or tokens. The main problem in tokenization is finding word boundaries when they are ambiguous. For example, abbreviations contain dots, which a tokenizer may mistake for sentence-ending periods. The next stage is lexical analysis: looking up each word in the lexicon and finding its lexical features. For machine translation tasks, the lexical features should include parts of speech. The lexicon is therefore central to lexical analysis. It contains the words and the features associated with them. To reduce the size of the lexicon, only the base forms of words are kept, and morphology is used to derive the other forms. For example, the word walk is enough for a lexicon, because its other forms can be derived by adding suffixes: s for the plural noun or third-person singular verb, ed for the past tense, and ing for the present participle.
Once lexical analysis is done, analysis moves to the sentence level. A procedure called a parser determines the syntactic structure of the sentence. Understanding the text is the critical stage in natural language processing. Semantic analysis finds the meaning of the given text. That meaning may vary depending on the context in which a particular sentence was uttered. To understand one sentence in a text, it may be necessary to understand the previous sentence as well. For example,
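The tokenization and morphology steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ABBREVIATIONS` set and the `tokenize` and `inflect` helpers are hypothetical names invented for this example, and the suffix rule ignores spelling changes and irregular words.

```python
import re

# Hypothetical abbreviation list; a real system would need a far larger one.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e."}

def tokenize(text):
    """Split text into word and punctuation tokens, keeping known abbreviations whole."""
    tokens = []
    for chunk in text.split():
        if chunk.lower() in ABBREVIATIONS:
            # The dot belongs to the abbreviation, so do not split it off.
            tokens.append(chunk)
        else:
            # Separate runs of word characters from individual punctuation marks.
            tokens.extend(re.findall(r"\w+|[^\w\s]", chunk))
    return tokens

def inflect(base):
    """Derive regular forms of a base word by plain suffixation (no spelling rules)."""
    return {"s": base + "s", "ed": base + "ed", "ing": base + "ing"}

print(tokenize("Dr. Smith walked home."))
# ['Dr.', 'Smith', 'walked', 'home', '.']
print(inflect("walk"))
# {'s': 'walks', 'ed': 'walked', 'ing': 'walking'}
```

Without the abbreviation check, the dot in "Dr." would be split off as a separate token, which is the kind of boundary ambiguity mentioned above.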
Julia won the race. She was very happy.
In the text above, someone who reads only the second sentence cannot know who "she" is. The purpose of discourse analysis is to find these interdependencies between sentences. Finally, world knowledge is also required to understand the text; pragmatic analysis supplies it.
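As a rough illustration of the dependency between the two sentences, the sketch below links each pronoun to the most recently mentioned name. It is a deliberately naive heuristic, not a real discourse analyzer: the `KNOWN_NAMES` and `PRONOUNS` sets and the `resolve_pronouns` function are hypothetical, and no gender or number agreement is checked.

```python
import re

KNOWN_NAMES = {"Julia", "Tom"}   # hypothetical name list for this example
PRONOUNS = {"he", "she"}

def resolve_pronouns(sentences):
    """Link each pronoun to the most recently seen known name (naive heuristic)."""
    last_name = None
    links = []
    for index, sentence in enumerate(sentences):
        for word in re.findall(r"[A-Za-z]+", sentence):
            if word in KNOWN_NAMES:
                last_name = word
            elif word.lower() in PRONOUNS:
                # Record (sentence index, pronoun, assumed referent).
                links.append((index, word, last_name))
    return links

print(resolve_pronouns(["Julia won the race.", "She was very happy."]))
# [(1, 'She', 'Julia')]
```

If the second sentence is processed on its own, the referent comes back as `None`, which mirrors the point that it cannot be understood in isolation.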

Natural Language Generation

An NLP system should also be able to generate natural language sentences automatically. A natural language generator reasons about why to speak, decides what to say and how to say it, and then formulates the output. Natural language generation depends on many kinds of knowledge: domain knowledge, linguistic knowledge, strategic rhetorical knowledge, and knowledge of text types.
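The split between deciding what to say and deciding how to say it can be shown with a toy template-based generator. This is a minimal sketch, assuming facts are stored as simple dictionaries; `select_content` and `realize` are hypothetical names, and real generators use much richer linguistic knowledge than one fixed template.

```python
def select_content(facts, topic):
    """Decide what to say: keep only the facts about the requested topic."""
    return [fact for fact in facts if fact["subject"] == topic]

def realize(fact):
    """Decide how to say it: fill a fixed subject-verb-object template."""
    return f"{fact['subject']} {fact['verb']} {fact['object']}."

# Hypothetical domain knowledge for the example.
facts = [
    {"subject": "Julia", "verb": "won", "object": "the race"},
    {"subject": "Tom", "verb": "lost", "object": "the race"},
]

sentences = [realize(fact) for fact in select_content(facts, "Julia")]
print(" ".join(sentences))
# Julia won the race.
```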

Approaches to Natural Language Processing

The main approaches include the symbolic approach, the empirical approach, and the artificial neural network approach. The symbolic approach applies linguistic theory to explicitly encoded linguistic knowledge, most commonly in the form of rules. Corpora and statistical methods are central to the empirical approach. A corpus is a large collection of texts, and preparing it is the tedious part of this approach. The artificial neural network approach is a different way of implementing NLP functions. Hybrid approaches are becoming increasingly popular.
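To make the empirical approach concrete, the sketch below estimates how likely one word is to follow another by counting word pairs in a corpus. It is only an illustration: the three-sentence corpus is made up, `bigram_model` and `probability` are hypothetical names, and a practical model would need a large corpus and smoothing for unseen pairs.

```python
from collections import Counter

def bigram_model(corpus):
    """Count each word that has a successor, and each adjacent word pair."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        unigrams.update(words[:-1])            # words followed by another word
        bigrams.update(zip(words, words[1:]))  # adjacent pairs
    return unigrams, bigrams

def probability(unigrams, bigrams, first, second):
    """Relative-frequency estimate of P(second | first); 0.0 if first is unseen."""
    if unigrams[first] == 0:
        return 0.0
    return bigrams[(first, second)] / unigrams[first]

# Made-up toy corpus for the example.
corpus = ["julia won the race", "julia was very happy", "tom won the prize"]
unigrams, bigrams = bigram_model(corpus)

print(probability(unigrams, bigrams, "won", "the"))   # 1.0
print(probability(unigrams, bigrams, "the", "race"))  # 0.5
```

No rules about English are written down here; everything the model knows comes from the counts, which is why the quality and size of the corpus matter so much in this approach.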
