
Introduction

Natural language processing (NLP) is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.

Though natural language processing tasks are closely intertwined, they are frequently
subdivided into categories for convenience. A coarse division is given below.

Grammar induction[13]
Generate a formal grammar that describes a language's syntax.
Lemmatization
The task of removing inflectional endings only and returning the base dictionary form
of a word, which is also known as a lemma.
Morphological segmentation
Separate words into individual morphemes and identify the class of the morphemes.
The difficulty of this task depends greatly on the complexity of the morphology (i.e. the
structure of words) of the language being considered. English has fairly simple
morphology, especially inflectional morphology, and thus it is often possible to ignore
this task entirely and simply model all possible forms of a word (e.g. "open, opens,
opened, opening") as separate words. In languages such as Turkish or Meitei,[14] a
highly agglutinated Indian language, however, such an approach is not possible, as
each dictionary entry has thousands of possible word forms.
Part-of-speech tagging
Given a sentence, determine the part of speech (POS) for each word. Many words,
especially common ones, can serve as multiple parts of speech. For example, "book"
can be a noun ("the book on the table") or verb ("to book a flight"); "set" can be a
noun, verb or adjective; and "out" can be any of at least five different parts of speech.
Some languages have more such ambiguity than others. Languages with little
inflectional morphology, such as English, are particularly prone to such ambiguity.
Chinese is also prone to such ambiguity because it is a tonal language, and tonal
inflection is not readily conveyed by the characters used in its orthography.
Parsing
Determine the parse tree (grammatical analysis) of a given sentence. The grammar for
natural languages is ambiguous and typical sentences have multiple possible
analyses. In fact, perhaps surprisingly, for a typical sentence there may be thousands
of potential parses (most of which will seem completely nonsensical to a human).
There are two primary types of parsing: dependency parsing and constituency
parsing. Dependency parsing focuses on the relationships between the words in a
sentence (marking things like primary objects and predicates), whereas constituency
parsing focuses on building out the parse tree using a probabilistic context-free
grammar (PCFG). See also: Stochastic grammar.
Sentence breaking (also known as sentence boundary disambiguation)
Given a chunk of text, find the sentence boundaries. Sentence boundaries are often
marked by periods or other punctuation marks, but these same characters can serve
other purposes (e.g. marking abbreviations).
Stemming
The process of reducing inflected (or sometimes derived) words to their root form
(e.g. "close" is the root of "closed", "closing", "close", "closer", etc.). A short sketch
contrasting these pre-processing tasks follows this list.

Since NLP deals with the processing of languages, the motto of our project is to detect the
programming language to which a given piece of code belongs.
The code may be in any programming language, and it is identified using a classifier trained
on the stack_overflow dataset.
Objectives

The main objective is to recognize the programming language to which a given input code
snippet belongs.
The goal of natural language processing (NLP) is to design and build computer systems that
are able to analyze natural languages like German or English, and that generate their outputs
in a natural language, too. Typical applications of NLP are information retrieval, language
understanding, and text classification. The development of statistical approaches for these
applications is one of the research activities at Lehrstuhl für Informatik 6.
Information retrieval (IR) deals with the representation, storage, organization of, and access
to information items. Given a query, the goal is to extract a subset of documents from a large
data collection that satisfies a user's information need. Besides written texts, the database
may also contain multimedia documents, e.g. audio and video data.
In natural language understanding, the objective is to extract the meaning of an input
sentence or an input text. Usually, the meaning is represented in a suitable formal
representation language so that it can be processed by a computer.
The goal in text classification is to assign a text document to one out of several text classes.
For newspaper articles, such classes are sports reports, finances, and politics.
Information Retrieval
Natural Language Understanding
Spoken Dialogue Systems
Text Classification and Clustering
Methods and Concepts

The methods and concepts used in this project are briefly explained below.
The pandas and Keras libraries, together with the Keras preprocessing utilities, are used.
train: used to fit the model on the given dataset; samples with common features are
grouped together.
NumPy is the fundamental package for scientific computing with Python.
The project starts by importing the required modules from the Anaconda distribution, as
mentioned below:
import pandas as pd
This command imports the pandas package; the other packages are imported in the same
way.
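A sketch of the imports referred to above is shown below. Apart from pandas, the exact set of packages is an assumption based on the libraries mentioned elsewhere in this report (NumPy, Keras, and Tkinter); scikit-learn is assumed for the label encoder used later.

# Imports assumed from the libraries mentioned in this report; the exact list
# used in the original project may differ.
import pandas as pd                                         # data loading and CSV handling
import numpy as np                                          # numerical arrays

from tensorflow.keras.preprocessing.text import Tokenizer   # turn posts into feature matrices
from tensorflow.keras.models import Sequential              # simple feed-forward classifier
from tensorflow.keras.layers import Dense, Dropout

from sklearn.preprocessing import LabelEncoder              # encode the language tags
from sklearn.model_selection import train_test_split        # split into train and test sets

from tkinter import Tk, Label, Entry, Button                # GUI for entering a code snippet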
Next, we have to download the stack_overflow dataset from the provided URL.
We have to specify the path of the downloaded dataset and save it in CSV form:
data.to_csv("stack-overflow-data.csv")
With this, the dataset is stored in CSV format. We can look at the head of the data with
df.head(), which displays the first rows in the post and tag format.
df["post"]: the dataset contains posts and tags, and this expression returns the posts found
in the dataset.
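A minimal sketch of this loading and inspection step is given below; the file name follows the report, while the column names "post" and "tags" are assumptions based on the description above.

import pandas as pd

# Read the Stack Overflow dataset that was downloaded from the provided URL
# and saved as CSV. The path is assumed to be the current working directory.
df = pd.read_csv("stack-overflow-data.csv")

# Inspect the first rows: each row pairs a question body (post) with its language tag.
print(df.head())

# Access the posts and the tags separately (column names are assumptions).
posts = df["post"]
tags = df["tags"]
print(posts.head())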
Tkinter: this is Python's de-facto standard GUI package. It is a thin object-oriented layer
on top of Tcl/Tk.
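The predict() function below assumes that a main window (top), an entry widget (E1), a trained model, a fitted tokenizer (tokenize), and a label encoder (encoder) already exist. The following is a minimal sketch of that setup; the model file name and window layout are assumptions, not values taken from the report.

from tkinter import Tk, Label, Entry, Button
from tensorflow.keras.models import load_model

# Minimal GUI setup assumed by predict() below. The file name "language_model.h5"
# is an assumption; "tokenize" and "encoder" are the fitted Tokenizer and
# LabelEncoder (see the training sketch later in this section).
model = load_model("language_model.h5")

top = Tk()
top.title("Programming Language Detector")

Label(top, text="Paste a code snippet:").pack()

E1 = Entry(top, width=60)   # the text field that predict() reads via E1.get()
E1.pack(pady=5)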
def predict():
    print("Prediction in progress...")
    # Read the code snippet typed into the entry widget.
    entered_input = E1.get()
    print("Entered input:", entered_input)

    # Convert the snippet into the same bag-of-words matrix used during training.
    sample = tokenize.texts_to_matrix([entered_input])

    # predict_classes (older Keras API) returns the index of the most probable class.
    k = model.predict_classes(sample)
    print(k)

    # Map the class index back to the language name learned by the label encoder.
    label = encoder.classes_[k[0]]
    L2 = Label(top, text="Prediction: " + label)
    L2.pack()

B = Button(top, text="Predict", command=predict)
B.pack(pady=10)

top.mainloop()
The above predict() function gives the required output, i.e. the type of the given code.
Tokenization: it is used to split paragraphs into sentences and sentences into words;
finally, the words are converted to numeric tokens.
sample = tokenize.texts_to_matrix([entered_input]): this builds a matrix representation
from the converted words.
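The tokenize, encoder, and model objects used by predict() must be fitted on the dataset first. Below is a minimal training sketch using Keras and scikit-learn; the vocabulary size, layer sizes, and training parameters are assumptions for illustration, not values taken from the report.

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder

max_words = 1000  # vocabulary size; an assumption for illustration

# Fit the tokenizer on the post texts and turn each post into a fixed-size vector.
tokenize = Tokenizer(num_words=max_words)
tokenize.fit_on_texts(df["post"])
x_train = tokenize.texts_to_matrix(df["post"])

# Encode the language tags as integers, then as one-hot vectors.
encoder = LabelEncoder()
y_train = to_categorical(encoder.fit_transform(df["tags"]))

# A small feed-forward classifier; the architecture is an assumption.
model = Sequential([
    Dense(512, activation="relu", input_shape=(max_words,)),
    Dropout(0.5),
    Dense(y_train.shape[1], activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=32, epochs=2, validation_split=0.1)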
RESULTS

Given an input, the model processes it and predicts the programming language that the
input belongs to.

The predicted output is displayed in the window, as shown above.


CONCLUSION

Natural Language Processing (NLP) is useful for human-machine interaction.
As a machine can only understand binary language, NLP converts the given input into
binary form and sends it to the system; according to the input, the system generates the
output.
The output is then converted by NLP into a format the user can understand. This process
is very useful and commonly implemented. The conclusion of the project is to
determine/predict the exact programming language that a given input code belongs to.
FUTURE PERSPECTIVE

1. Practical implications – Limited literature is available on the classification of
maintenance optimization models and their associated case studies. The paper classifies
this literature by optimization technique and, based on emerging trends, outlines
directions for future research in the area of maintenance optimization.
2. Information extraction
3. Industry monitoring
4. Educational perspective
