swetharam/implementation-of-Ngrams

Homework 1: Compute the bigram probabilities of two sentences given on the command line
To run the code, enter the following command on the command line:

python homework1.py Corpus.txt "Sentence 1" "Sentence 2"
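
The command takes three arguments: the corpus file and the two sentences. As a minimal sketch of how such a command line could be read (this is not the code in homework1.py; the function name `parse_args` is hypothetical and the snippet assumes Python 3):

```python
import sys

USAGE = 'usage: python homework1.py Corpus.txt "Sentence 1" "Sentence 2"'


def parse_args(argv):
    """Return (corpus_path, sentence1, sentence2) from an argv-style list.

    Hypothetical helper for illustration; argv[0] is the script name.
    """
    if len(argv) != 4:
        # Unquoted sentences get split on spaces by the shell, which
        # produces more than three arguments and lands here.
        raise SystemExit(USAGE)
    _, corpus_path, sentence1, sentence2 = argv
    return corpus_path, sentence1, sentence2


# Typical call inside a script: parse_args(sys.argv)
```

Because the shell splits unquoted text on spaces, the argument count check also catches sentences that were passed without double quotes.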

NOTE:
Put homework1.py and Corpus.txt in the same folder, or give the relative path to each.
As per the rules, each sentence must be enclosed in double quotes, and the two sentences must be separated by a space.
I have used add-one smoothing as the smoothing technique.
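
For reference, the standard add-one (Laplace) estimate of a bigram probability is P(w | prev) = (C(prev w) + 1) / (C(prev) + V), where C is a count taken from the corpus and V is the vocabulary size; without smoothing it is C(prev w) / C(prev). The sketch below illustrates both estimates. It is not the code in homework1.py: the function names are hypothetical, whitespace tokenization is an assumption, and it is written for Python 3.

```python
from collections import Counter


def train_bigrams(tokens):
    """Count unigrams and adjacent word pairs in a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams, smooth=False):
    """Estimate P(word | prev), optionally with add-one smoothing."""
    vocab_size = len(unigrams)  # V: number of distinct words in the corpus
    if smooth:
        # Every bigram count is raised by 1; the denominator grows by V
        # so that the probabilities for a given prev still sum to 1.
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    if unigrams[prev] == 0:
        return 0.0  # prev never seen: no unsmoothed estimate available
    return bigrams[(prev, word)] / unigrams[prev]


# Example with a toy corpus (assumption: tokens are split on whitespace).
tokens = "the cat sat on the mat".split()
unigrams, bigrams = train_bigrams(tokens)
p_seen = bigram_prob("the", "cat", unigrams, bigrams)                  # 1/2
p_unseen = bigram_prob("the", "dog", unigrams, bigrams, smooth=True)   # 1/7
```

With smoothing, a bigram that never occurs in the corpus still receives a small non-zero probability, which is why the all-zero rows in the sample output below become rows of equal small values after smoothing.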


SAMPLE OUTPUT:
For sentence 1: Before Smoothing
The following values are for bigrams of the first sentence:


[[ 0  6  7  1  0  0  0  1  0]
 [ 0  0  0  0  0  0  0  0  0]
 [ 0  0  0  5  0  0  0  5  0]
 [23  0  0  0 28  1  0  0 43]
 [ 2  0  0  0  0  1  1  0  0]
 [ 0  0  0  0  0  0  0  0  0]
 [17  0  0 85  0  0  0 85  2]
 [23  0  0  0 28  1  0  0 43]
 [ 0  0  0  0  0  0  0  0  0]]
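
The table above has one row and one column per word of the sentence. A table of that shape could be built as in the following sketch. It assumes that entry (i, j) is the number of times word i of the sentence is immediately followed by word j in the corpus; that reading is inferred from the sample output rather than taken from homework1.py, and the function name `bigram_count_table` is hypothetical.

```python
from collections import Counter


def bigram_count_table(sentence_tokens, corpus_tokens):
    """Return a square table of corpus bigram counts for the sentence words.

    Assumed layout: table[i][j] is how often sentence_tokens[i] is
    directly followed by sentence_tokens[j] in the corpus.
    """
    counts = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    return [
        [counts[(row_word, col_word)] for col_word in sentence_tokens]
        for row_word in sentence_tokens
    ]


# Toy example; the real corpus would be read from Corpus.txt.
corpus = "a b a b c".split()
table = bigram_count_table(["a", "b", "c"], corpus)
```

Under this layout a word that appears twice in the sentence produces two identical rows and two identical columns, as rows 4 and 8 do in the sample output.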




The following values are the bigram probabilities of the first sentence:


[[ 0.          0.01327434  0.01548673  0.00221239  0.          0.          0.
   0.00221239  0.        ]
 [ 0.          0.          0.          0.          0.          0.          0.
   0.          0.        ]
 [ 0.          0.          0.          0.09259259  0.          0.          0.
   0.09259259  0.        ]
 [ 0.03050398  0.          0.          0.          0.03713528  0.00132626
   0.          0.          0.05702918]
 [ 0.03921569  0.          0.          0.          0.          0.01960784
   0.01960784  0.          0.        ]
 [ 0.          0.          0.          0.          0.          0.          0.
   0.          0.        ]
 [ 0.04545455  0.          0.          0.22727273  0.          0.          0.
   0.22727273  0.00534759]
 [ 0.03050398  0.          0.          0.          0.03713528  0.00132626
   0.          0.          0.05702918]
 [ 0.          0.          0.          0.          0.          0.          0.
   0.          0.        ]]
Total Probability before smoothing is 0.446911955519
For sentence 1: After Smoothing
The following values are for bigrams of the first sentence after performing smoothing on them:


[[ 1  7  8  2  1  1  1  2  1]
 [ 1  1  1  1  1  1  1  1  1]
 [ 1  1  1  6  1  1  1  6  1]
 [24  1  1  1 29  2  1  1 44]
 [ 3  1  1  1  1  2  2  1  1]
 [ 1  1  1  1  1  1  1  1  1]
 [18  1  1 86  1  1  1 86  3]
 [24  1  1  1 29  2  1  1 44]
 [ 1  1  1  1  1  1  1  1  1]]


The probabilities after performing smoothing on the data on sentence 1:


[[ 0.00021725  0.00152075  0.001738    0.0004345   0.00021725  0.00021725
   0.00021725  0.0004345   0.00021725]
 [ 0.00024044  0.00024044  0.00024044  0.00024044  0.00024044  0.00024044
   0.00024044  0.00024044  0.00024044]
 [ 0.00023781  0.00023781  0.00023781  0.00142687  0.00023781  0.00023781
   0.00023781  0.00142687  0.00023781]
 [ 0.00489297  0.00020387  0.00020387  0.00020387  0.00591233  0.00040775
   0.00020387  0.00020387  0.00897044]
 [ 0.00071395  0.00023798  0.00023798  0.00023798  0.00023798  0.00047596
   0.00047596  0.00023798  0.00023798]
 [ 0.00023912  0.00023912  0.00023912  0.00023912  0.00023912  0.00023912
   0.00023912  0.00023912  0.00023912]
 [ 0.0039779   0.00022099  0.00022099  0.01900552  0.00022099  0.00022099
   0.00022099  0.01900552  0.00066298]
 [ 0.00489297  0.00020387  0.00020387  0.00020387  0.00591233  0.00040775
   0.00020387  0.00020387  0.00897044]
 [ 0.00023646  0.00023646  0.00023646  0.00023646  0.00023646  0.00023646
   0.00023646  0.00023646  0.00023646]]


Total Probability after smoothing is 0.0377914439311


For sentence 2: Before Smoothing
The following values are for bigrams of the second sentence before smoothing:


[[ 0  6  7  1  0  0  0  1  0]
 [ 0  0  0  0  0  0  0  0  0]
 [ 0  0  0  5  0  0  0  5  0]
 [23  0  0  0 28  1  0  0 43]
 [ 2  0  0  0  0  1  1  0  0]
 [ 0  0  0  0  0  0  0  0  0]
 [17  0  0 85  0  0  0 85  2]
 [23  0  0  0 28  1  0  0 43]
 [ 0  0  0  0  0  0  0  0  0]]




The following values are the bigram probabilities of the second sentence:


[[ 0.          0.01327434  0.01548673  0.00221239  0.          0.          0.
   0.00221239  0.        ]
 [ 0.          0.          0.          0.          0.          0.          0.
   0.          0.        ]
 [ 0.          0.          0.          0.0066313   0.          0.          0.
   0.0066313   0.        ]
 [ 0.47916667  0.          0.          0.          0.58333333  0.02083333
   0.          0.          0.89583333]
 [ 0.2         0.          0.          0.          0.          0.1         0.1
   0.          0.        ]
 [ 0.          0.          0.          0.          0.          0.          0.
   0.          0.        ]
 [ 0.27419355  0.          0.          1.37096774  0.          0.          0.
   1.37096774  0.03225806]
 [ 0.0777027   0.          0.          0.          0.09459459  0.00337838
   0.          0.          0.14527027]
 [ 0.          0.          0.          0.          0.          0.          0.
   0.          0.        ]]
Total Probability before smoothing is 2.21947698156


For sentence 2: After Smoothing
The following values are for bigrams of the second sentence after performing smoothing on them:


[[ 1  7  8  2  1  1  1  2  1]
 [ 1  1  1  1  1  1  1  1  1]
 [ 1  1  1  6  1  1  1  6  1]
 [24  1  1  1 29  2  1  1 44]
 [ 3  1  1  1  1  2  2  1  1]
 [ 1  1  1  1  1  1  1  1  1]
 [18  1  1 86  1  1  1 86  3]
 [24  1  1  1 29  2  1  1 44]
 [ 1  1  1  1  1  1  1  1  1]]


The probabilities after performing smoothing on the sentence 2:


[[ 0.00021725  0.00152075  0.001738    0.0004345   0.00021725  0.00021725
   0.00021725  0.0004345   0.00021725]
 [ 0.00023866  0.00023866  0.00023866  0.00023866  0.00023866  0.00023866
   0.00023866  0.00023866  0.00023866]
 [ 0.00020387  0.00020387  0.00020387  0.00122324  0.00020387  0.00020387
   0.00020387  0.00122324  0.00020387]
 [ 0.00571565  0.00023815  0.00023815  0.00023815  0.00690641  0.0004763
   0.00023815  0.00023815  0.01047869]
 [ 0.00072098  0.00024033  0.00024033  0.00024033  0.00024033  0.00048065
   0.00048065  0.00024033  0.00024033]
 [ 0.00022099  0.00022099  0.00022099  0.00022099  0.00022099  0.00022099
   0.00022099  0.00022099  0.00022099]
 [ 0.00427249  0.00023736  0.00023736  0.02041301  0.00023736  0.00023736
   0.00023736  0.02041301  0.00071208]
 [ 0.0053969   0.00022487  0.00022487  0.00022487  0.00652125  0.00044974
   0.00022487  0.00022487  0.00989431]
 [ 0.00024027  0.00024027  0.00024027  0.00024027  0.00024027  0.00024027
   0.00024027  0.00024027  0.00024027]]
Total Probability after smoothing is 0.0408980249942




