env.yml and stopwords
Hobson Lane committed Jul 17, 2021
1 parent 43da358 commit c3b9f12
Showing 10 changed files with 1,529 additions and 32 deletions.
2 changes: 1 addition & 1 deletion conda/environment.yml
@@ -15,7 +15,7 @@ dependencies:
   - pip>=18.0
   - plac==0.9.6
   - plotly==3.3.0
-  - pyscaffold==3.1
+  - pyscaffold>=4.0
   - pytest-cov
   - python-levenshtein==0.12.0
   - redis-py==2.10.6
28 changes: 28 additions & 0 deletions src/nlpia/book/examples/ch06_glove_nessvectors.py
@@ -0,0 +1,28 @@
""" NLPIA chapter 6 glove nessvectors
Dependencies:
  * python==3.6.12
  * scann==
References:
  * Stanford NLP's GloVe model and training script: https://github.com/stanfordnlp/glove
  * Erik Bernhardsson's ANN benchmarks with training and test sets: https://github.com/erikbern/ann-benchmarks
  * Spotify's Annoy (with good visualization): https://github.com/spotify/annoy
  * Google Research's ScaNN: pip install scann
"""


import numpy as np


def load_glove(filepath):
    """Load GloVe vectors from a text file into a dict of {word: np.ndarray}."""
    wv = {}
    # Each line holds a token followed by its space-separated vector components.
    with open(filepath, 'r', encoding='utf-8') as f:
        for line in f:
            split_line = line.split()
            word = split_line[0]
            embedding = np.array([float(value) for value in split_line[1:]])
            wv[word] = embedding
    return wv
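
A minimal usage sketch for `load_glove`, assuming the GloVe text format of one token per line followed by space-separated floats. The three-dimensional vectors and the temporary file are invented for illustration, and `cosine_similarity` is a hypothetical helper that is not part of this commit.

```python
import os
import tempfile

import numpy as np


def load_glove(filepath):
    """Load GloVe text-format vectors into a dict of {word: np.ndarray}."""
    wv = {}
    with open(filepath, 'r', encoding='utf-8') as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            wv[parts[0]] = np.array([float(v) for v in parts[1:]])
    return wv


def cosine_similarity(a, b):
    """Hypothetical helper: cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy vectors, invented for illustration only.
toy = "king 1.0 0.0 1.0\nqueen 1.0 0.1 0.9\napple 0.0 1.0 0.0\n"
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write(toy)
    toy_path = tmp.name

wv = load_glove(toy_path)
os.remove(toy_path)

sim_royal = cosine_similarity(wv['king'], wv['queen'])
sim_fruit = cosine_similarity(wv['king'], wv['apple'])
print(sorted(wv), sim_royal > sim_fruit)
```

For the real `glove.6B` files, a plain dict of arrays like this may use a lot of memory, so treat the sketch as a way to check the file format rather than a tuned loader.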
5 changes: 4 additions & 1 deletion src/nlpia/book/examples/ch07.py
@@ -28,6 +28,10 @@
         'https://www.dropbox.com/s/965dir4dje0hfi4/GoogleNews-vectors-negative300.bin.gz?dl=1',
         1647046227,
     ),
+    'glove': (
+        'http://nlp.stanford.edu/data/glove.6B.zip',
+        862182613,
+    ),
     'slang': (
         'https://www.dropbox.com/s/43c22018fbfzypd/slang.csv.gz?dl=1',
         117633024,
@@ -331,4 +335,3 @@ def pad_trunc(data, maxlen):


 model.predict_classes(test_vec)
-
4 changes: 2 additions & 2 deletions src/nlpia/book/examples/ch09.py
@@ -625,8 +625,8 @@ def onehot_encode(dataset, char_indices, maxlen):


 print('Vectorization...')
-X = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.bool)
-y = np.zeros((len(sentences), len(chars)), dtype=np.bool)
+X = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
+y = np.zeros((len(sentences), len(chars)), dtype=bool)
 for i, sentence in enumerate(sentences):
     for t, char in enumerate(sentence):
         X[i, t, char_indices[char]] = 1
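
The `np.bool` alias was deprecated in NumPy 1.20 and removed in 1.24, which is presumably why this hunk switches to the builtin `bool`. Below is a minimal sketch of the same vectorization on toy data. The values of `sentences`, `chars`, and `maxlen` are invented stand-ins for the chapter's variables, `next_chars` is a hypothetical name, and the line that fills `y` is an assumption because the hunk does not show it.

```python
import numpy as np

# Toy stand-ins for the chapter's data (assumptions, not from the commit).
sentences = ['abc', 'bca']
next_chars = ['a', 'b']
chars = sorted(set(''.join(sentences)))
char_indices = {c: i for i, c in enumerate(chars)}
maxlen = 3

# The builtin bool works as a dtype on both old and new NumPy releases.
X = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char_indices[char]] = 1
    # Assumed: the target is the character that follows each sentence.
    y[i, char_indices[next_chars[i]]] = 1

print(X.dtype, X.shape, int(X.sum()), int(y.sum()))
```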
4 changes: 2 additions & 2 deletions src/nlpia/book_parser.py
@@ -64,13 +64,13 @@ def get_lines(file_path=BOOK_PATH):
     >>> lines = get_lines(BOOK_PATH)
     >>> next(lines)
-    ('.../src/nlpia/data/book/Appendix F -- Glossary.asc',
+    ('.../src/nlpia/data/book/Appendix F -- Glossary.adoc',
      ['= Glossary\n',
       '\n',
       "We've collected some ...])
     """
     if os.path.isdir(file_path):
-        file_path = os.path.join(file_path, '*.asc')
+        file_path = os.path.join(file_path, '*.adoc')
         files = glob.glob(file_path)
     elif os.path.isfile(file_path):
         files = [file_path]
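
A minimal sketch of the effect of the extension change, assuming the book manuscripts were renamed from `.asc` to `.adoc`. `list_book_files` is a hypothetical helper that mirrors only the directory and file branches visible in the hunk; it is not the project's `get_lines`. The file names are invented for the demonstration.

```python
import glob
import os
import tempfile


def list_book_files(file_path, ext='.adoc'):
    """Hypothetical helper mirroring the directory/file branches in the hunk."""
    if os.path.isdir(file_path):
        return sorted(glob.glob(os.path.join(file_path, '*' + ext)))
    if os.path.isfile(file_path):
        return [file_path]
    return []


with tempfile.TemporaryDirectory() as book_dir:
    # One file per extension, to show which ones the new pattern picks up.
    for name in ('Appendix F -- Glossary.adoc', 'Chapter 01.asc', 'notes.txt'):
        with open(os.path.join(book_dir, name), 'w') as f:
            f.write('= Title\n')
    found = [os.path.basename(p) for p in list_book_files(book_dir)]

print(found)
```

Files that still carry the old `.asc` extension are silently skipped by the new pattern, so any remaining `.asc` manuscripts would need renaming.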
5 changes: 2 additions & 3 deletions src/nlpia/data/inspirational_quotes.csv
@@ -78,9 +78,8 @@
 "The entire exhibitions industry in the United States of America has filed for bankruptcy.","Jeffrey Katzenberg",0.9,1,1,1
 "And as soon as I did the research, I realized the law seemed to be on my side and I filed the suit.","Michael Newdow",0.9,1,1,1
 "No charges have been filed by the L.A. district attorney's office, and for that I am appreciative. I have said it before, but we all make mistakes, and the day will come soon enough where you no longer read of mine in the tabloids.","Scott Stapp",0.9,1,1,1
-"With me serving as the president, we filed a $3-million lawsuit against the league and its member clubs in an attempt to win increased pension benefits and a larger share Of television revenue.","Ted Lindsay",0.9,1,1,1
+"With me serving as the president, we filed a $3-million lawsuit against the league and its member clubs in an attempt to win increased pension benefits and a larger share of television revenue.","Ted Lindsay",0.9,1,1,1
 "I filed the ethics complaint against Tom DeLay not because I'm a Democrat and he's a Republican or even because he drew me out of my congressional seat but because he engaged in corruption to further his plans to disenfranchise voters in Texas.","Chris Bell",0.9,1,1,1
 "It's not clutter, it's my unique filing system.","Jonathan Lockwood Huie",0.9,1,1,1
 "Labels are for filing. Labels are for clothing. Labels are not for people.","Martina Navratilova",0.9,1,1,1
-"Is not the whole world a vast house of assignation of which the filing system has been lost?","Quentin Crisp
-",0.9,1,1,1
+"Is not the whole world a vast house of assignation of which the filing system has been lost?","Quentin Crisp",0.9,1,1,1
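
The last record in this hunk had a line break inside the quoted author field. A minimal sketch, using only the standard `csv` module, of how the old and new versions of that record parse. The variable names are illustrative, and the remaining columns are copied as-is without assuming their meaning.

```python
import csv
import io

quote = ("Is not the whole world a vast house of assignation "
         "of which the filing system has been lost?")
# Old record: the closing quote of the author field sits on the next line.
before = '"%s","Quentin Crisp\n",0.9,1,1,1\n' % quote
# New record: the author field is closed on the same line.
after = '"%s","Quentin Crisp",0.9,1,1,1\n' % quote

# newline='' keeps embedded line breaks intact for the csv reader.
rows_before = list(csv.reader(io.StringIO(before, newline='')))
rows_after = list(csv.reader(io.StringIO(after, newline='')))

print(repr(rows_before[0][1]))
print(repr(rows_after[0][1]))
```

Both versions parse as a single six-field record, but the old one leaves a trailing newline in the author name, which would break exact-match lookups or grouping by author.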