queries

Code for running queries against word2vec embeddings.

#to perform queries on a certain percentage of the data and retrieve the top n sentences (topSentences) by cosine similarity with the query sentence:

1. cd old_home/ana/destress/word2vec
2. Run /code/BIDMach/bidmach run_privacy.ssc
3. Run a query by calling query("query sentence",topSentences,"filter",minWord,maxWord). minWord and maxWord default to 5 and 40.
4. The system asks for the fraction of the data to query. Enter a number between 0 and 1.0, or press Enter to query 100% of the data.
5. The query information is printed to the console in this order: query sentence, filter, topSentences, minWords, maxWords, number of files, and system time. The same fields, tab-delimited and in the same order, are appended to ./foodtype/queryhistory_compare.txt. To change the file location, change fw0.
6. After the search finishes, the top results are printed to the console in this order: cosine similarity score, the retrieved sentence, and the URL of the homepage of the blogger who wrote it.
7. The results are saved to ./foodtype/queryresult_compare.txt (overwriting the previous results). The saved fields are the similarity score, the retrieved sentence, the sentence in its surrounding context (one sentence before and after, where available), and the URL of the blogger's homepage. The first row is a header.
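
The retrieval idea behind the steps above (score every candidate sentence by cosine similarity with the query, keep the top n) can be sketched in a few lines of Python. This is only an illustration, not the code in run_privacy.ssc: the function names, the toy embeddings, and the choice to build a sentence vector by summing word vectors and scaling to unit length are all assumptions. The "filter" argument is left out because this README does not define it.

```python
import math

def sentence_vector(sentence, embeddings, dim):
    """Sum the vectors of known words and scale the result to unit length.

    Assumption: the real scripts may build sentence vectors differently.
    Returns None when no word of the sentence has an embedding.
    """
    vec = [0.0] * dim
    for word in sentence.lower().split():
        if word in embeddings:
            for i, x in enumerate(embeddings[word]):
                vec[i] += x
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return None
    return [x / norm for x in vec]

def top_sentences(query, sentences, embeddings, dim, top_n,
                  min_words=5, max_words=40):
    """Return up to top_n (score, sentence) pairs, highest cosine similarity first."""
    q = sentence_vector(query, embeddings, dim)
    if q is None:
        return []
    scored = []
    for s in sentences:
        n_words = len(s.split())
        # Skip sentences outside the allowed word-count range.
        if n_words < min_words or n_words > max_words:
            continue
        v = sentence_vector(s, embeddings, dim)
        if v is None:
            continue
        # Both vectors have unit length, so the dot product is the cosine similarity.
        score = sum(a * b for a, b in zip(q, v))
        scored.append((score, s))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]

# Toy two-dimensional embeddings, made up for this example.
toy_embeddings = {"pizza": [1.0, 0.0], "pasta": [0.9, 0.1], "rain": [0.0, 1.0]}
toy_sentences = ["pasta pasta", "rain rain", "pizza"]

for score, sentence in top_sentences("pizza", toy_sentences, toy_embeddings,
                                     dim=2, top_n=2, min_words=1):
    print(f"{score:.4f}\t{sentence}")
```

Sorting the full score list is fine for a sketch; for a large corpus a bounded heap (heapq.nlargest) would avoid holding every score in sorted order.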

#to perform kmeans clustering:

1. cd old_home/ana/destress/word2vec
2. Run /code/BIDMach/bidmach run_kmeans.ssc
3. Run a query by calling query("query sentence",topSentences,"filter",minWord,maxWord). minWord and maxWord default to 5 and 40. WARNING: setting topSentences above 1000 causes a Java heap out-of-memory error. A fix is planned.
4. The system asks for the fraction of the data to query. Enter a number between 0 and 1.0, or press Enter to query 100% of the data.
5. The query information is printed to the console in this order: query sentence, filter, topSentences, minWords, maxWords, number of files, and system time. The same fields, tab-delimited and in the same order, are appended to ./foodtype/queryhistory_kmeans.txt. To change the file location, change fw0.
6. After the search finishes, the top results are printed to the console in this order: cosine similarity score, the retrieved sentence, and the URL of the homepage of the blogger who wrote it.
7. The results are saved to ./foodtype/queryresult_kmeans.txt (overwriting the previous results). The saved fields are the similarity score, the retrieved sentence, and the URL of the blogger's homepage. The first row is a header.
8. The sentence vectors (currently normalized by length) are saved to ./foodtype/vecs_full_top.fmat.lz4
9. Quit Scala with :q. (Two versions of BIDMach are needed for now, because /code/BIDMach/bidmach supports distributed file reading while ../../BIDMach/bidmach supports HDF5 file saving.)
10. Run ../../BIDMach/bidmach query_kmeans_v2.scala for k-means clustering. The cluster labels for the sentences retrieved in steps 1-9 are saved to ./foodtype/labelsInCluster.mat, the centroids to ./foodtype/clusterVec.mat, and the sentence vectors to ./foodtype/sentVec.mat.
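
For readers unfamiliar with the clustering step, here is a minimal sketch of plain k-means (Lloyd's algorithm) in Python. It is not the code in query_kmeans_v2.scala, which runs on BIDMach; the function names, the seeded random initialization, the fixed iteration count, and the use of squared Euclidean distance are assumptions made for illustration. The returned labels and centroids play the same roles as the contents of labelsInCluster.mat and clusterVec.mat.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(vectors, centroids):
    """Label each vector with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))
        for v in vectors
    ]

def kmeans(vectors, k, iterations=20, seed=0):
    """Return (labels, centroids) after a fixed number of Lloyd iterations."""
    rng = random.Random(seed)
    # Assumption: start from k distinct input vectors chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        labels = assign(vectors, centroids)
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so the labels match the returned centroids.
    return assign(vectors, centroids), centroids

# Toy sentence vectors, made up for this example.
toy_vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centroids = kmeans(toy_vectors, k=2)
print(labels)
print(centroids)
```

Because the sentence vectors in this project are normalized, Euclidean k-means on them behaves much like clustering by cosine similarity, which is why the sketch does not add a separate cosine variant.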

