Commit

Lab 3 - fix docstring for 2c, fix variable name for 3c
felixcheung committed Jun 20, 2015
1 parent a0a9cc5 commit 457ede8
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions lab3_text_analysis_and_entity_resolution_student.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"version 1.0.2\n",
+"version 1.0.3\n",
"#![Spark Logo](http://spark-mooc.github.io/web-assets/images/ta_Spark-logo-small.png) + ![Python Logo](http://spark-mooc.github.io/web-assets/images/python-logo-master-v3-TM-flattened_small.png)\n",
"# **Text Analysis and Entity Resolution**\n",
"####Entity resolution is a common, yet difficult problem in data cleaning and integration. This lab will demonstrate how we can use Apache Spark to apply powerful and scalable text analysis techniques and perform entity resolution across two datasets of commercial products."
@@ -523,7 +523,7 @@
" Args:\n",
" corpus (RDD): input corpus\n",
" Returns:\n",
-" RDD: a RDD of (record ID, IDF value)\n",
+" RDD: a RDD of (token, IDF value)\n",
" \"\"\"\n",
" N = <FILL IN>\n",
" uniqueTokens = corpus.<FILL IN>\n",
@@ -837,7 +837,7 @@
" amazonID = <FILL IN>\n",
" googleValue = <FILL IN>\n",
" amazonValue = <FILL IN>\n",
-" cs = cosineSimilarity(<FILL IN>, idfs_small_weights)\n",
+" cs = cosineSimilarity(<FILL IN>, idfsSmallWeights)\n",
" return (googleURL, amazonID, cs)\n",
"\n",
"similarities = (crossSmall\n",
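For context on why the corrected docstring says (token, IDF value), here is a minimal plain-Python sketch, not the lab's Spark solution. It assumes IDF is defined as N / n(t) (total documents divided by the number of documents containing token t), which the diff itself does not state. The function names, the sample corpus, and the use of dictionaries instead of RDDs are all hypothetical stand-ins for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (record ID, list of tokens) pairs.
sample_corpus = [
    ("a1", ["spark", "python"]),
    ("a2", ["spark"]),
    ("g1", ["spark", "scala"]),
]


def idfs_sketch(corpus):
    """Return {token: IDF}, keyed by token rather than by record ID.

    Assumes IDF(t) = N / n(t); other definitions (e.g. with a log) exist.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for _record_id, tokens in corpus:
        # Count each token at most once per document.
        doc_freq.update(set(tokens))
    return {token: float(n_docs) / count for token, count in doc_freq.items()}


def tfidf_sketch(tokens, idf_weights):
    """Weight each token's relative frequency by its IDF value."""
    counts = Counter(tokens)
    total = float(len(tokens))
    return {t: (c / total) * idf_weights[t] for t, c in counts.items()}


def cosine_similarity_sketch(a, b):
    """Cosine similarity between two sparse {token: weight} vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# The weights dictionary plays the role that idfsSmallWeights plays in the lab.
idf_weights = idfs_sketch(sample_corpus)
vec_a1 = tfidf_sketch(["spark", "python"], idf_weights)
vec_a2 = tfidf_sketch(["spark"], idf_weights)
similarity = cosine_similarity_sketch(vec_a1, vec_a2)
```

Because the resulting mapping is keyed by token, it can be looked up directly when weighting a record's tokens, which is consistent with the docstring change from (record ID, IDF value) to (token, IDF value).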
