- Web Crawler
- Links Extraction
- Page Rank
- TF-IDF
- N-Gram
- Top K
- Inverted Index
- Recommender System
- Sentiment Analysis
- Front page
- Spelling Correction
- Language Identifier
- Auto Completion
- Snippet
- A fully functional search engine that includes a web crawler, spelling correction, an inverted index, the PageRank algorithm, the TF-IDF algorithm, autocompletion, a recommender system, and sentiment analysis
- Web Crawler: Implemented a multithreaded web crawler based on crawler4j
- Page Rank: Extracted out-links from the webpages collected by the web crawler, built an adjacency matrix from the hyperlinks of each page, and calculated PageRank from the link relations between pages
- TF-IDF: Parsed HTML pages, extracted content text and computed TF-IDF
- N-Gram: Generated a language model and built real-time AutoCompletion based on N-Gram statistics
- Recommender System: Built a video rating matrix from the dataset and calculated a video co-occurrence matrix, based on the Item Collaborative Filtering algorithm
- Sentiment Analysis: Extracted emotion features from text and implemented sentiment analysis based on an emotion dictionary
- Implemented a Top K algorithm and an Inverted Index to improve query efficiency
- Implemented Spelling Correction
- UI: Built the front pages with PHP, Bootstrap and jQuery
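
The Page Rank step in the list above (out-links, then an adjacency structure, then a rank per page) can be sketched as a short power iteration. This is only an illustration under stated assumptions, not the project's code: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the three-page toy graph are all hypothetical.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: page -> list of pages it links to.

    The damping factor and iteration count are assumed values for illustration.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(out_links) | {t for ts in out_links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = out_links.get(page, [])
            if targets:
                # Split this page's rank evenly across its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: a -> b, a -> c, b -> c, c -> a
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_graph)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

A dict of out-links is used here instead of a dense matrix because crawled link graphs are usually sparse; the same iteration can be written as a matrix-vector product.
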
Enter a query.
The results show up with docID, title, URL, description and snippets.
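
How a query could be turned into a ranked list of docIDs can be sketched by combining an inverted index, TF-IDF scoring and Top K selection. This is a minimal sketch under assumptions: the helper names `build_index` and `search`, the whitespace tokenizer, the raw `tf * log(N / df)` weighting, and the three sample documents are hypothetical and may differ from the project's implementation.

```python
import heapq
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(query, index, num_docs, k=3):
    """Score documents with TF-IDF and return the k best (doc_id, score) pairs."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(num_docs / len(postings))
        # Only documents in the postings list are touched, not the whole corpus.
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # A heap-based selection avoids sorting every scored document.
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical sample documents keyed by docID.
docs = {
    1: "california weather and california beaches",
    2: "new york weather",
    3: "search engine basics",
}
index = build_index(docs)
results = search("california weather", index, len(docs), k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

The inverted index limits scoring to documents that contain at least one query term, and `heapq.nlargest` keeps only the best `k` candidates, which is where the query-efficiency gain comes from.
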
AutoCompletion: gives the user query suggestions.
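
Suggestions of this kind can be derived from N-Gram statistics. The sketch below uses bigram counts to propose the next word; it is an illustration only, and the function names `build_bigram_model` and `suggest`, the choice of bigrams, and the sample query log are assumptions rather than the project's actual model.

```python
from collections import Counter, defaultdict


def build_bigram_model(queries):
    """Count, for each word, which words follow it in the given queries."""
    following = defaultdict(Counter)
    for query in queries:
        words = query.lower().split()
        for prev_word, next_word in zip(words, words[1:]):
            following[prev_word][next_word] += 1
    return following


def suggest(prefix, model, k=3):
    """Extend the typed prefix with the k most frequent next words."""
    words = prefix.lower().split()
    if not words:
        return []
    candidates = model.get(words[-1], Counter())
    return [" ".join(words + [nxt]) for nxt, _ in candidates.most_common(k)]


# Hypothetical query log used to collect the statistics.
query_log = [
    "new york weather",
    "new york times",
    "new york weather",
    "new jersey",
]
model = build_bigram_model(query_log)
print(suggest("new", model, k=2))
print(suggest("new york", model, k=2))
```
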
Spelling Correction: When California is mistyped as californa, the search engine asks "Are you looking for California".
Clicking the spelling correction hint redirects to the results for the corrected word.
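
One common way to produce such a hint is to look for the closest vocabulary word by edit distance. The sketch below is an assumption about how this could work, not a description of the project's method: the function names, the maximum distance of 2, and the small vocabulary with its frequency counts are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def correct(word, vocabulary, max_distance=2):
    """Return the closest known word, or None if no hint should be shown."""
    word = word.lower()
    if not vocabulary or word in vocabulary:
        return None
    # Prefer the smallest distance; break ties with the higher frequency.
    best = min(vocabulary, key=lambda v: (edit_distance(word, v), -vocabulary[v]))
    return best if edit_distance(word, best) <= max_distance else None


# Hypothetical vocabulary: word -> frequency in the indexed pages.
vocabulary = {"california": 120, "carolina": 80, "colorado": 60}
suggestion = correct("californa", vocabulary)
if suggestion is not None:
    print(f"Are you looking for {suggestion.capitalize()}")
```

Scanning the whole vocabulary is fine for a small word list; for a large one, candidates are usually narrowed first, for example by generating all strings within one or two edits of the query.
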