Browsewiki

An information retrieval tool for Wikipedia, the free encyclopedia.

Most tools used today to retrieve information from text data are query-based search engines that apply some version of K-Nearest Neighbor search. These tools require the user to have a well-defined information need that can be condensed into a few words. This method of information retrieval is similar to looking up a specific word of interest in the index at the back of a book.
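To make the query-based approach concrete, here is a minimal sketch of a K-Nearest Neighbor search over bag-of-words vectors. It is an illustration only and not code from Browsewiki: the function names (`vectorize`, `cosine`, `knn_search`) and the sample documents are assumptions, and real search engines use weighted terms and indexed lookups rather than a full scan.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words term-count vector (hypothetical helper)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_search(query, documents, k=2):
    """Return the indices of the k documents most similar to the query."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(doc)), i) for i, doc in enumerate(documents)]
    # Highest similarity first; ties are broken by document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


# Made-up sample collection, for illustration only.
documents = [
    "the history of the roman empire",
    "python is a programming language",
    "the python snake lives in asia",
    "greek philosophy and ancient athens",
]
top = knn_search("python programming", documents, k=2)
print(top)
```

The user has to already know the words "python programming" to get anything useful back, which is the limitation described above.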

Browsewiki approaches the task differently. It aims to let users acquire information on a general topic or get an overview of the thematic structure of a text collection, much as the table of contents of a book gives an overview of what the book is about. To do so, Browsewiki uses a browsing method called Scatter/Gather, which is based on text document clustering. To speed up the rather slow clustering process, topic modeling is used to reduce the number of dimensions of the document representations.
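The Scatter/Gather loop can be sketched roughly as follows. This is a toy illustration under stated assumptions, not Browsewiki's implementation: it clusters raw term counts with a simple k-means-style loop and skips the topic modeling step entirely, and all names (`scatter`, `gather`, `summarize`) and the sample documents are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def scatter(doc_ids, vectors, k, iterations=10):
    """Partition doc_ids into at most k clusters with a simple k-means-style loop."""
    # Deterministic farthest-first seeding: start from the first document, then
    # repeatedly add the document least similar to the seeds chosen so far.
    seeds = [doc_ids[0]]
    while len(seeds) < min(k, len(doc_ids)):
        rest = [d for d in doc_ids if d not in seeds]
        seeds.append(min(rest, key=lambda d: max(cosine(vectors[d], vectors[s]) for s in seeds)))
    centroids = [Counter(vectors[s]) for s in seeds]

    clusters = []
    for _ in range(iterations):
        new_clusters = [[] for _ in centroids]
        for d in doc_ids:
            best = max(range(len(centroids)), key=lambda c: cosine(vectors[d], centroids[c]))
            new_clusters[best].append(d)
        if new_clusters == clusters:  # assignments stopped changing
            break
        clusters = new_clusters
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = sum((vectors[d] for d in members), Counter())
    return [members for members in clusters if members]


def summarize(members, vectors, n_terms=3):
    """Describe a cluster by its most frequent terms."""
    total = sum((vectors[d] for d in members), Counter())
    return [term for term, _ in total.most_common(n_terms)]


def gather(clusters, selected):
    """Merge the clusters the user picked into one smaller collection."""
    return sorted(d for i in selected for d in clusters[i])


# Made-up sample collection, for illustration only.
documents = [
    "football match goal team",
    "football team league goal",
    "tennis match racket serve",
    "guitar music chord song",
    "piano music song concert",
    "violin music concert orchestra",
]
vectors = [vectorize(doc) for doc in documents]

clusters = scatter(list(range(len(documents))), vectors, k=2)  # scatter
for members in clusters:
    print(members, summarize(members, vectors))
subset = gather(clusters, selected=[1])                        # user picks a cluster
subclusters = scatter(subset, vectors, k=2)                    # scatter again
```

Each round shows the user a handful of cluster summaries, lets them pick the interesting ones, and re-clusters only the picked documents, so the view gets finer with every step. The similarity computations inside `scatter` are the slow part, which is why shorter document vectors help.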

Installation

Step-by-step instructions on how to run the demo web application.

First, clone the repo and create a Python virtual environment:

git clone https://github.com/theovasi/browsewiki
cd browsewiki
python3.6 -m venv venv

Activate the virtual environment:

. ./venv/bin/activate

Then install the required libraries:

pip install -r requirements.txt

After that, run the setup script, which handles pre-processing the text data, creating the necessary vector representations, clustering, and topic modeling. Browsewiki currently supports the English and Greek versions of Wikipedia.

python3 setup.py english

Finally, run the web app:

python3 app.py

License

This project is licensed under the MIT License. See the LICENSE file for details.

