Metaphor Source Recommendation System For Sinhala Songs / Poetry
- Sinhala Language
- Best Sinhala Hit Songs (2000 - 2015)
- Writers who write songs/poetry in the Sinhala language
- Python
- Flask
- Elasticsearch / Kibana
- HTML/JS/CSS/jQuery
- Download and install Elasticsearch
- Download and install Kibana
- Install the ICU Analysis plugin by running bin\elasticsearch-plugin install analysis-icu
- Run bin\elasticsearch.bat to start Elasticsearch
- Run bin\kibana.bat to start Kibana
- Make sure Kibana is running on http://localhost:5601
- Download this project
- In the project root folder, create a virtual environment:
- Run python -m pip install virtualenv
- Run python -m venv env
- Activate the environment by running env\Scripts\activate
- Install the requirements by running python -m pip install -r requirements.txt
- Verify the installation by running python -m pip list
- Run python initialize.py to initialize the system
- Run python app.py to start the Flask server
- The system is now running
- Go to http://127.0.0.1:5000, the home page of the app
- Type a query to see the results
- Click a link cell or a metaphor cell to explore further
- The query processing result for each search is shown in a blue text box
Ex:- රන් කුරහන් මල ගීතය
Ex:- රුක්මන් ගැයූ ගීත
Ex:- සුරඟන සදහා උපමේය
Ex:- අම්මා සදහා උපමා
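Before sending queries like the ones above, it can help to confirm that the services from the setup steps are reachable. The following is a minimal sketch using only the standard library. The Kibana and Flask addresses come from the steps above; the Elasticsearch address is an assumption (its stock default port, plain HTTP), and the names SERVICES, is_up and report are hypothetical helpers, not part of this project.

```python
import urllib.error
import urllib.request

# Kibana and Flask URLs are taken from the setup steps.
# Assumption: Elasticsearch listens on its default port 9200 over plain HTTP.
SERVICES = {
    "Elasticsearch": "http://localhost:9200",
    "Kibana": "http://localhost:5601",
    "Flask app": "http://127.0.0.1:5000",
}


def is_up(url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with any HTTP response."""
    try:
        with opener(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is running.
        return True
    except OSError:
        # Connection refused, timeout, DNS failure and similar errors.
        return False


def report(services=SERVICES, probe=is_up):
    """Map each service name to True (reachable) or False."""
    return {name: probe(url) for name, url in services.items()}
```

Calling report() returns a dictionary such as {"Elasticsearch": True, ...}; a False entry suggests the matching start step has not completed yet.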
No sorting
Ex:- අම්මා සදහා උපමා
Sort by publish date, descending
Ex:- අම්මා සදහා නවතම උපමා
If the query contains words like 'ජනප්රිය', 'ප්රචලිත', 'ප්රසිද්ධ', 'ජනප්රියම', 'ප්රචලිතම': sort by popularity, descending
Ex:- අම්මා සදහා ප්රසිද්ධ උපමා
Ex:- ඉරාජ් ගැයු ගීත (10 songs will be shown)
Ex:- ඉරාජ් ගැයු ගීත 3 ක් (3 songs will be shown)
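The sorting and result-size rules illustrated above could be implemented roughly as follows. This is only a sketch under stated assumptions, not the project's actual code: the popularity keywords are copied from the list above, while the recency keyword "නවතම", the counter suffix "ක්", the mapping of popularity to the view_count field, and the function name preprocess are all assumptions made for illustration.

```python
import re

# Popularity keywords copied from the list above.
POPULARITY_WORDS = ["ජනප්රිය", "ප්රචලිත", "ප්රසිද්ධ", "ජනප්රියම", "ප්රචලිතම"]
# Assumption: "නවතම" (newest), seen in the example above, triggers date sorting.
RECENCY_WORDS = ["නවතම"]
# Assumption: counter suffix that may follow a number, as in "3 ක්".
COUNT_SUFFIX = "ක්"
DEFAULT_SIZE = 10

SIZE_RE = re.compile(r"([0-9]+)(?:\s*" + re.escape(COUNT_SUFFIX) + r")?")


def preprocess(query):
    """Split a raw query into search text, sort clause and result size."""
    size = DEFAULT_SIZE
    match = SIZE_RE.search(query)
    if match:
        size = int(match.group(1))
        # Cut the number (and its suffix) out of the text to be searched.
        query = query[:match.start()] + " " + query[match.end():]

    sort = None
    kept = []
    for token in query.split():
        if token in POPULARITY_WORDS:
            # Assumption: popularity is measured by the view_count field.
            sort = [{"view_count": {"order": "desc"}}]
        elif token in RECENCY_WORDS:
            sort = [{"published_on": {"order": "desc"}}]
        else:
            kept.append(token)
    return {"text": " ".join(kept), "sort": sort, "size": size}
```

The returned sort value uses the Elasticsearch sort syntax, so it can be placed directly in a search body. One limitation of this sketch is that any number in the query is read as a result count, even if it belongs to a song title.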
Sinhala stop word list: https://github.com/nlpcuom/Sinhala-Stopword-list
The corpus contains 110 songs, which can be found at:
songs/
song-1.json
song-2.json
...
song-110.json
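A loader for this folder layout might look like the sketch below. It assumes each song-N.json file holds a single song object with an id field (as in the index mapping), and the index name "sinhala-songs" and both function names are hypothetical. The action dictionaries follow the format accepted by the bulk helper of the elasticsearch Python package.

```python
import json
from pathlib import Path

INDEX_NAME = "sinhala-songs"  # hypothetical index name


def load_songs(folder="songs"):
    """Read every song-N.json file in the folder, ordered by N."""
    paths = sorted(
        Path(folder).glob("song-*.json"),
        key=lambda p: int(p.stem.split("-")[1]),
    )
    songs = []
    for path in paths:
        with path.open(encoding="utf-8") as f:
            songs.append(json.load(f))
    return songs


def to_bulk_actions(songs, index=INDEX_NAME):
    """Wrap each song in a bulk-indexing action dictionary."""
    return [
        # Fall back to the running position if a song has no id field.
        {"_index": index, "_id": song.get("id", position), "_source": song}
        for position, song in enumerate(songs, start=1)
    ]
```

With a connected client, the resulting list could be passed to elasticsearch.helpers.bulk(client, actions). Sorting by the numeric part keeps song-10.json after song-2.json, which a plain alphabetical sort would not.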
"settings": {
  "index": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "analysis": {
    "analyzer": {
      "sinhala-tokens-ngram": {  # Index time
        "type": "custom",
        "tokenizer": "icu_tokenizer",  # Split into words; the ICU tokenizer works best with Asian languages
        "char_filter": ["punctuation_char_replace_filter"],  # For short texts, replace special characters with a space
        "filter": [
          "edge_n_gram_filter"  # Create edge n-grams for each word
        ]
      },
      "sinhala-tokens-words": {  # Index time
        "type": "custom",
        "tokenizer": "icu_tokenizer",  # Split into words; the ICU tokenizer works best with Asian languages
        "char_filter": ["punctuation_char_remove_filter"]  # For long texts, remove special characters
        # No n-grams here: this analyzer is applied to long texts such as lyrics, where n-grams would be an overhead
      },
      "sinhala-search": {  # Search time
        "type": "custom",
        "tokenizer": "standard",  # Split into words with the standard tokenizer
        "char_filter": ["punctuation_char_replace_filter"],  # Replace special characters with a space
      },
    },
    "char_filter": {
      "punctuation_char_remove_filter": {
        "type": "mapping",
        "mappings": [".=>", "|=>", "-=>", "_=>", "'=>", "/=>", ",=>"]
      },
      "punctuation_char_replace_filter": {
        "type": "mapping",
        "mappings": [".=>\\u0020", "|=>\\u0020", "-=>\\u0020", "_=>\\u0020", "'=>\\u0020", "/=>\\u0020", ",=>\\u0020"]
      }
    },
    "filter": {  # Token filter definitions
      "edge_n_gram_filter": {
        "type": "edge_ngram",
        "min_gram": "2",
        "max_gram": "10",
        "side": "front"
      }
    }
  }
}
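To make the effect of these analyzers concrete, the sketch below imitates them in plain Python. It is an approximation for illustration only: a whitespace split stands in for icu_tokenizer, and the function names are hypothetical. The punctuation set and the n-gram bounds (2 to 10, from the front) are the ones in the settings above.

```python
# The characters handled by the two mapping char filters above.
PUNCTUATION = ".|-_'/,"


def replace_punctuation(text):
    """Imitate punctuation_char_replace_filter: punctuation becomes a space."""
    return text.translate({ord(ch): " " for ch in PUNCTUATION})


def remove_punctuation(text):
    """Imitate punctuation_char_remove_filter: punctuation is dropped."""
    return text.translate({ord(ch): None for ch in PUNCTUATION})


def edge_ngrams(token, min_gram=2, max_gram=10):
    """Prefixes of the token with lengths from min_gram to max_gram."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_terms(text):
    """Approximate the terms produced by the n-gram index analyzer."""
    terms = []
    # Assumption: whitespace splitting stands in for the ICU tokenizer.
    for token in replace_punctuation(text).split():
        terms.extend(edge_ngrams(token))
    return terms
```

For instance, index_terms("ran-kurahan") yields the prefixes of "ran" and "kurahan", which is why a partially typed word can still match a short field such as a song name. Tokens shorter than two characters produce no terms in this sketch.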
"mappings": {
  "properties": {
    "id": {
      "type": "long"
    },
    "name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        },
      },
      "analyzer": "sinhala-tokens-ngram",
      "search_analyzer": "sinhala-search",
    },
    "youtube_link": {
      "type": "text"
    },
    "singers": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        },
      },
      "analyzer": "sinhala-tokens-ngram",
      "search_analyzer": "sinhala-search"
    },
    "metaphors": {
      "type": "nested",
      "properties": {
        "meta_id": {
          "type": "long"
        },
        "targets": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
          },
          "analyzer": "sinhala-tokens-ngram",
          "search_analyzer": "sinhala-search"
        },
        "source": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
          },
          "analyzer": "sinhala-tokens-ngram",
          "search_analyzer": "sinhala-search"
        },
        "interpretation": {
          "type": "completion",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
          },
          "analyzer": "sinhala-tokens-words",
          "search_analyzer": "sinhala-search"
        }
      }
    },
    "view_count": {
      "type": "long"
    },
    "published_on": {
      "type": "date"
    },
    "length": {
      "type": "long"
    },
    "lyrics": {
      "type": "completion",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        },
      },
      "analyzer": "sinhala-tokens-words",
      "search_analyzer": "sinhala-search"
    }
  }
}
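Because metaphors is mapped as a nested field, searching its targets or source sub-fields requires a nested query. The sketch below builds such a search body as a plain dictionary. The function name metaphor_query and its parameters are hypothetical, and this is not necessarily how the project builds its queries; the field names come from the mapping above.

```python
def metaphor_query(field, text, size=10, sort=None):
    """Build a search body that matches text against metaphors.<field>."""
    if field not in ("targets", "source"):
        raise ValueError("field must be 'targets' or 'source'")
    body = {
        "size": size,
        "query": {
            "nested": {
                "path": "metaphors",
                "query": {"match": {"metaphors." + field: text}},
                # inner_hits returns the matching metaphors, not just the songs.
                "inner_hits": {},
            }
        },
    }
    if sort:
        # Example: [{"view_count": {"order": "desc"}}]
        body["sort"] = sort
    return body
```

For a query asking for metaphors for a given word, one option is to match the word against targets and read the source values from the inner hits of each result.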
Copyright (c) 2022 Mohomed Haris Mohomed Ashkar
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.