try creation of prosim-db in one single shot #6

oricdev · 2018-07-17T19:23:02Z

Requirements: Issue #1 implemented

Why do this?
As stated before, the intersect process first performs a MapReduce in memory on the data from the data packages, which is quick. It then performs a second MapReduce against the records already present in the Prosim-db, which were created during previous sliced imports. This second step is very heavy for MongoDB: it is driven by each record held in memory, but it still involves a lot of reading, writing, document expansion, and indexing in the db.
It would therefore be worth determining the maximum number of products for which a one-shot integration (in-memory MapReduce only) can be performed. This would save a considerable amount of time (no more scheduled tasks running twice an hour), and the Prosim-db could possibly be generated from scratch in less than a day instead of several days.

How to proceed?
Only about 20% of the official OFF db has the non-empty tags needed to compare products with each other:
about 110,000 / 550,000 products
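To get a feel for the size of a one-shot run, the figures above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: `share_of_db` and `pair_count` are hypothetical helper names, and the pair count assumes that every selected product is compared with every other one, which the issue does not state explicitly.

```python
def share_of_db(selected: int, total: int) -> float:
    """Fraction of the OFF db that has usable tags."""
    return selected / total


def pair_count(n: int) -> int:
    """Number of unordered product pairs, assuming an all-against-all comparison."""
    return n * (n - 1) // 2


selected, total = 110_000, 550_000
print(f"share of OFF db: {share_of_db(selected, total):.0%}")
print(f"unordered pairs: {pair_count(selected):,}")
```

Under that assumption, 110,000 products give roughly 6 billion pairs, which is the order of magnitude to keep in mind when watching memory and disk usage.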
Check what happens in terms of resources used (memory, disk speed/space, overall behaviour) when the Prosim-db is created in one shot with the environment set up as follows:

feeder_1 has extracted all 110,000 products with non-empty "nutrition_score_uk" and "categories_tags" into all_products.json
copy all_products.json to updated_products.json
in preparer/config.xml, set these tags to the following values:
<width>120000</width>
<height>120000</height>
<stats_H_nb_products>nb products extracted in all_products.json</stats_H_nb_products>
<stats_W_nb_products>nb products extracted in all_products.json</stats_W_nb_products>
in preparer/progress.xml, clear the values of the tags so that a new Prosim-db is started
in intersect/config.xml, set the max db size to 500 GB:
<max_db_size_gigabytes>500</max_db_size_gigabytes>
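The setup steps above could be scripted roughly as follows. This is a minimal sketch, not the project's own tooling: `set_tags` and `prepare_one_shot` are hypothetical helpers, and the code assumes that all_products.json sits in the project root and holds a single JSON array of products, and that the tags named in this issue each occur once in their config file. Clearing preparer/progress.xml is left out because its tag names are not given here.

```python
import json
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def set_tags(xml_path: Path, values: dict) -> None:
    """Set the text of each named tag (first match) in the given XML file."""
    tree = ET.parse(xml_path)
    root = tree.getroot()
    for tag, text in values.items():
        node = root if root.tag == tag else root.find(f".//{tag}")
        if node is None:
            raise KeyError(f"<{tag}> not found in {xml_path}")
        node.text = str(text)
    tree.write(xml_path, encoding="utf-8", xml_declaration=True)


def prepare_one_shot(base: Path, side: int = 120_000, max_db_gb: int = 500) -> int:
    """Apply the one-shot settings and return the number of products found."""
    all_products = base / "all_products.json"
    # Assumption: the file is one JSON array with one entry per product.
    with all_products.open(encoding="utf-8") as fh:
        nb_products = len(json.load(fh))

    # Step: copy all_products.json to updated_products.json.
    shutil.copyfile(all_products, base / "updated_products.json")

    # Step: preparer/config.xml gets the matrix size and the product counts.
    set_tags(base / "preparer" / "config.xml", {
        "width": side,
        "height": side,
        "stats_H_nb_products": nb_products,
        "stats_W_nb_products": nb_products,
    })

    # Step: intersect/config.xml gets the larger db size limit.
    set_tags(base / "intersect" / "config.xml", {"max_db_size_gigabytes": max_db_gb})
    return nb_products
```

Calling `prepare_one_shot(Path("."))` from the project root would apply the settings in place. Note that ElementTree drops XML comments when it rewrites a file, so keep a backup of the config files if they contain any.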

oricdev added the labels "easy" (easy to deal with), "help wanted" (Extra attention is needed) and "prio 2" (middle priority) on Jul 17, 2018
