Why do this?
As stated before, the intersect process first performs an in-memory map-reduce on the data from the data packages (quick), then performs a second map-reduce against the records already present in the Prosim-db, which were created during previous sliced imports. This second map-reduce is very heavy for MongoDB: although it is performed in memory for each record, it still involves a lot of reading, writing, expanding and indexing in the db.
Hence it would be interesting to determine the maximum number of products for which a one-shot integration (in-memory map-reduce only) can be performed. This would save a considerable amount of time (no more scheduled tasks twice an hour), and the Prosim-db could possibly be generated from scratch in less than a day instead of several days.
How to proceed?
The number of products with the non-empty tags needed to compare products is limited to about 20% of the official OFF db:
about 110,000 of 550,000 products
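To get a feel for the scale of a one-shot run, here is a rough back-of-the-envelope sketch. It assumes, which the issue does not state, that every selected product is compared with every other one; `one_shot_size` is a hypothetical helper, not part of the project.

```python
def one_shot_size(nb_selected: int, nb_total: int) -> dict:
    """Rough size estimate for a one-shot run (assumes all-pairs comparison)."""
    return {
        # Fraction of the official db that qualifies for comparison.
        "share": nb_selected / nb_total,
        # Cells of a full nb_selected x nb_selected matrix.
        "matrix_cells": nb_selected * nb_selected,
        # Unordered pairs of distinct products, if the comparison is symmetric.
        "unique_pairs": nb_selected * (nb_selected - 1) // 2,
    }


# Figures taken from the issue text: about 110,000 of 550,000 products.
estimate = one_shot_size(110_000, 550_000)
print(estimate)
```

Under that assumption the run has to cope with billions of candidate pairs, which is why memory and disk usage are the things to watch.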
Check what happens in terms of resources used (memory, disk speed/space, overall behaviour) if we create the Prosim-db in one shot with the environment set up as follows:
feeder_1 has extracted all 110,000 products with non-empty "nutrition_score_uk" and "categories_tags" => all_products.json
copy all_products.json to updated_products.json
in preparer/config.xml, set these tags to the following values:
<width>120000</width>
<height>120000</height>
<stats_H_nb_products>nb products extracted in all_products.json</stats_H_nb_products>
<stats_W_nb_products>nb products extracted in all_products.json</stats_W_nb_products>
preparer/progress.xml: clear values of the tags to start with a new Prosim-db
intersect/config.xml: set the max db size to 500 GB
<max_db_size_gigabytes>500</max_db_size_gigabytes>
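The config changes listed above could be scripted rather than edited by hand. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `set_tag_values` helper, the `<config>` root element and the sample layout are assumptions for illustration, since the real structure of preparer/config.xml is not shown here.

```python
import xml.etree.ElementTree as ET


def set_tag_values(xml_text: str, values: dict) -> str:
    """Return xml_text with the text of each named tag replaced."""
    root = ET.fromstring(xml_text)
    for tag, value in values.items():
        # Look the tag up anywhere in the tree (first match wins).
        element = next(root.iter(tag), None)
        if element is None:
            raise KeyError(f"tag <{tag}> not found")
        element.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical stand-in for preparer/config.xml; the real file may differ.
SAMPLE_CONFIG = """<config>
  <width>1000</width>
  <height>1000</height>
  <stats_H_nb_products>0</stats_H_nb_products>
  <stats_W_nb_products>0</stats_W_nb_products>
</config>"""

# Placeholder: replace with the actual number of products in all_products.json.
nb_products = 110_000

updated = set_tag_values(SAMPLE_CONFIG, {
    "width": 120000,
    "height": 120000,
    "stats_H_nb_products": nb_products,
    "stats_W_nb_products": nb_products,
})
print(updated)
```

The same helper could be applied to intersect/config.xml for `max_db_size_gigabytes`, reading the file into a string first and writing the result back.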
Requirements: Issue #1 implemented