This repository has been archived by the owner on Jan 9, 2025. It is now read-only.
Every page that includes `analysis/common/functions.php` runs `get_all_datasets()` on load. This function runs a number of queries to retrieve statistics about all datasets in TCAT, notably the number of tweets (`COUNT(t.id)`).
For very large query bins (50 million+ tweets), a `COUNT(*)` query can take quite a while to complete (over a minute in some worst-case scenarios I've seen), which slows down every page load. This would not be a huge issue if the result could be cached, but for active bins the count changes on every scrape, so a cached result goes stale almost immediately. The analysis page becomes very hard to use, since virtually every action waits on this query.
In this scenario it would be better to cache the number of tweets in another way. Since the contents of the tweets tables are only modified in a limited number of places in the code, this statistic (and possibly others) could be stored in a separate table or as a column in `tcat_query_bins`, and incremented or decremented whenever tweets are inserted or deleted. This adds a small amount of processing to each insert and delete, but makes for a much improved user experience. The counter could also be maintained automatically with MySQL triggers.
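A minimal sketch of the trigger-maintained counter described above, not TCAT's actual code. It uses Python's built-in `sqlite3` module as a stand-in so it runs without a MySQL server; MySQL trigger syntax differs (for example, it requires `FOR EACH ROW`). The table `example_tweets`, the column `nr_of_tweets`, and the helper names are all hypothetical, and the schema is heavily simplified.

```python
import sqlite3

# Hypothetical, simplified schema. The real TCAT tables have many more
# columns, and every query bin has its own tweets table.
SCHEMA = """
CREATE TABLE tcat_query_bins (
    id INTEGER PRIMARY KEY,
    querybin TEXT NOT NULL,
    -- hypothetical cached counter column
    nr_of_tweets INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE example_tweets (
    id INTEGER PRIMARY KEY,
    text TEXT
);
-- Keep the cached counter in step with every insert and delete.
CREATE TRIGGER example_tweets_after_insert AFTER INSERT ON example_tweets
BEGIN
    UPDATE tcat_query_bins SET nr_of_tweets = nr_of_tweets + 1
    WHERE querybin = 'example';
END;
CREATE TRIGGER example_tweets_after_delete AFTER DELETE ON example_tweets
BEGIN
    UPDATE tcat_query_bins SET nr_of_tweets = nr_of_tweets - 1
    WHERE querybin = 'example';
END;
"""


def make_db():
    """Create an in-memory database with one bin named 'example'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO tcat_query_bins (querybin) VALUES ('example')")
    return conn


def cached_count(conn, querybin):
    """Read the counter: a single-row lookup instead of a full table count."""
    row = conn.execute(
        "SELECT nr_of_tweets FROM tcat_query_bins WHERE querybin = ?",
        (querybin,),
    ).fetchone()
    return row[0]


def exact_count(conn):
    """The expensive query the counter is meant to replace."""
    return conn.execute("SELECT COUNT(*) FROM example_tweets").fetchone()[0]


if __name__ == "__main__":
    conn = make_db()
    conn.executemany(
        "INSERT INTO example_tweets (id, text) VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "c")],
    )
    conn.execute("DELETE FROM example_tweets WHERE id = 2")
    # Both values should agree after any sequence of inserts and deletes.
    print(cached_count(conn, "example"), exact_count(conn))
```

With this layout the analysis pages would read `nr_of_tweets` directly. For an existing bin, the counter would first need to be initialised with a one-off exact count.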
Alternatively, one could use the approximate row count returned by e.g. `SHOW TABLE STATUS`, but this is inaccurate and may give misleading results.