Outlier detection takes an extremely long time to run on some datasets (possibly depending on the fields involved). Running it on the ACLED dataset went for over 12 hours without completing. When the process was killed, it was stuck in the sklearn imputer primitive "d3m.primitives.data_cleaning.imputer.SKlearn".
The root cause is a feature explosion caused by text encoding. The client currently uses the first variable as the target rather than the user-selected target variable. For ACLED, the first variable happens to be the data id (unique for each row), so the text encoder ends up creating nearly 16k features.
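For illustration, here is a minimal sketch of the blow-up using plain scikit-learn one-hot encoding as a stand-in for the d3m text encoder (the `data_id` column name and the 16k row count are assumptions chosen to mirror ACLED, not taken from the actual dataset schema):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical frame standing in for ACLED: the first column is a
# row-unique data id, which is what the client mistakenly targets.
df = pd.DataFrame({"data_id": [f"id_{i}" for i in range(16000)]})

# Encoding a unique-per-row column yields one feature per row.
enc = OneHotEncoder()  # sparse output by default
encoded = enc.fit_transform(df[["data_id"]])
print(encoded.shape)  # (16000, 16000)
```

Any downstream primitive that densifies or iterates over that matrix (such as the imputer) then has ~16k columns to process, which matches the observed hang.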
At a minimum, the client needs to be updated to use the correct target. Ideally, the server would also perform some boundary or sanity checking so that outlier detection only runs in cases where it makes sense, or would otherwise limit the feature explosion that can occur; a sketch of one possible guard follows.
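One hypothetical form such a server-side check could take is a cardinality filter that refuses to text-encode columns whose distinct-value count is close to the row count, since those behave like row ids. The function name and threshold below are illustrative, not part of the existing codebase:

```python
import pandas as pd

# Hypothetical threshold: columns with near-unique values look like row ids
# and are not worth encoding.
MAX_CARDINALITY_RATIO = 0.5

def encodable_text_columns(df: pd.DataFrame) -> list[str]:
    """Return text columns whose cardinality is low enough to encode safely."""
    keep = []
    for col in df.select_dtypes(include="object").columns:
        ratio = df[col].nunique() / max(len(df), 1)
        if ratio <= MAX_CARDINALITY_RATIO:
            keep.append(col)
    return keep
```

Under a guard like this, the ACLED data id column would be skipped (ratio 1.0) and the pipeline would never materialize the 16k encoded features in the first place.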