INDEX
Explanations
mentions of failure or underperformance
instances of the word "failing" or its variations
Negative Logits
atar     -0.74
auts     -0.67
entle    -0.65
eous     -0.64
abb      -0.63
Works    -0.63
arom     -0.62
atri     -0.62
Hyd      -0.60
arf      -0.59
Positive Logits
failing     3.62
failure     2.08
failed      1.83
fail        1.83
failed      1.81
Failure     1.79
failures    1.78
Failure     1.71
fail        1.69
fails       1.66
Activations Density: 0.015%