INDEX
Explanations
phrases describing positive impact or success
Negative Logits
apache    -0.76
isco      -0.73
apons     -0.72
mares     -0.71
js        -0.71
each      -0.70
exceeds   -0.70
these     -0.68
existed   -0.67
exists    -0.67
Positive Logits
easiest    1.15
same       1.13
simplest   1.07
extent     1.07
safest     1.06
biggest    1.05
gist       1.04
toughest   1.01
strongest  1.01
hallmark   1.00
Activations Density 0.136%