Explanations
phrases related to the condition or quality of objects
Negative Logits
voks          -0.16
atti          -0.15
umph          -0.15
_BE           -0.14
acts          -0.14
rog           -0.14
ulence        -0.14
cht           -0.13
esty          -0.13
unsuccessful  -0.13
Positive Logits
condition     0.73
Condition     0.63
condition     0.57
Condition     0.54
CONDITION     0.52
-condition    0.43
_condition    0.42
.condition    0.41
CONDITION     0.41
(condition    0.41
Activation density: 0.084%
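The page does not say how these numbers are produced. The sketch below assumes a common convention for feature dashboards. Under that convention, a token's logit value is the dot product of the feature's decoder direction with that token's unembedding vector. Activation density is the fraction of sampled tokens on which the feature's activation is above zero. The function names and the toy vectors are hypothetical and for illustration only. Under this assumption, a density of 0.084% corresponds to roughly 84 active tokens per 100,000.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot the feature's decoder direction with each
    token's unembedding vector (assumed convention, not the page's stated method)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    ascending = sorted(effects.items(), key=lambda kv: kv[1])
    return ascending[::-1][:k], ascending[:k]


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens on which the feature fires (activation > 0, assumed threshold)."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)


if __name__ == "__main__":
    # Toy 2-dimensional example; the real model dimensions are not given on the page.
    decoder = [0.6, 0.2]
    unembed = {"condition": [1.0, 0.5], "voks": [-0.5, 0.0], "the": [0.0, 0.0]}
    top, bottom = top_and_bottom(logit_effects(decoder, unembed), k=1)
    print("top:", top, "bottom:", bottom)

    # 84 active tokens out of 100,000 gives a density of 0.084%.
    acts = [0.0] * 99_916 + [1.0] * 84
    print(f"density: {activation_density(acts):.3%}")
```

Ranking all tokens with one ascending sort and reading it from both ends keeps the positive and negative lists consistent with each other.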