INDEX
Explanations
phrases indicating commitment and support
Negative Logits
yn          -0.19
kind        -0.17
IVITY       -0.17
quite       -0.15
kind        -0.15
brtc        -0.15
Ñħод        -0.14
Quite       -0.14
inement     -0.14
_kind       -0.14
POSITIVE LOGITS
sound       0.17
exus        0.17
friction    0.16
robust      0.16
rob         0.15
commit      0.15
suite       0.15
tera        0.15
achts       0.15
997         0.14
Activations Density 0.228%