Explanations
elements related to conflict and action
Negative Logits
irie    -0.18
hind    -0.16
avou    -0.16
Hind    -0.16
.nano   -0.15
zew     -0.15
VERR    -0.15
393     -0.15
ophon   -0.14
हन      -0.14
Positive Logits
olean     0.15
lette     0.15
cx        0.15
井        0.15
withhold  0.14
igit      0.14
lexical   0.14
ธรรม      0.14
umbled    0.14
벨        0.14
Activations Density 0.459%