INDEX
Explanations
phrases expressing expectations or beliefs connected to actions and behaviors
Negative Logits
uya           -0.16
Leban         -0.16
igel          -0.15
762           -0.15
childs        -0.14
abis          -0.14
erus          -0.14
деле          -0.13
stri          -0.13
Independence  -0.13
Positive Logits
udit     0.15
appa     0.15
obuf     0.14
awah     0.14
nee      0.14
entes    0.14
umen     0.14
elop     0.14
Certain  0.14
гот      0.14
Activations Density: 0.088%