INDEX
Explanations
references to official statements or documentation processes
Negative Logits
Token        Logit
ãĥ©ãĥĥãĤ¯    -0.19
apollo       -0.16
allah        -0.15
éŁ           -0.15
mand         -0.15
arters       -0.15
annon        -0.14
sobie        -0.14
AFF          -0.14
Neutral      -0.13
Positive Logits
Token        Logit
iyan         0.16
egl          0.14
istine       0.14
Latter       0.14
ioni         0.14
Ħ            0.14
FRIEND       0.14
@student     0.13
pNet         0.13
Cave         0.13
Activations Density 0.056%