Explanations
anomalous or unusual situations and questions
Negative Logits
ja          -0.17
LAG         -0.16
álo         -0.15
лада        -0.15
ظÙĩ (ظه)    -0.15
æģIJ (恐)   -0.14
ظ           -0.14
lia         -0.14
AGON        -0.13
ami         -0.13
Positive Logits
nowhere     0.18
_None       0.17
none        0.16
None        0.16
wald        0.15
neither     0.14
chl         0.14
nobody      0.14
ousse       0.14
isz         0.14
Activation Density: 0.093%