INDEX
Explanations
negative sentiment
The neuron detects German‐language words or word pieces.
New Auto-Interp
Negative Logits
February
-0.06
Ye
-0.06
AGON
-0.06
seront
-0.06
ById
-0.06
boz
-0.06
STOP
-0.06
tes
-0.06
〇
-0.06
dit
-0.06
POSITIVE LOGITS
tainted
0.07
°}
0.07
_OPENGL
0.07
OUTER
0.07
.xhtml
0.07
inferior
0.07
يلم
0.06
japan
0.06
uds
0.06
mas
0.06
Activations Density 0.009%