Explanations
Instances of the word "nothing".
Negative Logits
alles    -0.15
enu      -0.14
438      -0.14
aren     -0.14
حة       -0.14
üven     -0.14
ial      -0.13
serter   -0.13
posit    -0.13
917      -0.13
Positive Logits
else     0.35
ness     0.28
else     0.25
ELSE     0.23
Else     0.22
burger   0.21
_else    0.21
Else     0.21
wrong    0.21
/no      0.20
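The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal NumPy sketch of one common way to produce such lists, assuming the effect is the feature's decoder row multiplied by the unembedding matrix and ignoring the final LayerNorm. The names `w_dec_row`, `W_U`, and `top_logit_effects` are hypothetical and are not taken from the viewer that produced the numbers above.

```python
import numpy as np


def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    Assumption (hypothetical): effect on each token = w_dec_row @ W_U,
    with no LayerNorm scaling applied.
    """
    effects = w_dec_row @ W_U          # shape: (vocab_size,)
    order = np.argsort(effects)        # token indices, ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 tokens (made-up values).
w_dec_row = np.array([1.0, 0.0])
W_U = np.array([[0.35, -0.15, 0.10, 0.0],
                [0.00, 0.00, 0.00, 0.0]])
vocab = ["else", "alles", "ness", "x"]
neg, pos = top_logit_effects(w_dec_row, W_U, vocab, k=2)
print(neg)
print(pos)
```

Sorting once and reading both ends of the order gives the bottom-k and top-k lists without a second pass.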
Activation Density: 0.048%
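Activation density is presumably the share of token positions on which the feature is active. A minimal sketch under that assumption, treating any activation above zero as active; `activation_density` is a hypothetical helper, not part of the original viewer.

```python
import numpy as np


def activation_density(acts):
    """Percentage of token positions where the feature's activation is > 0.

    Assumption (hypothetical): "active" means strictly positive activation.
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > 0) / acts.size


# Toy example: 48 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:48] = 1.0
print(activation_density(acts))
```

Under this definition, 48 active positions out of 100,000 tokens corresponds to the 0.048% shown above.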