INDEX
Explanations
phrases indicating exceptions or notable differences in discussions
Negative Logits
Token      Logit
borough    -0.18
pis        -0.15
uld        -0.14
ako        -0.14
ally       -0.14
-at        -0.14
ochen      -0.13
aland      -0.13
lando      -0.13
oš         -0.13
Positive Logits
Token       Logit
exception   1.21
exceptions  1.13
Exceptions  0.98
exceptions  0.93
except      0.82
exception   0.81
EXCEPTION   0.79
Exceptions  0.77
Exception   0.75
except      0.74
Activations Density: 0.178%