Explanations
special characters and symbols
the presence of special character tokens or formatting
Negative Logits
Token        Logit
ciating      -1.12
swick        -0.94
illac        -0.85
sterdam      -0.83
matically    -0.82
matical      -0.74
brates       -0.74
teness       -0.70
frey         -0.70
ependence    -0.68
Positive Logits
Token        Logit
oti          0.98
uri          0.85
orter        0.84
abba         0.81
α            0.81
uler         0.80
ū            0.76
uli          0.76
orts         0.75
Exit         0.74
Activations Density: 0.028%