Explanations
proper nouns and specific names
Negative Logits
arme      -0.15
aze       -0.15
esktop    -0.14
azes      -0.14
ackages   -0.14
¿ł        -0.14
acent     -0.14
aces      -0.14
outer     -0.14
iard      -0.14
Positive Logits
swe       0.18
andard    0.17
hti       0.16
hen       0.15
bjerg     0.15
idan      0.15
hed       0.15
олÑĸ      0.15
Swe       0.15
andra     0.15
Activations Density 0.019%