INDEX
Explanations
key-value pairs or identifiers
NEGATIVE LOGITS
Token      Logit
ậ          -0.96
Ayrıca     -0.94
groet      -0.91
kaž        -0.91
stát       -0.90
telas      -0.90
Käufer     -0.90
tré        -0.89
komfort    -0.87
digos      -0.85
POSITIVE LOGITS
Token      Logit
خودش       0.96
還會       0.91
самому     0.85
€)         0.85
itself     0.82
ferous     0.80
Feste      0.80
zaten      0.79
bosco      0.78
celebre    0.78
Activations Density 0.033%