Explanations
names followed by last names
Negative Logits
Token          Value
.              0.83
,              0.74
ed             0.70
er             0.69
line           0.66
es             0.65
al             0.65
ing            0.62
e              0.62
               0.61
Positive Logits
Token          Value
Ყ              1.07
Announces      1.04
chyné          1.04
arrerol        1.03
詄             0.99
Ბ              0.96
쮿             0.94
<unused1900>   0.93
avkhat         0.93
Ი              0.93
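The logit lists rank vocabulary tokens by how strongly this feature pushes their logits up or down. A common way such lists are produced for sparse-autoencoder features (an assumption here, not something this page states) is to project the feature's decoder direction through the model's unembedding matrix and sort the result. The sketch below is plain Python; `logit_effects`, `top_logits`, and the toy inputs are hypothetical and only illustrate the ranking step.

```python
def logit_effects(decoder_dir, unembed):
    """Project a feature's decoder direction onto every vocabulary column.

    decoder_dir: list of d_model floats (hypothetical feature direction).
    unembed: d_model rows, each a list of vocab_size floats (the W_U matrix).
    Returns vocab_size floats, i.e. decoder_dir @ unembed.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_logits(effects, tokens, k):
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative effect first
    positive = ranked[::-1][:k]    # most positive effect first
    return positive, negative


# Toy usage: a 2-dimensional model with a 3-token vocabulary (made-up numbers).
tokens = ["a", "b", "c"]
effects = logit_effects([1.0, 0.0], [[0.5, -1.0, 2.0], [3.0, 3.0, 3.0]])
positive, negative = top_logits(effects, tokens, k=1)
print(positive)  # [('c', 2.0)]
print(negative)  # [('b', -1.0)]
```

The sketch keeps the sign of each effect; a dashboard may instead display negative entries as magnitudes.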
Activation Density: 0.181%
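Activation density is usually read as the share of token positions on which the feature fires. Under that assumption, 0.181% means roughly 1.8 active tokens per 1,000. The sketch below assumes "fires" means an activation strictly above zero; `activation_density` and `format_density` are hypothetical helpers, not part of any dashboard API.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def format_density(density):
    """Render a fraction as a percentage with three decimals."""
    return f"{100 * density:.3f}%"


# Toy usage: 181 active positions out of 100,000 tokens (made-up data).
acts = [1.0] * 181 + [0.0] * (100_000 - 181)
print(format_density(activation_density(acts)))  # 0.181%
```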