Negative Logits
Token    Value
g        1.00
the      0.98
t        0.96
ing      0.88
two      0.88
w        0.85
ens      0.84
tr       0.84
quela    0.83
ные      0.83
Positive Logits
Token    Value
و        0.91
స్       0.88
ْم       0.84
ו        0.81
एकड़     0.80
Schon    0.80
:        0.79
antics   0.75
CASCADE  0.75
vanlig   0.75
Activations Density 0.002%