INDEX
Negative Logits
l    1.07
s    1.05
n    1.00
r    0.99
ก    0.99
t    0.96
ร    0.87
크    0.87
৪    0.86
ఇ    0.85
Positive Logits
worthy    1.25
to    1.23
ة    1.05
are    1.02
by    0.99
hana    0.92
unworthy    0.92
ches    0.88
worthy    0.86
amente    0.83
Activations Density 0.001%