Negative Logits
741      -0.20
et       -0.18
i        -0.17
841      -0.15
ways     -0.15
k        -0.15
an       -0.15
776      -0.15
er       -0.14
ing      -0.14
Positive Logits
atty     0.26
auf      0.25
ards     0.24
atrix    0.24
avers    0.23
asley    0.22
jamin    0.21
heading  0.21
arded    0.21
adle     0.20
Activations Density 0.011%