Negative Logits
Logit   Token
-0.70   A
-0.68
-0.67   (
-0.66   in
-0.64   it
-0.64   if
-0.63   let
-0.61   we
-0.60   I
-0.59   des
Positive Logits
Logit   Token
1.41    myſelf
1.38    ſelf
1.34    متعلقه
1.33    ſelves
1.31    purpoſe
1.29    leſs
1.27    Efq
1.27    auffi
1.27    ſta
1.26    AndEndTag
Activations Density 0.030%