Negative Logits
ins         -0.07
+y          -0.06
Robinson    -0.06
sons        -0.06
Modified    -0.06
aid         -0.06
irling      -0.06
av          -0.06
oured       -0.05
Norris      -0.05
Positive Logits
the         0.12
-the        0.11
THE         0.09
/the        0.09
_THE        0.09
the         0.09
athe        0.09
θε          0.08
-The        0.08
THE         0.08
Activations Density: 0.055%