INDEX
Negative Logits
who         -0.07
how         -0.07
also        -0.07
quirrel     -0.07
NR          -0.07
GER         -0.07
intermitt   -0.07
Mrs         -0.06
burnt       -0.06
rather      -0.06
Positive Logits
With        0.14
With        0.13
"With       0.10
with        0.08
.With       0.07
WITH        0.06
with        0.06
—with       0.06
vessel      0.06
spanish     0.06
Activations Density: 0.027%