INDEX
Explanations
instances of the word "in" indicating locations or contexts within the text
Negative Logits
oref      -0.15
ched      -0.15
nero      -0.15
forth     -0.15
roj       -0.15
ovation   -0.14
heart     -0.14
/of       -0.14
hart      -0.14
ires      -0.14
Positive Logits
676       0.16
729       0.15
677       0.15
679       0.15
ÅŁa       0.15
lä        0.15
sbin      0.15
early     0.14
791       0.14
697       0.14
Activations Density 0.073%
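
The density figure can be read as the share of tokens on which the feature fires. The sketch below illustrates that reading only; it assumes "density" means the fraction of tokens whose activation exceeds zero, which the page itself does not state. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen purely for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than threshold.

    Assumption: density = (tokens where the feature is active) / (all tokens).
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data, not taken from the dashboard: 100,000 tokens, 73 of them active.
acts = [0.0] * 99_927 + [1.0] * 73
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.073% would correspond to the feature being active on roughly 73 out of every 100,000 tokens.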