Explanations
the word "As" at the beginning of sentences or clauses
Negative Logits
Token      Logit
that       -0.19
oader      -0.16
ãĤĪãģĨ     -0.16
oyer       -0.15
efon       -0.15
theory     -0.15
icerca     -0.15
overy      -0.15
ously      -0.15
ouchers    -0.15
Positive Logits
Token      Logit
p          0.32
phalt      0.30
he         0.28
pen        0.27
m          0.27
pire       0.25
st         0.25
i          0.25
king       0.24
semble     0.24
Activations Density: 0.062%