Explanations
conditional phrases that suggest hesitance or hypothetical scenarios
Negative Logits
Token      Logit
him        -0.14
lui        -0.14
eux        -0.14
apon       -0.14
herself    -0.14
Probably   -0.14
ovky       -0.14
them       -0.14
tieten     -0.14
annies     -0.13
Positive Logits
Token      Logit
they       0.35
rames      0.33
there      0.33
indeed     0.32
anything   0.30
fy         0.30
you        0.29
it         0.29
we         0.29
nothing    0.28
Activations Density 0.197%
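
The page does not define the density figure. One common reading is the fraction of sampled tokens on which the feature's activation is nonzero. The sketch below works under that assumption; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The source page does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Illustrative data only (not from the source): 2 active tokens out of 1000.
sample = [0.0] * 998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.197% would mean the feature fires on roughly 2 of every 1000 sampled tokens.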