INDEX
Explanations
mentions of actions being done with intention
the preposition "with" and its context in sentences
Negative Logits
checkpoint    -0.63
meg           -0.56
elite         -0.56
redist        -0.56
Parish        -0.55
script        -0.54
states        -0.53
competition   -0.53
culture       -0.53
pens          -0.52
Positive Logits
with          3.40
without       2.13
along         1.63
from          1.58
upon          1.57
withd         1.50
again         1.49
between       1.45
through       1.42
against       1.41
Activations Density 0.010%