INDEX
Explanations
specific mentions of the word "intention."
terms related to intention and evaluation
Negative Logits
token       logit
straw       -0.70
scl         -0.64
creen       -0.62
mind        -0.61
tipping     -0.59
Springer    -0.58
smugglers   -0.57
jay         -0.56
cart        -0.56
cart        -0.56
Positive Logits
token       logit
ally        1.93
ality       1.77
als         1.51
alities     1.51
arily       1.49
alist       1.47
ary         1.43
aries       1.36
nel         1.31
al          1.23
Activations Density: 0.254%