INDEX
Explanations
phrases related to actions or activities being performed by someone
instances of the word "went"
Negative Logits
oint            -0.59
affirmation     -0.58
oret            -0.56
ULTS            -0.56
opia            -0.56
ible            -0.55
ract            -0.55
Camel           -0.55
Monetary        -0.53
representations -0.53
Positive Logits
went      3.10
goes      2.12
went      1.84
flew      1.77
took      1.71
proceeded 1.67
blew      1.66
came      1.64
ran       1.61
stayed    1.61
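Below is a minimal sketch of how lists like these could be pulled from a table of per-token logit effects. It assumes, as in many sparse-autoencoder dashboards, that each value is the feature's direct effect on that token's logit, and that the positive and negative lists are simply the k largest and k smallest of those values. The function name `top_and_bottom_logits` and the toy `effects` dictionary are hypothetical. The values are copied from a few rows above.

```python
import heapq

# Hypothetical helper: assumes `logit_effects` maps each token string to the
# feature's (assumed) direct effect on that token's logit.
def top_and_bottom_logits(logit_effects, k=10):
    """Return the k largest and k smallest (token, value) pairs."""
    items = logit_effects.items()
    # nlargest returns pairs in descending order of value.
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    # nsmallest returns pairs in ascending order (most negative first).
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    return positive, negative

# Toy input: a handful of (token, value) rows taken from the lists above.
effects = {
    "went": 3.10, "goes": 2.12, "flew": 1.77,
    "oint": -0.59, "affirmation": -0.58, "Camel": -0.55,
}
pos, neg = top_and_bottom_logits(effects, k=2)
print(pos)  # [('went', 3.1), ('goes', 2.12)]
print(neg)  # [('oint', -0.59), ('affirmation', -0.58)]
```

A heap-based selection avoids sorting the whole vocabulary when only the extremes are needed. A plain dict cannot hold both "went" rows, though, which suggests those two entries are distinct tokens that only look identical once whitespace is lost.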
Activation Density: 0.025%
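To give a sense of scale, here is a sketch that assumes activation density means the percentage of tokens on which the feature's activation is strictly positive. That definition is an assumption, not something stated on this page, and `activation_density` is a hypothetical name. Under it, 0.025% works out to roughly one firing token in every 4,000.

```python
# Hypothetical sketch: density = percentage of tokens with activation > 0
# (assumed definition).
def activation_density(activations):
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > 0)
    return 100.0 * fired / len(activations)

# One firing token out of 4,000 gives the 0.025% reported above.
acts = [0.0] * 3999 + [2.5]
print(activation_density(acts))  # 0.025
```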