INDEX
Explanations
words related to evasive actions or decisions
occurrences of the letter "d" in various contexts
Negative Logits
Token       Logit
MENTS       -0.61
men         -0.58
MAX         -0.57
latex       -0.57
feminine    -0.57
MENT        -0.56
female      -0.56
End         -0.55
(>          -0.55
HBO         -0.55
Positive Logits
Token       Logit
umbered     1.27
owed        1.20
ucked       1.20
umped       1.13
ussed       1.11
inged       1.08
itched      1.07
rolled      1.06
ounced      1.05
uked        1.04
Activations Density: 0.068%