Explanations
words related to physical actions, future events, and potential consequences
Negative Logits
ALWAYS
-0.68
Everyday
-0.65
#$
-0.62
!",
-0.62
iuses
-0.62
{{
-0.61
emale
-0.60
@#
-0.60
EVERY
-0.60
cellent
-0.60
Positive Logits
someday
1.28
sooner
1.21
if
1.04
sometime
1.03
anytime
1.00
elsewhere
1.00
altogether
0.95
soon
0.80
nonetheless
0.78
next
0.78
Activations Density 0.785%