INDEX
Explanations
phrases related to timeframes and events in the future
future predictions or expectations
Negative Logits
Token          Logit
complying      -0.78
quitting       -0.75
donating       -0.70
Osw            -0.66
withholding    -0.65
Friends        -0.65
Anon           -0.63
repairing      -0.62
unethical      -0.61
resisting      -0.61
Positive Logits
Token          Logit
mark           1.04
unfold         1.03
hinge          0.99
culmin         0.97
feature        0.96
usher          0.96
see            0.94
consist        0.93
determine      0.93
coincide       0.92
Activations Density: 0.156%