Explanations
phrases related to hypothetical scenarios and potential outcomes
conditional statements indicating potential outcomes or consequences
Negative Logits
Token          Logit
Annotations    -0.64
LESS           -0.63
envy           -0.60
contrasts      -0.60
ictionary      -0.59
Afee           -0.59
amazed         -0.59
ighters        -0.59
;;;;;;;;       -0.58
cu             -0.57
Positive Logits
Token          Logit
tomorrow       1.05
someday        0.92
succeeds       0.81
succeed        0.79
hereafter      0.67
tonight        0.66
sufficiently   0.66
Maiden         0.65
prevail        0.64
BALL           0.62
Activations Density: 0.308%