Explanations
transitional phrases indicating a change or transformation
phrases indicating transitions or transformations from one state to another
Negative Logits
Token        Logit
Differences  -0.66
Refresh      -0.64
Correction   -0.60
¶            -0.60
awar         -0.60
rods         -0.59
Upgrade      -0.59
delay        -0.57
Improve      -0.57
Stories      -0.56
Positive Logits
Token        Logit
downright    1.02
DonaldTrump  0.83
embracing    0.83
thriving     0.79
becoming     0.77
outright     0.77
fledged      0.76
embrace      0.75
fully        0.74
menacing     0.73
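Dashboards of this kind commonly build the negative and positive logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that method; the names `decoder_dir`, `W_U`, and `vocab` are hypothetical, and a toy vocabulary stands in for real model weights. It shows only the projection and top-k/bottom-k selection step.

```python
def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return (negative, positive) lists of (token, logit_effect) pairs.

    Assumption (not confirmed by the dashboard): the effect on token j is
    sum_i decoder_dir[i] * W_U[i][j], i.e. decoder direction times unembedding.
    """
    # One logit effect per vocabulary token.
    effects = [
        sum(d * row[j] for d, row in zip(decoder_dir, W_U))
        for j in range(len(vocab))
    ]
    # Token indices sorted from most negative to most positive effect.
    ranked = sorted(range(len(vocab)), key=lambda j: effects[j])
    negative = [(vocab[j], effects[j]) for j in ranked[:k]]
    # Largest effects first.
    positive = [(vocab[j], effects[j]) for j in reversed(ranked[-k:])]
    return negative, positive


# Toy example: 2-dimensional residual stream, 5-token vocabulary.
neg, pos = top_logit_effects(
    [1.0, 0.0],
    [[0.5, -0.2, 1.0, -0.7, 0.1], [9, 9, 9, 9, 9]],
    ["a", "b", "c", "d", "e"],
    k=2,
)
print(neg)  # [('d', -0.7), ('b', -0.2)]
print(pos)  # [('c', 1.0), ('a', 0.5)]
```

With `k=10` on a real vocabulary, the two returned lists would have the same shape as the tables above.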
Activation density: 0.326%
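Activation density is usually read as the share of sampled tokens on which the feature fires, meaning its activation is above zero. The following is a minimal sketch that assumes this definition and a flat list of per-token activations. The `threshold` parameter is a hypothetical knob, and the 0.326% figure comes from the dashboard's own token sample, which is not reproduced here.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumed definition: density = (# tokens with activation > threshold) / (# tokens).
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


print(activation_density([0.0, 0.0, 2.5, 0.0]))  # 25.0
```

Under this definition, a density of 0.326% means the feature fires on roughly 3 tokens in every 1,000.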