Explanations
phrases related to changing one's mind
references to changes in opinions or beliefs
Negative Logits
Grab              -0.69
achable           -0.68
Pear              -0.64
LIB               -0.64
asketball         -0.64
icious            -0.63
arton             -0.63
Strong            -0.63
Reconstruction    -0.62
Warning           -0.61
Positive Logits
issions           0.84
emort             0.82
fortunes          0.81
radically         0.75
priorities        0.74
gears             0.73
focus             0.72
mood              0.71
odo               0.71
drastically       0.70
Activations Density 0.240%