INDEX
Explanations
phrases questioning beliefs, values, and actions
questions and statements related to choices and beliefs
Negative Logits
Prim      -0.63
Pap       -0.59
ombat     -0.59
BALL      -0.58
iard      -0.58
((        -0.58
9000      -0.57
Tib       -0.56
Dri       -0.56
630       -0.56
Positive Logits
iety      0.82
aspire    0.73
degrade   0.70
threaten  0.70
innovate  0.69
pray      0.68
worsen    0.68
prosper   0.67
morrow    0.66
feared    0.66
Activations Density: 0.231%