INDEX
Explanations
phrases related to actions that may involve decision-making or moral judgment
expressions related to opportunities and willingness to help or engage
Negative Logits
mourning         -0.59
Miscellaneous    -0.58
Worse            -0.57
phans            -0.56
Survivors        -0.56
igmat            -0.55
Governors        -0.55
tarn             -0.54
neum             -0.54
Revelations      -0.54
Positive Logits
gladly           1.02
."               0.82
accordingly      0.81
.''              0.81
starter          0.80
!                0.80
!.               0.78
brainer          0.77
.                0.77
."[              0.77
Activations Density 0.867%
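
As a rough illustration of how a density figure like the one above could be read, the sketch below assumes that "activations density" means the fraction of token positions on which the feature fires (activation above zero). That definition, the function name `activation_density`, and the sample values are assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" on a token when its
    activation is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 9 of 1000 token positions.
sample = [0.0] * 991 + [1.5] * 9
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.867% would mean the feature is active on roughly 9 of every 1000 sampled tokens.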