INDEX
Explanations
phrases related to negative emotions and criticism
Negative Logits
reen        -0.78
atories     -0.76
cible       -0.71
athering    -0.68
glim        -0.67
aldi        -0.67
unker       -0.65
oult        -0.64
ativity     -0.63
atory       -0.63
Positive Logits
!!!!!       1.30
!!!         1.22
!!          1.18
?!          1.13
!!!!        1.02
!/          0.98
@#          0.98
!"          0.94
??          0.93
@#&         0.91
Activations Density: 0.013%