INDEX
Explanations
phrases related to making decisions or taking actions
Negative Logits
Token          Logit
20439          -0.71
raught         -0.69
²¾             -0.68
oubted         -0.65
ritional       -0.64
oubt           -0.64
tnc            -0.62
rocal          -0.61
actionGroup    -0.61
ounter         -0.60
Positive Logits
Token          Logit
goddamn        1.41
*.             1.18
!              1.17
!!!            1.17
damn           1.16
godd           1.14
fucking        1.14
!!!!!!!!       1.11
!!!!           1.11
!!!!!          1.10
Activations Density 1.008%