INDEX
Explanations
references to relationships and emotions
Negative Logits
")){   -1.06
"){    -1.03
"):    -0.99
'},    -0.98
'),    -0.98
"},    -0.96
".     -0.94
"),    -0.92
'):    -0.92
-      -0.90
Positive Logits
.      1.17
;      1.08
!      1.03
,      1.02
?      0.85
:      0.70
!!     0.64
。     0.60
!!!    0.59
.*     0.58
Activations Density: 1.280%