Explanations
phrases related to personal experiences and emotions in a narrative context
Negative Logits
Token   Logit
“       -2.32
”       -2.27
’”      -2.25
‘       -2.24
’.      -2.22
’       -2.21
’,      -2.20
.’      -2.19
’)      -2.19
’).     -2.14
Positive Logits
Token   Logit
"       1.82
'       1.72
。"      1.67
'       1.58
"       1.56
'"      1.45
("      1.39
,"      1.38
..."    1.34
:"      1.31
Activation Density: 1.433%