Explanations
references to individuals and their statements or actions
Negative Logits
say       -0.28
says      -0.26
said      -0.25
Says      -0.24
say       -0.23
SAY       -0.23
says      -0.21
said      -0.21
说        -0.20
saying    -0.19
Positive Logits
regret          0.19
regrets         0.19
supports        0.18
hopes           0.16
worries         0.15
hoped           0.15
doubts          0.15
/Instruction    0.15
"[              0.15
'gc             0.15
Activations Density 0.086%
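
The density figure above can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading only; it assumes density means "fraction of tokens with activation above zero", which this page does not state. The function names and the synthetic activations are hypothetical and are not data from this feature.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature
    is active (activation > threshold). The page does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def format_density(density):
    """Format a fraction as a percentage with three decimal places."""
    return f"{density:.3%}"


# Synthetic example (not real data): 100,000 token positions, 86 of them active.
acts = [0.0] * 100_000
for i in range(86):
    acts[i * 1000] = 2.5  # arbitrary positive activation value

density = activation_density(acts)
print(format_density(density))
```

With 86 active positions out of 100,000, the formatted value matches the 0.086% shown on the page; the counts themselves are chosen for illustration only.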