Explanations
phrases related to personal stories and questions
prompts or questions that seek information or clarification
Negative Logits
Token        Logit
flowing      -0.83
sustained    -0.82
synerg       -0.81
piled        -0.79
downstream   -0.79
committed    -0.78
flowering    -0.77
funeral      -0.76
flavored     -0.76
overt        -0.75
Positive Logits
Token        Logit
Answer        2.23
Well          1.87
Yes           1.78
Absolutely    1.72
Probably      1.61
Wr            1.53
Certainly     1.53
Yeah          1.52
Actually      1.49
Sure          1.48
Activation Density: 0.202%