INDEX
Explanations
names of people and their corresponding statements or actions
dialogue or conversational exchanges in the text
Negative Logits
reapp      -0.76
probing    -0.72
awaits     -0.68
displaced  -0.63
reasoned   -0.63
forgotten  -0.63
contag     -0.63
pont       -0.62
stranded   -0.61
repro      -0.61
Positive Logits
Absolutely  1.37
Yeah        1.27
Honestly    1.19
Probably    1.19
Well        1.17
Yeah        1.16
Yes         1.16
Honestly    1.14
Laughs      1.13
Absolutely  1.13
Activations Density: 0.147%