INDEX
Explanations
conversation patterns with questions and answers
specific responses or affirmations within a conversation
Negative Logits
blot            -0.71
ded             -0.64
undone          -0.63
Os              -0.62
targets         -0.62
uninterrupted   -0.60
visible         -0.59
foreseen        -0.59
Dems            -0.58
airborne        -0.56
Positive Logits
Answer          1.19
Answer          1.07
answer          0.97
maxwell         0.96
Nope            0.88
swer            0.85
ingham          0.82
aber            0.82
swers           0.81
answ            0.79
Activations Density 0.357%