Explanations
words related to responses or replies in the context of conversations
instances of dialogue or spoken responses
Negative Logits
ctors       -0.77
olin        -0.73
cipled      -0.71
bons        -0.71
icipated    -0.69
dar         -0.68
wed         -0.68
prus        -0.67
gone        -0.65
ciples      -0.64
Positive Logits
angrily     0.88
sarcast     0.88
thereto     0.83
favorably   0.82
affirm      0.80
reply       0.77
indign      0.73
harshly     0.72
whine       0.72
replies     0.71
Activations Density: 0.028%