INDEX
Explanations
responses or answers in a conversation
instances of dialogue or responses within the text
Negative Logits
icipated            -0.69
dar                 -0.67
cipled              -0.67
ctors               -0.67
bons                -0.66
ammy                -0.64
cgi                 -0.62
hoff                -0.62
teenth              -0.62
Blazers             -0.62
Positive Logits
angrily              1.01
favorably            0.93
sarcast              0.90
enthusiastically     0.86
affirm               0.82
politely             0.80
promptly             0.77
thereto              0.77
harshly              0.77
indign               0.77
Activations Density 0.039%