Explanations
interactions involving responses and questions in a dialogue or discussion context
Negative Logits
StructEnd    -0.68
'])->        -0.61
tfsi         -0.59
myſelf       -0.55
aarrggbb     -0.51
tayo         -0.51
neceffary    -0.50
poffible     -0.50
Conſ         -0.50
Efq          -0.49
Positive Logits
replied      0.84
reply        0.72
responded    0.67
respondeu    0.66
replies      0.66
response     0.66
setEmail     0.63
réponses     0.62
réponse      0.61
répondu      0.61
Activations Density 0.234%