INDEX
Explanations
phrases and verbs indicating communication and interaction between people
Head Attr Weights
0:0.04
1:0.01
2:0.04
3:0.05
4:0.04
5:0.11
6:0.01
7:0.03
8:0.36
9:0.05
10:0.13
11:0.06
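
One way to read the weights above is to rank the heads by weight. The sketch below is a minimal illustration, assuming each `index:weight` pair gives an attention head's share of the attribution to this feature; the page does not state how the weights are normalized. The names `head_weights` and `rank_heads` are hypothetical and used only for this example.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: each value is that head's share of the attribution to this feature.
head_weights = {
    0: 0.04, 1: 0.01, 2: 0.04, 3: 0.05, 4: 0.04, 5: 0.11,
    6: 0.01, 7: 0.03, 8: 0.36, 9: 0.05, 10: 0.13, 11: 0.06,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


ranked = rank_heads(head_weights)
top_head, top_weight = ranked[0]
total = sum(head_weights.values())

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"top three heads: {[head for head, _ in ranked[:3]]}")
print(f"sum of listed weights: {total:.2f}")
```

Under that reading, head 8 carries the largest single weight, followed by heads 10 and 5. The listed values sum to 0.93 rather than 1, which may simply reflect rounding to two decimals.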
Negative Logits
bon       -1.74
lees      -1.73
®         -1.65
Surviv    -1.65
��        -1.63
kun       -1.62
Miko      -1.59
cham      -1.59
ilee      -1.55
ghai      -1.55
Positive Logits
replies       2.50
Answer        2.30
reply         2.28
complied      2.28
replied       2.25
answer        2.22
answered      2.08
affirmative   1.98
hesitated     1.95
obliged       1.94
Activations Density 0.143%