INDEX
Explanations
elements of conversational interviews
Head Attr Weights
0:0.10
1:0.02
2:0.04
3:0.14
4:0.02
5:0.06
6:0.01
7:0.08
8:0.03
9:0.01
10:0.40
11:0.03
Negative Logits
dict          -2.22
gone          -2.15
Failure       -2.10
failure       -2.08
prevail       -2.07
ignores       -2.07
ineffective   -2.04
inflic        -2.03
complying     -2.00
Wr            -1.97
Positive Logits
fascinating   2.36
detailing     2.13
firsthand     2.11
Detail        2.11
homebrew      2.05
Discuss       1.98
extraord      1.97
raq           1.93
interesting   1.89
intriguing    1.89
Activations Density 0.048%