INDEX
Explanations
referential phrases that emphasize various emphatic expressions or opinions
Head Attr Weights
0:0.01
1:0.01
2:0.05
3:0.17
4:0.06
5:0.01
6:0.40
7:0.05
8:0.03
9:0.03
10:0.04
11:0.08
Negative Logits
Published   -1.48
ipel        -1.39
Chel        -1.35
850         -1.34
ashington   -1.32
second      -1.25
NX          -1.23
gam         -1.20
875         -1.20
shown       -1.20
Positive Logits
uddin       1.52
ody         1.38
WER         1.33
ierrez      1.32
ochet       1.29
inent       1.25
udi         1.22
UGH         1.21
tery        1.20
inence      1.19
Activations Density 0.005%