Explanations
references to specific individuals or names
Head Attr Weights
0:0.10
1:0.09
2:0.04
3:0.03
4:0.03
5:0.27
6:0.04
7:0.01
8:0.04
9:0.14
10:0.12
11:0.04
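A minimal sketch for reading the head attribution weights listed above: it ranks the heads by weight and sums the listed values. The dictionary is copied from the list; the variable names are illustrative and not part of the dashboard, and the sum is only a sanity check on the rounded values shown here.

```python
# Head attribution weights copied from the list above (head index -> weight).
head_weights = {
    0: 0.10, 1: 0.09, 2: 0.04, 3: 0.03, 4: 0.03, 5: 0.27,
    6: 0.04, 7: 0.01, 8: 0.04, 9: 0.14, 10: 0.12, 11: 0.04,
}

# Rank heads from largest to smallest weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)
top_head, top_weight = ranked[0]

# Sum of the rounded weights as displayed (not assumed to equal 1).
total = sum(head_weights.values())

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"top three heads: {[h for h, _ in ranked[:3]]}")
print(f"sum of listed weights: {total:.2f}")
```

On these values, head 5 carries the largest share, followed by heads 9 and 10.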
Negative Logits
nance     -1.56
notation  -1.53
DAQ       -1.41
aic       -1.40
��        -1.35
oms       -1.34
ournal    -1.31
cn        -1.31
jl        -1.27
�         -1.23
Positive Logits
enegger      1.68
's           1.61
himself      1.54
reluctantly  1.47
angrily      1.45
shaved       1.44
enrolled     1.43
enjoys       1.43
verbally     1.42
divorced     1.41
Activations Density 0.174%