Explanations
references to selection processes or decision-making entities
Head Attr Weights
Head  Weight
0     0.03
1     0.02
2     0.07
3     0.06
4     0.26
5     0.05
6     0.03
7     0.18
8     0.02
9     0.04
10    0.10
11    0.09
Negative Logits
Token      Logit
ctrl       -1.55
cessive    -1.55
icably     -1.54
acerb      -1.54
artifacts  -1.53
chwitz     -1.52
agra       -1.52
vable      -1.51
ront       -1.50
amacare    -1.50
Positive Logits
Token        Logit
insiders     1.91
audience     1.84
onlook       1.81
producers    1.79
reviewers    1.79
recipients   1.70
audiences    1.70
interviewer  1.69
consortium   1.66
responders   1.66
Activations Density 0.001%