INDEX
Explanations
instances of decisive actions or choices
Head Attr Weights
0:0.08
1:0.06
2:0.08
3:0.08
4:0.07
5:0.08
6:0.08
7:0.08
8:0.08
9:0.08
10:0.10
11:0.09
Negative Logits
ima            -1.67
bh             -1.64
atan           -1.62
omorphic       -1.59
amphetamine    -1.58
rud            -1.55
irtual         -1.55
acons          -1.51
dest           -1.51
trak           -1.51
Positive Logits
beforehand     1.69
%]             1.62
hire           1.62
whiff          1.54
Jury           1.44
Trial          1.44
Prosecut       1.43
Spielberg      1.42
subpoena       1.41
successfully   1.39
Activations Density 0.000%