Explanations
Expressions of trust and reliance on personal judgment
Head Attr Weights
Head  Weight
0     0.04
1     0.02
2     0.06
3     0.08
4     0.15
5     0.04
6     0.04
7     0.24
8     0.05
9     0.04
10    0.11
11    0.07
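The head attribution weights listed above can be summarized by ranking the heads and measuring how concentrated the weight is. This is a minimal sketch that assumes the values are per-head attribution weights exactly as displayed. They sum to 0.94 rather than 1, possibly because of rounding, which is an assumption. The helper names `top_heads` and `share_of_total` are hypothetical and not part of any dashboard API.

```python
# Head attribution weights as displayed (head index -> weight).
# Assumption: these are the rounded values shown on the page, not raw data.
HEAD_ATTR_WEIGHTS = {
    0: 0.04, 1: 0.02, 2: 0.06, 3: 0.08, 4: 0.15, 5: 0.04,
    6: 0.04, 7: 0.24, 8: 0.05, 9: 0.04, 10: 0.11, 11: 0.07,
}


def top_heads(weights, k=3):
    """Hypothetical helper: the k (head, weight) pairs with the largest weights, largest first."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)[:k]


def share_of_total(weights, heads):
    """Hypothetical helper: fraction of the summed displayed weight carried by the given heads."""
    total = sum(weights.values())
    return sum(weights[h] for h in heads) / total


if __name__ == "__main__":
    top = top_heads(HEAD_ATTR_WEIGHTS)
    print(top)  # [(7, 0.24), (4, 0.15), (10, 0.11)]
    share = share_of_total(HEAD_ATTR_WEIGHTS, [h for h, _ in top])
    print(f"{share:.2f}")  # share of the displayed total held by the top three heads
```

With these values, heads 7, 4, and 10 carry 0.50 of the displayed 0.94 total, or roughly 53%. Attribution is therefore spread across several heads rather than coming from any single one.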
Negative Logits
Token        Logit
uble         -1.92
deficit      -1.49
naire        -1.44
pools        -1.43
pits         -1.42
deficits     -1.37
complexes    -1.33
spectator    -1.33
etter        -1.30
reckoning    -1.29
Positive Logits
Token        Logit
Instruct     1.51
Corsair      1.49
Attribution  1.43
sounded      1.43
preach       1.34
elight       1.33
wt           1.31
hearty       1.29
louder       1.27
nos          1.27
Activation Density: 0.000%