Explanations
expressions of confidence and belief in abilities
Head Attr Weights
Head  Weight
0     0.01
1     0.01
2     0.08
3     0.08
4     0.08
5     0.02
6     0.05
7     0.43
8     0.04
9     0.04
10    0.06
11    0.05
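Head 7 clearly dominates here (0.43, about five times the next-largest weight). The page does not say how these weights are computed. One plausible reading, assumed here and not confirmed by the source, is that for an attention-output SAE the feature's decoder vector lives in the concatenated per-head `z` space, and each head's weight is the share of that vector's L2 norm falling in the head's slice. The function `head_attr_weights` and the norm-share definition are hypothetical; the `REPORTED` list just copies the values above.

```python
import math
from typing import List, Sequence


def head_attr_weights(w_dec: Sequence[float], n_heads: int) -> List[float]:
    """Share of a feature's decoder norm in each head's slice (hypothetical definition).

    Assumes w_dec lives in the concatenated per-head z space, i.e. it has
    n_heads * d_head entries, with head h occupying [h*d_head, (h+1)*d_head).
    """
    if n_heads <= 0 or len(w_dec) % n_heads != 0:
        raise ValueError("len(w_dec) must be a positive multiple of n_heads")
    d_head = len(w_dec) // n_heads
    # L2 norm of each head's slice of the decoder vector
    norms = [
        math.sqrt(sum(x * x for x in w_dec[h * d_head:(h + 1) * d_head]))
        for h in range(n_heads)
    ]
    total = sum(norms)
    if total == 0:
        return [0.0] * n_heads
    # Normalise so the per-head weights sum to 1
    return [n / total for n in norms]


# The weights reported above, indexed by head number
REPORTED = [0.01, 0.01, 0.08, 0.08, 0.08, 0.02, 0.05, 0.43, 0.04, 0.04, 0.06, 0.05]
dominant_head = max(range(len(REPORTED)), key=lambda h: REPORTED[h])

if __name__ == "__main__":
    print(head_attr_weights([3.0, 4.0, 0.0, 0.0], n_heads=2))  # [1.0, 0.0]
    print(dominant_head, REPORTED[dominant_head])  # 7 0.43
```

Under a squared-norm definition the ranking of heads would be the same but the shares would differ, so the exact numbers above should not be read back into this formula without checking the tool's own definition.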
Negative Logits
Token         Logit
jee           -1.62
�             -1.45
="#           -1.34
etz           -1.33
videos        -1.33
misdem        -1.32
Quartz        -1.30
flies         -1.30
netflix       -1.30
interesting   -1.30
Positive Logits
Token         Logit
securing       1.77
assurance      1.75
rity           1.71
predicting     1.64
ability        1.63
knowing        1.59
secure         1.58
correctness    1.57
confidently    1.57
independence   1.56
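Lists like the two above are typically a direct "logit lens" reading: the feature's output direction is projected through the model's unembedding, and the tokens with the most positive and most negative scores are shown. The sketch below assumes the direction has already been mapped into the residual stream (for an attention-output SAE that would include the heads' output projection) and ignores the final LayerNorm. The function names and the toy matrix are made up for illustration; none of the numbers come from this feature.

```python
from typing import Dict, List, Sequence, Tuple


def feature_logits(
    direction: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Project a residual-stream direction through the unembedding.

    w_u is d_model x d_vocab (rows = residual dimensions, columns = tokens).
    Final LayerNorm is ignored in this sketch.
    """
    if len(w_u) != len(direction):
        raise ValueError("w_u must have one row per residual dimension")
    return {
        tok: sum(d * row[j] for d, row in zip(direction, w_u))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(
    logits: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, logit) pairs."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1])
    return ranked[::-1][:k], ranked[:k]


if __name__ == "__main__":
    # Toy 2-dim residual stream and 3-token vocabulary (made-up numbers)
    direction = [1.0, -1.0]
    w_u = [[2.0, 0.0, 1.0],
           [0.0, 2.0, 1.0]]
    vocab = ["secure", "videos", "the"]
    pos, neg = top_and_bottom(feature_logits(direction, w_u, vocab), k=1)
    print(pos, neg)  # [('secure', 2.0)] [('videos', -2.0)]
```

These scores describe what the feature pushes the model to predict next, not the tokens on which the feature itself activates.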
Activation Density: 0.013%
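A density of 0.013% is very sparse: roughly 130 active tokens per million, or about one token in 7,700. The sketch below assumes density means the fraction of tokens on which the feature's activation is strictly positive. The page does not state the exact threshold, so the `threshold` parameter and both helper names are hypothetical.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > threshold) / len(acts)


def expected_firings(density_percent: float, n_tokens: int) -> float:
    """Expected number of active tokens in a sample of n_tokens."""
    return density_percent / 100 * n_tokens


if __name__ == "__main__":
    print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 0.25
    print(round(expected_firings(0.013, 1_000_000)))  # 130
```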