INDEX
Explanations
assertive statements about knowledge or certainty
Head Attr Weights
0:0.08
1:0.06
2:0.09
3:0.08
4:0.08
5:0.08
6:0.08
7:0.08
8:0.08
9:0.08
10:0.07
11:0.08
Negative Logits
edged         -1.67
illustrating  -1.66
blot          -1.64
gesture       -1.61
Frie          -1.56
cautioned     -1.55
Frem          -1.53
futile        -1.50
sketch        -1.44
Hasan         -1.44
Positive Logits
obyl      1.99
osphere   1.82
Alone     1.75
emies     1.74
Sov       1.72
hesda     1.66
HQ        1.65
Destroy   1.58
ILA       1.56
Compared  1.52
Activations Density 0.000%