INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.06
1:0.08
2:0.09
3:0.08
4:0.08
5:0.09
6:0.09
7:0.06
8:0.08
9:0.07
10:0.07
11:0.08
Negative Logits
tongue: -1.71
mistaken: -1.63
questioning: -1.58
resent: -1.57
reconc: -1.56
tongues: -1.52
derogatory: -1.52
remark: -1.51
theless: -1.51
deterior: -1.50
Positive Logits
xa: 1.84
ADA: 1.80
EF: 1.67
adata: 1.60
ット: 1.60
364: 1.58
ATA: 1.58
RR: 1.57
RA: 1.53
Agenda: 1.53
Activations Density 0.000%
This feature has no known activations.