Explanations
No Explanations Found
Head Attr Weights
0:0.07
1:0.06
2:0.09
3:0.09
4:0.08
5:0.08
6:0.09
7:0.09
8:0.08
9:0.07
10:0.08
11:0.07
Negative Logits
Token       Logit
Slaughter   -1.77
Bastard     -1.75
Magn        -1.68
Mut         -1.66
Script      -1.61
Targ        -1.57
Dra         -1.50
Enh         -1.49
Mutant      -1.48
Hun         -1.48
Positive Logits
Token             Logit
certify           1.90
reassure          1.71
congratulated     1.67
cryptocurrencies  1.66
thereum           1.66
rists             1.64
realDonaldTrump   1.61
aukee             1.59
merce             1.58
earchers          1.57
Activations Density 0.000%
This feature has no known activations.