INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.08
1:0.04
2:0.08
3:0.08
4:0.09
5:0.08
6:0.08
7:0.06
8:0.09
9:0.08
10:0.09
11:0.08
Negative Logits
referen:-1.67
secession:-1.64
separat:-1.62
treasurer:-1.61
conspiracy:-1.59
jihad:-1.56
libertarian:-1.55
trademark:-1.51
intimidation:-1.51
refusal:-1.51
Positive Logits
bil:2.21
@@:2.07
ricia:2.01
rients:1.83
zac:1.77
Downs:1.70
ubs:1.69
eval:1.69
eger:1.67
ials:1.57
Activations Density 0.000%
No Known Activations