Explanations
No Explanations Found
Head Attribution Weights
Head 0: 0.09
Head 1: 0.06
Head 2: 0.09
Head 3: 0.08
Head 4: 0.08
Head 5: 0.09
Head 6: 0.08
Head 7: 0.08
Head 8: 0.08
Head 9: 0.06
Head 10: 0.07
Head 11: 0.07
Negative Logits
Token        Logit
Races        -1.77
iversary     -1.76
issions      -1.76
oran         -1.62
ori          -1.57
thood        -1.56
justice      -1.55
Vegan        -1.51
ean          -1.51
grad         -1.49
Positive Logits
Token        Logit
etheless     1.97
fundament    1.79
Hig          1.66
porous       1.64
nonexistent  1.61
orgetown     1.61
nil          1.60
Uk           1.59
�            1.58
blat         1.57
Activation Density: 0.000%
This feature has no known activations.