Explanations
No Explanations Found
Head Attr Weights
0:0.07
1:0.07
2:0.08
3:0.09
4:0.08
5:0.07
6:0.09
7:0.08
8:0.08
9:0.08
10:0.09
11:0.08
Negative Logits
Cosponsors    -2.13
Americ        -2.00
nces          -1.99
looph         -1.85
ignor         -1.82
disg          -1.75
Poc           -1.74
nown          -1.72
Ajax          -1.69
hypoc         -1.68
Positive Logits
robe          1.67
Workshop      1.67
Kate          1.65
lower         1.64
Dust          1.63
shine         1.61
Rose          1.58
Dove          1.57
UV            1.57
Laura         1.56
Activations Density 0.000%
This feature has no known activations.