INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.10
1:0.07
2:0.09
3:0.08
4:0.07
5:0.09
6:0.08
7:0.07
8:0.08
9:0.07
10:0.07
11:0.09
Negative Logits
exhib      -1.96
dress      -1.67
esp        -1.66
ukong      -1.65
civ        -1.64
solicit    -1.63
Viol       -1.61
Plaint     -1.61
ladies     -1.58
liber      -1.58
Positive Logits
WATCHED    2.31
ournal     2.08
rade       1.90
rian       1.70
grain      1.65
hammer     1.64
majority   1.57
ools       1.55
ère        1.53
solete     1.53
Activations Density 0.000%
No Known Activations
This feature has no known activations.