Explanations
No Explanations Found
Head Attribution Weights
0:0.09
1:0.07
2:0.08
3:0.09
4:0.08
5:0.07
6:0.07
7:0.07
8:0.07
9:0.08
10:0.08
11:0.09
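The page does not say how these per-head weights are computed. One common approach, assumed here, applies to an SAE trained on the concatenated outputs of all attention heads (12 heads at this layer, indices 0 to 11). It splits the feature's decoder vector into one block per head and reports each block's share of the total. The helper `head_attr_weights` and the toy data below are hypothetical. The twelve values shown above sum to 0.94 rather than 1, so the dashboard's exact normalization evidently differs from this sketch. Treat the code as illustrative only.

```python
import math

def head_attr_weights(decoder_vec, n_heads):
    """Split a feature's decoder vector into n_heads equal blocks (one per
    attention head, assuming the SAE reads the concatenated head outputs)
    and return each block's L2 norm as a share of the summed block norms."""
    if len(decoder_vec) % n_heads != 0:
        raise ValueError("decoder length must be divisible by n_heads")
    d_head = len(decoder_vec) // n_heads
    norms = [
        math.sqrt(sum(x * x for x in decoder_vec[h * d_head:(h + 1) * d_head]))
        for h in range(n_heads)
    ]
    total = sum(norms)
    return [n / total for n in norms]

# Toy example: 3 heads, d_head = 2 (hypothetical values, not this feature's weights).
weights = head_attr_weights([3.0, 4.0, 0.0, 5.0, 0.0, 0.0], n_heads=3)
print([round(w, 2) for w in weights])  # [0.5, 0.5, 0.0]
```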
Negative Logits
aurus         -1.97
bis           -1.72
nas           -1.70
oids          -1.63
Square        -1.61
istg          -1.60
nikov         -1.59
gard          -1.56
�             -1.56
ban           -1.55
Positive Logits
restroom      1.64
behavi        1.61
volunte       1.58
calibr        1.58
ⓘ             1.51
conserv       1.50
volunteering  1.50
leans         1.49
upkeep        1.49
calibration   1.48
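The page does not state how the two token lists were produced. A common method, assumed here, starts from the feature's output direction. It projects that direction onto the unembedding matrix W_U to get its direct effect on every token's logit, then reports the k lowest (negative logits) and k highest (positive logits) tokens. The sketch assumes the decoder vector has already been mapped into the residual stream (for an attention SAE this would involve W_O) and ignores the final layer norm. The helper `top_logit_tokens` and the toy vocabulary are hypothetical.

```python
def top_logit_tokens(direction, unembed, vocab, k):
    """Return the k most negative and k most positive (token, logit) pairs
    for a residual-stream direction.

    direction: list of d_model floats (assumed: the feature's decoder
               vector already mapped into the residual stream)
    unembed:   d_model rows, each a list of len(vocab) floats (W_U)
    """
    # Direct logit effect: dot product of the direction with each W_U column.
    logits = [
        sum(direction[i] * unembed[i][t] for i in range(len(direction)))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest logits first
    positive = ranked[::-1][:k]    # highest logits first
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 hypothetical tokens.
vocab = ["a", "b", "c", "d"]
W_U = [
    [1.0, -2.0, 0.5, 0.0],
    [0.0, 1.0, -1.0, 3.0],
]
neg, pos = top_logit_tokens([1.0, 0.5], W_U, vocab, k=2)
print(neg)  # [('b', -1.5), ('c', 0.0)]
print(pos)  # [('d', 1.5), ('a', 1.0)]
```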
Activation Density: 0.000%
No Known Activations
This feature has no known activations.
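Activation density is usually reported as the fraction of sampled tokens on which the feature's activation is nonzero. A density of 0.000% together with no known activations means the feature never fired on the tokens sampled for this page. It may therefore be a "dead" feature. The minimal sketch below assumes a threshold of 0.0. The helper name `activation_density` and the sample activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

acts = [0.0, 0.0, 1.3, 0.0]  # hypothetical per-token activations
print(f"{activation_density(acts):.3%}")          # 25.000%
print(f"{activation_density([0.0] * 1000):.3%}")  # 0.000%
```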