Explanations
No Explanations Found
Head Attr Weights
0:0.08
1:0.09
2:0.07
3:0.09
4:0.09
5:0.07
6:0.07
7:0.08
8:0.08
9:0.08
10:0.07
11:0.07
Negative Logits
acknow    -2.27
proble    -2.13
Baghd     -2.05
wu        -1.86
yi        -1.85
CODE      -1.83
agre      -1.81
SECTION   -1.80
conclud   -1.79
oğ        -1.79
Positive Logits
animate   1.80
ensity    1.69
living    1.64
hire      1.61
olls      1.61
hers      1.55
neutral   1.52
exper     1.52
elson     1.51
magnet    1.51
Activations Density: 0.000%
No Known Activations
This feature has no known activations.