INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.05
1:0.05
2:0.08
3:0.11
4:0.08
5:0.06
6:0.07
7:0.06
8:0.06
9:0.08
10:0.09
11:0.17
Negative Logits
decomp      -1.63
tut         -1.62
compiled    -1.60
fatig       -1.58
emb         -1.49
Deng        -1.48
mater       -1.47
therapists  -1.47
ographies   -1.45
antiqu      -1.45
Positive Logits
antha   2.17
osher   1.91
retty   1.81
zee     1.81
ーティ  1.79
nom     1.72
WARD    1.71
udeau   1.67
deal    1.65
sth     1.64
Activations Density 0.000%
No Known Activations
This feature has no known activations.