INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.07
1:0.07
2:0.08
3:0.08
4:0.08
5:0.08
6:0.08
7:0.07
8:0.09
9:0.07
10:0.09
11:0.09
Negative Logits
interaction:-1.72
enez:-1.71
chars:-1.61
Levy:-1.61
Philly:-1.60
��:-1.56
exceptions:-1.56
Chomsky:-1.55
Giuliani:-1.54
ゴン:-1.54
Positive Logits
Guide:1.82
conservative:1.79
luster:1.75
YP:1.70
Untitled:1.67
Vert:1.63
conserv:1.62
resents:1.60
Ranked:1.59
Rot:1.57
Activations Density: 0.000%
This feature has no known activations.