INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.08
1:0.06
2:0.09
3:0.07
4:0.08
5:0.07
6:0.07
7:0.08
8:0.08
9:0.08
10:0.08
11:0.09
Negative Logits
ourgeois   -2.00
xual       -1.87
pless      -1.78
structure  -1.75
model      -1.65
akery      -1.65
feats      -1.64
arte       -1.64
models     -1.63
successor  -1.60
Positive Logits
Ver      1.86
Rory     1.84
Keith    1.74
essler   1.74
Zach     1.72
Reilly   1.72
Morgan   1.69
Emerson  1.65
stim     1.65
Trent    1.63
Activations Density 0.000%
No Known Activations
This feature has no known activations.