INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.08
1:0.06
2:0.08
3:0.08
4:0.09
5:0.07
6:0.07
7:0.09
8:0.07
9:0.09
10:0.07
11:0.10
Negative Logits
Bust        -1.72
Bron        -1.64
ZIP         -1.59
conver      -1.58
Bog         -1.56
Bagg        -1.56
twent       -1.55
Levin       -1.55
Feinstein   -1.55
Draper      -1.54
Positive Logits
ntil        1.79
orses       1.77
ModLoader   1.76
cous        1.71
mathemat    1.70
ernels      1.69
rentices    1.66
rices       1.64
gaard       1.59
isine       1.56
Activations Density 0.000%
This feature has no known activations.