INDEX
Explanations
references to reading or content recommendations
Head Attr Weights
0:0.02
1:0.03
2:0.04
3:0.09
4:0.08
5:0.04
6:0.20
7:0.04
8:0.03
9:0.06
10:0.08
11:0.25
Negative Logits
grains       -1.91
scrolls      -1.84
Journals     -1.80
oats         -1.79
corros       -1.78
mathemat     -1.62
biotech      -1.59
Condition    -1.56
looting      -1.53
cereal       -1.53
Positive Logits
Blanc        1.62
Deer         1.62
vae          1.57
Pole         1.57
Luna         1.56
Osaka        1.54
Torres       1.54
ğ            1.52
Yoshi        1.50
ois          1.50
Activations Density 0.002%