INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.08
1:0.08
2:0.08
3:0.07
4:0.08
5:0.08
6:0.08
7:0.08
8:0.08
9:0.06
10:0.09
11:0.08
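The page does not say how the head attribution weights above are computed. The sketch below rests on several assumptions. It treats the feature as coming from an attention-output SAE whose decoder row concatenates one equal-sized slice per head (12 heads here). It takes each weight to be that slice's L2 norm as a share of the summed norms. `head_attr_weights` is a hypothetical helper, and the page may use a different norm.

```python
import math

def head_attr_weights(decoder_row, n_heads):
    """Hypothetical helper. Split a decoder row into n_heads equal slices and
    return each slice's L2 norm as a share of the total, so shares sum to 1.
    Assumption: the row is the per-head outputs concatenated head by head."""
    if len(decoder_row) % n_heads != 0:
        raise ValueError("decoder_row length must be divisible by n_heads")
    d_head = len(decoder_row) // n_heads
    norms = [
        math.sqrt(sum(x * x for x in decoder_row[h * d_head:(h + 1) * d_head]))
        for h in range(n_heads)
    ]
    total = sum(norms)
    if total == 0:
        raise ValueError("decoder_row is all zeros")
    return [n / total for n in norms]
```

A perfectly uniform decoder row gives 1/12 ≈ 0.083 per head, which displays as 0.08. That is close to the nearly flat profile listed above.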
Negative Logits
Tit       -1.61
Irwin     -1.48
Tut       -1.47
Amos      -1.44
Romanian  -1.44
UGH       -1.43
ammy      -1.42
uggle     -1.40
vous      -1.38
igne      -1.38
Positive Logits
Spoiler   1.56
mand      1.55
BIL       1.54
Enough    1.54
Vill      1.52
BU        1.48
grow      1.47
ailable   1.47
Plot      1.46
monarchy  1.45
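The page does not define the logit values either. This sketch assumes one common convention. Every vocabulary token is scored by the dot product of the feature's output direction in the residual stream with that token's column of the unembedding matrix. The most negative and most positive scores are then listed. `logit_effects`, `direction`, and `unembed` are hypothetical names.

```python
import heapq

def logit_effects(direction, unembed, vocab, k=10):
    """Hypothetical helper. Score each token t as
    sum_i direction[i] * unembed[i][t] and return
    (top_positive, top_negative) as lists of (token, score) pairs.

    Assumption: `direction` has d_model entries and `unembed` is a
    d_model x vocab_size matrix given as a list of rows."""
    scores = [
        (token, sum(d * row[t] for d, row in zip(direction, unembed)))
        for t, token in enumerate(vocab)
    ]
    # Most positive first, and most negative first, respectively.
    top_positive = heapq.nlargest(k, scores, key=lambda pair: pair[1])
    top_negative = heapq.nsmallest(k, scores, key=lambda pair: pair[1])
    return top_positive, top_negative
```

Both lists on this page have ten entries, which corresponds to `k=10` in this sketch.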
Activations Density 0.000%
No Known Activations
This feature has no known activations.
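Activation density is the share of tokens on which the feature fires. This sketch makes two assumptions. It treats "fires" as an activation strictly above zero, and it has the page round the percentage to three decimals. Both helper names are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of tokens whose activation exceeds
    `threshold` (assumed here to be 0)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

def format_density(percent):
    """Render a percentage with three decimals, e.g. 0.000%."""
    return f"{percent:.3f}%"
```

Because of that rounding, a feature active on one token in a million (0.0001%) would also display as 0.000%. Here the page separately reports that there are no known activations.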