INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.08
1:0.07
2:0.07
3:0.09
4:0.08
5:0.08
6:0.07
7:0.08
8:0.09
9:0.08
10:0.08
11:0.08
Negative Logits
ω                 -2.02
osate             -1.97
icken             -1.96
��                -1.85
chwitz            -1.83
////////////////  -1.78
gger              -1.78
trak              -1.78
bye               -1.76
raught            -1.75
Positive Logits
persist           1.82
spor              1.73
stagn             1.68
poll              1.68
Lum               1.68
showc             1.67
slack             1.63
langu             1.61
misc              1.60
prolific          1.59
Activations Density: 0.000%
No Known Activations
This feature has no known activations.