INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.09
1:0.06
2:0.08
3:0.08
4:0.08
5:0.07
6:0.07
7:0.07
8:0.10
9:0.08
10:0.08
11:0.08
Negative Logits
Token       Logit
��          -1.99
Numbers     -1.73
��          -1.72
】          -1.72
��          -1.64
oultry      -1.61
arers       -1.57
Cathy       -1.53
��          -1.52
aughters    -1.52
Positive Logits
Token       Logit
spoiler      1.93
center       1.62
pedia        1.60
fram         1.57
walker       1.57
sidx         1.56
wiki         1.56
oiler        1.55
pend         1.50
sonian       1.48
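The two lists above show the tokens whose logits this feature most suppresses and most promotes. Dashboards of this kind typically produce them by projecting the feature's decoder direction through the unembedding matrix and keeping the bottom and top k entries. The sketch below makes several assumptions. It treats the decoder row as already expressed in the residual stream; for an attention-output SAE it would first pass through W_O. It also ignores any final LayerNorm. The helper name `top_logit_effects` and the toy weights are hypothetical, not real model values.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs
    obtained by projecting a feature direction through the unembedding."""
    # Direct effect of the feature direction on every vocabulary logit.
    logits = w_dec_row @ w_u  # shape: (d_vocab,)
    order = np.argsort(logits)  # ascending order of logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with made-up sizes (d_model=4, d_vocab=6); not real model weights.
rng = np.random.default_rng(0)
w_dec_row = rng.normal(size=4)
w_u = rng.normal(size=(4, 6))
vocab = ["a", "b", "c", "d", "e", "f"]
neg, pos = top_logit_effects(w_dec_row, w_u, vocab, k=2)
print(neg)  # two most suppressed tokens, most negative first
print(pos)  # two most promoted tokens, most positive first
```

A sorted projection like this reports only the direct effect of the feature on the output logits. It says nothing about indirect effects through later layers.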
Activation Density: 0.000%
No Known Activations
This feature has no known activations.
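Activation density is commonly reported as the fraction of sampled token positions on which a feature fires, meaning its activation is above zero. A density of 0.000% is consistent with the page finding no known activations. The sketch below assumes a threshold of zero and uses a hypothetical `activation_density` helper on made-up activations.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Hypothetical activations; a feature that never fires has density 0.
print(f"{activation_density(np.zeros(8)):.3%}")       # 0.000%
print(f"{activation_density([0, 0.5, 0, 1.2]):.3%}")  # 50.000%
```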