INDEX
Explanations
No Explanations Found
Negative Logits
gate          -0.73
nces          -0.70
nels          -0.68
grab          -0.67
icz           -0.67
yip           -0.65
buf           -0.65
nz            -0.65
thumbnails    -0.64
haven         -0.64
Positive Logits
seamless      0.67
onest         0.65
enance        0.65
word          0.65
intellig      0.65
女            0.63
unint         0.62
avering       0.62
Leg           0.61
Mayer         0.60
Activations Density: 0.000%
This feature has no known activations.