Explanations
No Explanations Found
Negative Logits
catentry      -0.78
Activate      -0.71
Learns        -0.69
atio          -0.68
SourceFile    -0.67
anium         -0.67
esm           -0.66
wcs           -0.65
rosso         -0.64
packs         -0.63
Positive Logits
cuff           0.69
ILE            0.64
artisan        0.63
dim            0.60
satirical      0.60
headed         0.58
entin          0.58
orce           0.57
y              0.57
ilitary        0.57
Activation Density: 0.000%
No Known Activations
This feature has no known activations.