Explanations
No Explanations Found
Negative Logits
regex           -0.71
CPC             -0.71
incent          -0.69
EGIN            -0.69
ethical         -0.67
shoulders       -0.66
uliffe          -0.65
proverb         -0.65
prohibitions    -0.65
constitu        -0.65
Positive Logits
waukee          0.83
Ke              0.80
brew            0.77
cakes           0.77
her             0.75
quart           0.75
jen             0.74
ju              0.73
fig             0.72
STAR            0.72
Activations Density 0.000%
No Known Activations
This feature has no known activations.