Explanations
No Explanations Found
Negative Logits
Token         Logit
gow           -0.73
icate         -0.66
Pru           -0.63
galitarian    -0.62
ature         -0.62
hyde          -0.62
ivo           -0.61
Staples       -0.61
OUT           -0.61
igel          -0.60

Positive Logits
Token         Logit
paio           0.75
unta           0.68
surv           0.63
urance         0.62
neg            0.61
exercised      0.60
civilian       0.60
ynes           0.60
reditary       0.59
romy           0.58
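
The sketch below shows how ranked positive and negative logit lists like the ones above can be produced. It is an illustration, not this page's documented method, and it rests on an assumption the page does not state: each token's value is taken to be the dot product of the feature's decoder direction with that token's unembedding vector. The helper names `logit_effects` and `top_bottom` and the 2-d toy vectors are hypothetical. Only the token strings are borrowed from the lists above.

```python
from typing import Dict, List, Sequence, Tuple

# ASSUMPTION: a token's logit effect is the dot product of the feature's
# decoder direction with that token's unembedding vector (hypothetical setup).
def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_bottom(effects: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by logit
    negative = ranked[:k]          # k most negative, most negative first
    positive = ranked[::-1][:k]    # k most positive, most positive first
    return negative, positive

# Toy 2-d vectors: token names come from the lists above, numbers are made up.
decoder_dir = [1.0, -0.5]
unembed = {
    "paio": [0.5, -0.5],
    "gow": [-0.5, 0.5],
    "neg": [0.2, 0.0],
    "Pru": [-0.3, 0.1],
}
negative, positive = top_bottom(logit_effects(decoder_dir, unembed), k=2)
print("Negative:", [tok for tok, _ in negative])  # ['gow', 'Pru']
print("Positive:", [tok for tok, _ in positive])  # ['paio', 'neg']
```

A single ascending sort serves both lists. The negative list is its head and the positive list is the head of its reversal, so both come out ordered from the largest magnitude down, as on the page.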

Activation Density: 0.000%
No Known Activations
This feature has no known activations.