INDEX
Explanations
No Explanations Found
Negative Logits
Token         Logit
definition    -0.65
gans          -0.65
silenced      -0.64
whole         -0.62
parole        -0.61
âĦ¢:          -0.61
capt          -0.60
hunger        -0.59
censored      -0.59
penetrating   -0.59
Positive Logits
Token         Logit
estate        0.71
imb           0.70
agnar         0.69
Proceed       0.68
orm           0.68
Mehran        0.68
ords          0.65
ormonal       0.65
imer          0.63
trop          0.63
Activations Density 0.000%
No Known Activations
This feature has no known activations.