INDEX
Explanations
No Explanations Found
Negative Logits
inge         -0.81
MLA          -0.68
burner       -0.67
lie          -0.62
optim        -0.61
achie        -0.61
itto         -0.60
trembling    -0.59
boil         -0.59
ettle        -0.59
Positive Logits
Ruk          0.61
ouf          0.59
Louis        0.58
isphere      0.58
Marian       0.57
igated       0.57
achusetts    0.57
schild       0.57
cca          0.56
########     0.56
Activations Density 0.000%
No Known Activations
This feature has no known activations.