INDEX
Explanations
No Explanations Found
Negative Logits
ahn       -0.15
901       -0.14
857       -0.14
ibus      -0.14
litter    -0.14
ohan      -0.14
ochen     -0.13
517       -0.13
lik       -0.13
<<<       -0.13
Positive Logits
hek       0.15
atsu      0.15
ifacts    0.15
iset      0.15
eced      0.14
strup     0.14
atab      0.14
Nah       0.14
avel      0.14
icens     0.14
Activations Density 0.000%
No Known Activations
This feature has no known activations.