INDEX
Explanations
No Explanations Found
Negative Logits
nds  -0.17
ubo  -0.16
ieu  -0.15
šku  -0.15
ofi  -0.15
----------------------------------------------------------------------------↵  -0.15
ght  -0.14
inee  -0.14
prow  -0.14
Carlson  -0.14
Positive Logits
uli  0.15
Moder  0.15
hi  0.15
fos  0.15
ev  0.14
decom  0.14
occasion  0.14
&&&&  0.14
ura  0.14
hi  0.14
Activation Density: 0.000%
No Known Activations
This feature has no known activations.