Explanations
No Explanations Found
Negative Logits
zac       -0.73
haar      -0.72
Fram      -0.70
oris      -0.68
illac     -0.68
haus      -0.67
flix      -0.67
hers      -0.65
redits    -0.65
Grey      -0.65
Positive Logits
constituent   0.66
isted         0.66
Eater         0.64
founded       0.64
clusive       0.64
ominated      0.64
cou           0.63
mble          0.63
ovych         0.61
sacked        0.61
Activations Density: 0.000%
No Known Activations
This feature has no known activations.