Explanations
No Explanations Found
Negative Logits
amaz        -0.72
dilig       -0.69
Magikarp    -0.67
NCT         -0.67
tast        -0.67
adolesc     -0.66
sidel       -0.65
dro         -0.64
tremend     -0.62
uncomp      -0.62

Positive Logits
het         0.93
ower        0.80
anism       0.77
lyn         0.77
athe        0.75
kee         0.74
pell        0.71
berra       0.69
nee         0.68
adian       0.67
Activations Density: 0.000%
No Known Activations
This feature has no known activations.