Explanations
No explanations found.

Negative Logits
Token      Logit
Redd       -0.83
Thousand   -0.76
Pu         -0.74
Tong       -0.69
pun        -0.68
Yi         -0.68
Trin       -0.67
Origin     -0.67
Kard       -0.66
Twins      -0.65

Positive Logits
Token      Logit
gemony      0.74
phrine      0.69
otomy       0.68
lement      0.68
rael        0.68
gian        0.65
azo         0.65
arte        0.65
arity       0.64
obby        0.63

Activation Density: 0.000%
This feature has no known activations.