Explanations
No Explanations Found
Negative Logits
Token       Logit
ç»ıæŁ¥      -0.28
Took        -0.27
ets         -0.27
zos         -0.26
æijĨ         -0.26
eth         -0.26
others      -0.25
пÑĢедназ    -0.25
âĤĵ         -0.25
Others      -0.24
Positive Logits
Token       Logit
å¿Ļ         0.30
her         0.27
æĬ½åĩº      0.26
races       0.26
integr      0.25
per         0.25
ogi         0.24
uman        0.24
BaseModel   0.24
isman       0.23
Activations Density: 0.006%
No Known Activations
This feature has no known activations.