INDEX
Explanations
No Explanations Found
Negative Logits
dysph          -0.77
Ire            -0.72
agascar        -0.72
susceptible    -0.65
merce          -0.65
stomach        -0.64
monarch        -0.64
conflic        -0.62
ierrez         -0.62
rhy            -0.60
Positive Logits
ano            0.85
apo            0.79
ushi           0.77
anan           0.76
esa            0.74
dan            0.71
女             0.69
IDA            0.66
aminer         0.66
aking          0.66
Activations Density 0.000%
No Known Activations
This feature has no known activations.