Explanations
No Explanations Found
Negative Logits
ãĥ´                 -0.80
âĸ¬âĸ¬              -0.76
Blasio              -0.73
ICA                 -0.70
Zip                 -0.69
eden                -0.68
isSpecialOrderable  -0.68
ROR                 -0.68
racial              -0.68
ãĥį                 -0.68
Positive Logits
dishon     0.81
unsu       0.69
guilty     0.67
onite      0.65
tery       0.65
culp       0.65
276        0.64
attery     0.63
worthless  0.62
rued       0.60
Activations Density 0.000%
No Known Activations
This feature has no known activations.