INDEX
Explanations
No Explanations Found
Negative Logits
sooner       -0.74
Esports      -0.74
steroids     -0.72
Fin          -0.69
Vale         -0.68
Owl          -0.67
Society      -0.67
}}}          -0.66
Athletics    -0.64
!/           -0.64
Positive Logits
birth        0.72
mosqu        0.70
gling        0.70
worker       0.69
dn           0.69
Muslim       0.69
typ          0.68
odd          0.67
emin         0.65
bryce        0.65
Activations Density 0.000%
No Known Activations
This feature has no known activations.