Explanations
No Explanations Found
Negative Logits
atoon      -0.78
ategory    -0.77
seless     -0.73
porary     -0.73
tein       -0.72
iction     -0.71
emale      -0.70
icted      -0.69
showers    -0.68
ricanes    -0.68
Positive Logits
ci         0.73
mother     0.69
father     0.68
neighb     0.65
sen        0.65
gemony     0.65
influ      0.63
Palest     0.63
Eth        0.63
Ariel      0.63
Activations Density 0.000%
No Known Activations