INDEX
Explanations
No Explanations Found
Negative Logits
paren      -0.82
Height     -0.82
eln        -0.79
raq        -0.78
fur        -0.77
Jews       -0.74
uf         -0.73
smoking    -0.71
anyahu     -0.69
byn        -0.69
Positive Logits
COVER      0.68
SWAT       0.64
Violet     0.63
Cutter     0.62
Spartans   0.61
Simone     0.60
ACTION     0.60
artz       0.59
PERSON     0.59
Lansing    0.59
Activations Density 0.000%
This feature has no known activations.