INDEX
Explanations
No Explanations Found
Negative Logits
Token          Logit
bol            -0.79
Masquerade     -0.72
Sacrifice      -0.68
Slate          -0.65
Bravo          -0.65
mination       -0.64
Bryce          -0.64
slate          -0.62
Americas       -0.62
Confederacy    -0.59
Positive Logits
Token          Logit
igham          1.01
artney         0.90
rehend         0.83
uden           0.83
urn            0.82
izzard         0.81
liam           0.81
eals           0.80
akura          0.77
sidx           0.77
Activations Density: 0.000%
No Known Activations
This feature has no known activations.