INDEX
Explanations
No Explanations Found
Negative Logits
Prescott    -0.85
Goff        -0.78
chip        -0.72
erity       -0.70
value       -0.65
Darwin      -0.64
Factor      -0.64
Dak         -0.63
fun         -0.63
kill        -0.63
Positive Logits
theless         0.85
agan            0.78
otos            0.77
withstanding    0.74
icans           0.73
united          0.68
etheless        0.68
swayed          0.67
enza            0.67
contradicted    0.67
Activations Density: 0.000%
No Known Activations
This feature has no known activations.