Explanations
No Explanations Found
Negative Logits
aggro       -0.66
mercy       -0.65
BP          -0.65
surfaces    -0.63
physic      -0.62
closure     -0.61
sinners     -0.60
geometry    -0.60
whiff       -0.60
severity    -0.59
Positive Logits
retch       0.69
nant        0.68
ouk         0.68
oland       0.68
lyak        0.68
ovich       0.66
cong        0.64
monop       0.64
oÄŁ         0.64
Randolph    0.64
Activations Density: 0.000%
This feature has no known activations.