INDEX
Explanations
No Explanations Found
Negative Logits
pse        -0.83
occas      -0.67
numbered   -0.66
Shape      -0.66
concess    -0.64
interven   -0.63
exception  -0.62
Ambro      -0.61
akes       -0.60
igr        -0.60
Positive Logits
selage     0.71
Towns      0.71
ESC        0.69
Avenger    0.67
Braun      0.65
Spears     0.64
ULAR       0.64
centrif    0.63
riger      0.62
pez        0.62
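The two lists above read like a direct-logit view of the feature: for each vocabulary token, how much the feature's output direction raises (positive) or lowers (negative) that token's logit. Assuming they are produced that way, by multiplying the feature's decoder vector by the model's unembedding matrix W_U and keeping the k largest and k smallest entries, a minimal pure-Python sketch looks like this. The function names, the toy dimensions and the four made-up tokens are hypothetical; a real pipeline would use a tensor library and the model's actual weights.

```python
from typing import List, Sequence, Tuple

def feature_logits(decoder_row: Sequence[float],
                   unembed: Sequence[Sequence[float]]) -> List[float]:
    # Hypothetical helper (assumed method, not confirmed by the dashboard).
    # decoder_row: the feature's direction in the residual stream (length d_model).
    # unembed: W_U stored as d_model rows, each with one entry per vocabulary token.
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[i] * unembed[i][t] for i in range(len(decoder_row)))
        for t in range(vocab_size)
    ]

def top_logits(logits: Sequence[float], tokens: Sequence[str], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort token ids by logit, ascending; ties keep vocabulary order.
    order = sorted(range(len(logits)), key=lambda t: logits[t])
    negative = [(tokens[t], logits[t]) for t in order[:k]]  # most negative first
    tail = order[max(0, len(order) - k):]                   # k largest (empty if k == 0)
    positive = [(tokens[t], logits[t]) for t in reversed(tail)]  # most positive first
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 made-up tokens.
tokens = ["a", "b", "c", "d"]
decoder_row = [1.0, -0.5]
unembed = [[0.2, -0.4, 0.6, 0.0],
           [0.4, 0.2, -0.2, 0.0]]
neg, pos = top_logits(feature_logits(decoder_row, unembed), tokens, k=1)
print(neg, pos)
```

In the toy example the logits are roughly [0.0, -0.5, 0.7, 0.0], so "b" heads the negative list and "c" heads the positive list, mirroring how the dashboard pairs each token with its signed value.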
Activation Density: 0.000%
No Known Activations
This feature has no known activations.
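Activation density is commonly defined as the fraction of token positions in a text sample on which the feature's activation is above zero. Assuming that definition, a displayed value of 0.000% together with "No Known Activations" means the feature did not fire on any sampled token. A minimal sketch follows; the function names, the `threshold` parameter and the sample values are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    # Fraction of token positions where the feature's activation exceeds `threshold`.
    if len(activations) == 0:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

def format_density(density: float) -> str:
    # Render as a percentage with three decimals, matching the "0.000%" style above.
    return f"{density * 100:.3f}%"

print(format_density(activation_density([0.0, 0.0, 1.3, 0.0])))  # 25.000%
print(format_density(activation_density([0.0] * 1000)))          # 0.000%
```

Because of the three-decimal rounding, a feature that fired on one token in a million would also display as 0.000%; the "No Known Activations" line is what distinguishes a truly silent feature.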