INDEX
Explanations
No Explanations Found
Negative Logits
osure         -0.81
peripher      -0.79
icut          -0.72
kin           -0.68
stabilized    -0.68
entrants      -0.66
horizont      -0.65
accompan      -0.63
pressed       -0.62
lig           -0.62
Positive Logits
BN            0.73
ibus          0.69
BS            0.67
ucc           0.66
Bans          0.64
rid           0.64
Mill          0.62
NV            0.61
cas           0.61
bern          0.61
Activations Density 0.000%
No Known Activations
This feature has no known activations.