Explanations
No Explanations Found
Negative Logits
Token       Logit
ens         -0.78
eur         -0.72
bats        -0.64
crow        -0.64
ensued      -0.61
seaw        -0.61
Shea        -0.60
Pelicans    -0.59
impunity    -0.58
Haram       -0.57
Positive Logits
Token       Logit
ebted       0.76
ipop        0.75
ipolar      0.72
ileged      0.72
STEP        0.68
ividually   0.67
bare        0.64
icrobial    0.63
arma        0.62
leaf        0.61
Activations Density: 0.000%
This feature has no known activations.