INDEX
Explanations
No Explanations Found
Negative Logits
propensity    -0.68
props         -0.67
explan        -0.66
yrim          -0.65
awaru         -0.65
goodies       -0.64
theme         -0.62
ok            -0.62
pockets       -0.60
dism          -0.60
Positive Logits
ciplinary     0.92
ibel          0.89
peak          0.75
hare          0.73
ibe           0.72
lled          0.67
Test          0.66
sites         0.65
acter         0.65
quit          0.64
Activations Density 0.000%
No Known Activations
This feature has no known activations.