INDEX
Explanations
No Explanations Found
Negative Logits
osuke       -0.86
reciation   -0.83
ogly        -0.77
crew        -0.75
idon        -0.73
olding      -0.73
od          -0.71
bryce       -0.71
ome         -0.69
build       -0.66
Positive Logits
decriminal      0.72
polio           0.67
rall            0.66
proced          0.66
redistributed   0.63
waters          0.63
mete            0.62
metast          0.62
uria            0.62
persecut        0.61
Activations Density: 0.000%
No Known Activations
This feature has no known activations.