INDEX
Explanations
words related to the removal and implications of lead in various contexts
Negative Logits
bags      -0.79
xual      -0.79
ership    -0.76
boxes     -0.72
ery       -0.68
ers       -0.65
STEM      -0.64
pot       -0.63
в         -0.63
wit       -0.62
Positive Logits
aining    0.93
ainer     0.92
ittance   0.91
bered     0.86
icably    0.85
ained     0.85
raction   0.85
oving     0.84
illard    0.84
inant     0.81
Activations Density: 0.804%