Explanations
phrases indicating assistance or helpfulness
Negative Logits
Token           Logit
setPixel        -0.49
admitted        -0.48
hates           -0.47
interviewed     -0.46
orec            -0.45
Eternal         -0.45
rushed          -0.45
erythrocytes    -0.44
asked           -0.44
preguntar       -0.44
Positive Logits
Token           Logit
helps           0.86
kaarangay       0.81
Helps           0.73
helps           0.72
nhằm            0.69
ensures         0.69
Helps           0.68
affords         0.67
ivelany         0.67
باعث            0.67
Activations Density: 0.624%