Explanations
variations of the word "accept" and its related forms
Negative Logits
alnız    -0.17
inst     -0.15
exact    -0.15
oley     -0.15
cape     -0.15
र        -0.15
dling    -0.14
dy       -0.14
fur      -0.14
оля      -0.14
Positive Logits
ance              0.34
ably              0.32
responsibility    0.26
ances             0.26
ANCE              0.25
ively             0.21
reject            0.20
able              0.20
eer               0.19
участие           0.19
Activations Density 0.046%