Explanations
the word "Does" in varying contexts
Negative Logits
t      -0.69
pr     -0.68
m      -0.68
tư     -0.64
ur     -0.62
r      -0.61
sub    -0.60
iv     -0.57
ing    -0.56
tw     -0.56
Positive Logits
Does             1.71
Does             1.67
does             1.63
DOES             1.57
does             1.56
DOES             1.49
itſelf           1.20
dosen            1.15
verwijspagina    1.10
doesnt           1.10
Activations Density: 0.111%