Explanations
variations of the word "drop."
Negative Logits
Token   Logit
ial     -0.16
pur     -0.16
mits    -0.15
Voy     -0.15
iom     -0.15
pom     -0.14
l       -0.14
347     -0.14
нет     -0.14
bras    -0.14
Positive Logits
Token   Logit
plets   0.32
dro     0.27
plet    0.27
Dro     0.26
dro     0.25
pper    0.24
oling   0.24
gue     0.23
oping   0.22
pping   0.22
Activations Density: 0.004%