INDEX
Explanations
variations of the word "take."
Negative Logits
Token      Logit
tero       -0.16
idth       -0.15
aju        -0.15
ambah      -0.15
大éĺª      -0.14
jav        -0.14
Ø®ÛĮ       -0.14
orgy       -0.14
ackers     -0.14
legen      -0.14
Positive Logits
Token            Logit
advantage        0.46
care             0.29
charge           0.27
Advantage        0.26
advant           0.25
refuge           0.24
adv              0.24
apart            0.24
responsibility   0.22
risks            0.22
Activations Density: 0.117%
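
Two of the negative-logit tokens, "大éĺª" and "Ø®ÛĮ", look like raw byte-level BPE vocabulary strings rather than readable text. The page does not say which tokenizer the model uses, so the following is only a sketch that assumes a GPT-2-style byte-to-character table; `decode_token` is a hypothetical helper name, and the fallback for characters outside that table (such as "大") is likewise an assumption about how the mixed form arose.

```python
def bytes_to_unicode():
    """Build the GPT-2-style table mapping each byte 0..255 to a printable character.

    Assumption: the model behind this page uses this byte-level BPE scheme.
    """
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # '¡' .. '¬'
        + list(range(0xAE, 0x100))  # '®' .. 'ÿ'
    )
    printable_set = set(printable)
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Remaining bytes are shifted to code points 256 and above, in byte order.
    for b in range(256):
        if b not in printable_set:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Turn a byte-level vocabulary string back into readable text.

    Characters outside the table are assumed to be already decoded
    and are passed through as their own UTF-8 bytes.
    """
    raw = bytearray()
    for ch in token:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            raw.extend(ch.encode("utf-8"))
    # Partial tokens may not form valid UTF-8, so replace bad sequences.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["大éĺª", "Ø®ÛĮ", "advantage"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption the two strings read as "大阪" and "خی", while plain ASCII tokens such as "advantage" pass through unchanged.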