INDEX
Explanations
acknowledges, affect, or address
Negative Logits
eli      0.40
ugen     0.40
all      0.37
uč       0.37
uro      0.37
riz      0.37
tor      0.37
uki      0.37
urry     0.37
andan    0.36
Positive Logits
keinginan    0.50
deseo        0.47
styl         0.45
désir        0.45
desire       0.44
periodista   0.44
interplay    0.43
desejo       0.42
stylistic    0.42
desiderio    0.42
Activations Density 0.000%