Explanations
Wiktionary and Urban Dictionary definitions
Negative Logits
HTML      0.56
pflege    0.56
ossip     0.55
Energy    0.55
t         0.55
g         0.54
vp        0.52
sag       0.52
total     0.52
kol       0.52
Positive Logits
peux          0.59
Dictionary    0.56
njihov        0.55
возра         0.51
dicion        0.51
dictionary    0.51
avut          0.50
及ひ          0.50
mümkün        0.49
میتواند       0.49
Activations Density 0.004%