Explanation: phrases indicating absence or negation
Negative logits
nieuw          -0.39
'))            -0.36
encontramos    -0.36
iem            -0.34
dulu           -0.33
egyszerű       -0.32
logisch        -0.32
heutigen       -0.31
sederhana      -0.31
Välislingid    -0.31

Positive logits
mut            0.85
ModelRenderer  0.83
ujednoznacz    0.74
صوتيه          0.73
mut            0.73
<?             0.73
OMITBAD        0.69
vessel         0.68
/**            0.67
without        0.65
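
The positive and negative logit lists above rank vocabulary tokens by how strongly the feature pushes their output logits up or down. The sketch below shows one common way such lists are produced, assuming the dashboard follows the usual convention of taking the dot product between the feature's decoder direction and each token's unembedding column, then keeping the k largest and k smallest values. The names `decoder_dir` and `unembed`, and the toy numbers in the example, are hypothetical and do not come from this dashboard. The sketch also ignores any final LayerNorm scaling a real model would apply.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot the (hypothetical) feature decoder direction with each token's
    unembedding column to get that token's logit effect."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (positive, negative) lists: the k largest effects in descending
    order and the k smallest effects in ascending order."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example with made-up 2-dimensional vectors.
effects = logit_effects(
    [1.0, -0.5],
    {"without": [0.5, -0.3], "nieuw": [-0.2, 0.4], "vessel": [0.1, 0.0]},
)
positive, negative = top_and_bottom(effects, k=2)
print(positive)
print(negative)
```

Sorting the full vocabulary once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.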

Activation density: 0.249% of tokens
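
Activation density is usually read as the fraction of tokens on which the feature fires. A density of 0.249% corresponds to about 249 tokens in every 100,000, or roughly 1 token in 400. The sketch below computes this figure, assuming "fires" means an activation strictly above a threshold of 0. The function name and the threshold default are illustrative assumptions, not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    The threshold of 0.0 is an assumed convention for "the feature fired".
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 249 firing tokens out of 100,000 reproduces a 0.249% density.
acts = [0.0] * 99_751 + [1.0] * 249
print(activation_density(acts))
```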