Explanations
negative imperatives or admonitions
Negative Logits
Datuak            -0.83
лтемелер          -0.72
ویکیپدیا          -0.70
couverte          -0.70
mourut            -0.69
CreateTagHelper   -0.69
scattata          -0.68
Hauptartikel      -0.67
OGND              -0.66
medesimo          -0.65
Positive Logits
forget            0.78
afraid            0.63
Donny             0.62
forgetting        0.62
Don               0.61
Don               0.60
Jangan            0.60
يتيمه             0.59
Dont              0.57
ванович           0.57
Activations Density 0.047%