Explanations
sexually suggestive or abusive
Negative Logits
misschien  0.41
สิน  0.39
compañeros  0.39
slechts  0.39
decin  0.39
fiecare  0.39
refrigeration  0.38
sağlam  0.38
integro  0.38
preparar  0.38
Positive Logits
запре  0.90
prohibited  0.89
禁止  0.88
banned  0.76
prohibits  0.76
forbidden  0.75
Forbidden  0.73
proib  0.73
নিষিদ্ধ  0.72
prohibiting  0.71
Activations Density: 0.293%