INDEX
Explanations
derogatory terms and insults related to people and groups
insults and derogatory terms
Negative Logits
Token            Logit
OGND             -0.59
okuyayım         -0.45
באנגלית          -0.45
ագրություններ    -0.44
africaine        -0.44
cstdio           -0.42
archiviato       -0.42
fédéral          -0.41
Hozzáférés       -0.40
お腹             -0.40
Positive Logits
Token            Logit
idiots           0.53
idiot            0.52
morons           0.51
EDEFAULT         0.50
moron            0.49
tagHelper        0.48
randomUUID       0.47
Idiot            0.45
Idiot            0.45
scound           0.45
Activations Density: 0.051%