Explanations
instances of violence or abusive behavior
Negative Logits
misiones        -0.41
trữ             -0.41
plán            -0.40
úsqueda         -0.40
resources       -0.40
Vermögen        -0.39
cesty           -0.39
complexType     -0.38
visionnement    -0.38
Sicherung       -0.37
Positive Logits
beat            0.67
beat            0.66
punches         0.64
BEAT            0.63
httphttps       0.60
Beat            0.60
beating         0.59
beaten          0.58
ضرب             0.58
beats           0.57
Activations Density 0.420%