Explanations
derogatory and inflammatory language towards specific groups or individuals
negatively charged insults and criticism
Negative Logits
انجليز          -0.43
Tikang          -0.40
afficheront     -0.38
tasse           -0.36
hemd            -0.34
sonst           -0.34
Cycles          -0.34
ensement        -0.33
Laramie         -0.33
Ghent           -0.33
Positive Logits
utafitiHapana   0.55
AddTagHelper    0.53
<<<<<<<<<<<<<<  0.48
fucking         0.45
TagMode         0.43
__(/*!          0.43
windowFixed     0.42
referrerpolicy  0.42
damned          0.42
SerializedSize  0.41
Activations Density 0.297%