INDEX
Explanations
instances of violent actions or events
Negative Logits
Token             Logit
estekak           -0.70
')['              -0.67
oredCriteria      -0.67
utafitiHapana     -0.63
).                -0.61
[toxicity=0]      -0.61
↵                 -0.61
.*/               -0.59
.…                -0.58
المعيارى          -0.57
Positive Logits
Token             Logit
***!              0.80
AssemblyCompany   0.74
initComponents    0.71
تقاوى             0.70
Hentet            0.65
millimeters       0.64
﹍﹍﹍               0.63
SourceChecksum    0.63
AndEndTag         0.62
tagHelperRunner   0.62
Activations Density: 25.488%