Explanations
phrases related to criminal activities such as robbery and harassment
instances of theft or victimization
Negative Logits
Token             Logit
appeal            -0.79
recommendation    -0.75
emphasis          -0.74
inference         -0.73
availability      -0.73
adoption          -0.73
mitigation        -0.72
deployment        -0.72
usage             -0.72
issuance          -0.71
Positive Logits
Token             Logit
robbed             2.07
harassed           2.02
harmed             1.94
prosecuted         1.82
cheated            1.82
recruited          1.79
deceived           1.79
bullied            1.78
punished           1.77
interrogated       1.76
Activation Density: 0.074%