INDEX
Explanations
phrases related to weapons and shooting
references to weapons and violent actions
Negative Logits
"],"           -0.66
TOP            -0.66
friendships    -0.64
udget          -0.61
Situation      -0.61
TABLE          -0.60
recourse       -0.59
ĻĤ             -0.59
Anonymous      -0.59
discussions    -0.57
Positive Logits
onto           1.15
toward         1.08
darts          1.06
towards        1.05
projectiles    0.98
into           0.94
wards          0.94
balls          0.92
forcefully     0.89
dart           0.87
Activations Density: 0.307%