INDEX
Explanations
references to holding law enforcement accountable for their actions
instances of special characters or symbols that might indicate emphasis or sentiment
Negative Logits
Negro       -0.62
surv        -0.60
gib         -0.58
retreat     -0.56
litter      -0.55
catches     -0.55
jog         -0.55
overlook    -0.55
ces         -0.54
Roma        -0.54
Positive Logits
ï¸ı         1.16
_>          0.84
20439       0.83
SHIP        0.83
ï¸          0.81
sbm         0.81
except      0.79
selves      0.75
iversary    0.75
sure        0.75
Activations Density 0.365%