INDEX
Explanations
topics related to violence and its impact on individuals, particularly women
Negative Logits
REDACTED    -0.62
Spawn       -0.60
Hydra       -0.59
Lethal      -0.58
Diver       -0.58
UTC         -0.57
Dmit        -0.57
Seller      -0.56
Shares      -0.56
Hacker      -0.55
Positive Logits
their         1.26
selves        1.08
themselves    1.03
their         1.00
Their         0.91
THEIR         0.87
entimes       0.86
they          0.84
Their         0.83
They          0.80
Activations Density 4.631%