Explanations
terms related to victims or victimhood
references to victims of crimes or abuse
Negative Logits
iannopoulos   -0.68
yip           -0.64
Huck          -0.64
ás            -0.63
CLASSIFIED    -0.63
Owl           -0.61
adden         -0.60
leaf          -0.60
andel         -0.60
snipp         -0.59
Positive Logits
ization       0.93
izes          0.93
izers         0.91
istics        0.88
blaming       0.88
izer          0.82
ize           0.79
hood          0.77
izations      0.77
izing         0.76
Activations Density: 0.064%