INDEX
Explanations
mentions of individuals who are members of specified groups or organizations
references to group affiliation or membership
Negative Logits
Token       Logit
daylight    -0.67
rums        -0.66
symmetry    -0.60
torches     -0.56
tomatoes    -0.56
ourses      -0.55
plumbing    -0.54
unbeliev    -0.54
carnage     -0.54
rout        -0.54
Positive Logits
Token       Logit
of          1.15
hips        0.96
thereof     0.93
OF          0.82
Of          0.81
Of          0.77
OF          0.75
atical      0.75
less        0.73
idable      0.70
Activations Density 0.033%