INDEX
Explanations
people or groups of people that are negatively impacted or under threat
references to groups of people or relationships among individuals
Negative Logits
cer           -0.75
ibrary        -0.69
WithNo        -0.67
oneself       -0.67
uve           -0.66
otos          -0.62
Ĥİ            -0.61
wcs           -0.60
itself        -0.57
guiName       -0.57
Positive Logits
hip            1.30
'              1.25
hips           1.20
mates          1.04
selves         0.94
ervative       0.92
counterparts   0.90
'-             0.89
hare           0.89
heet           0.89
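The two logit lists rank vocabulary tokens by how strongly the feature pushes their output logits down (negative) or up (positive). Dashboards of this kind commonly compute these values by projecting the feature's decoder direction through the model's unembedding matrix and taking the k lowest and k highest entries. The sketch below assumes that method, which this page does not confirm; the names `logit_effects`, `decoder_vec`, and `W_U` are hypothetical.

```python
import numpy as np


def logit_effects(decoder_vec, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    Assumed inputs (hypothetical, for illustration):
      decoder_vec: shape (d_model,), the feature's output direction.
      W_U:         shape (d_model, vocab_size), the unembedding matrix.
      vocab:       list of token strings indexed by token id.
    """
    # Direct effect of the feature direction on every output logit.
    effects = decoder_vec @ W_U  # shape (vocab_size,)
    # Token ids sorted from most negative to most positive effect.
    order = np.argsort(effects)
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
neg, pos = logit_effects(
    np.array([1.0, 0.0]),
    np.array([[0.5, -0.2, 1.3, -0.7], [0.0, 0.0, 0.0, 0.0]]),
    ["a", "b", "c", "d"],
    k=2,
)
print(neg)  # [('d', -0.7), ('b', -0.2)]
print(pos)  # [('c', 1.3), ('a', 0.5)]
```

Because only the decoder direction and the unembedding are involved, these values describe the feature's direct effect on the output and ignore any influence routed through later layers.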
Activations Density 0.163%
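An activation density of 0.163% means the feature fires on roughly 1.6 of every 1,000 tokens (about 1 in 613). A minimal sketch of the calculation, assuming density is the percentage of token positions whose activation is strictly above a threshold of zero; the threshold and the function name `activation_density` are assumptions, not taken from this page.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption for illustration.
    """
    acts = np.asarray(activations, dtype=float)
    # Mean of a boolean mask = fraction of positions where the feature is active.
    return 100.0 * float(np.mean(acts > threshold))


# Toy example: the feature is active on 2 of 10 token positions.
print(activation_density([0, 0, 0.5, 0, 0, 0, 0, 0, 0, 1.2]))  # 20.0
```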