INDEX
NEGATIVE LOGITS
oci           -0.10
vigil         -0.09
asu           -0.09
undocumented  -0.09
idi           -0.09
swire         -0.09
foster        -0.09
caregivers    -0.09
cooperation   -0.09
abi           -0.09
POSITIVE LOGITS
neutral       0.23
Neutral       0.20
neutral       0.19
Neutral       0.19
third         0.18
neutr         0.18
-neutral      0.17
medi          0.16
neutrality    0.15
impartial     0.15
Activations Density 0.050%