    Negative Logits
    Token               Logit
    "oci"               -0.10
    " vigil"            -0.09
    "asu"               -0.09
    " undocumented"     -0.09
    "idi"               -0.09
    "swire"             -0.09
    " foster"           -0.09
    " caregivers"       -0.09
    " cooperation"      -0.09
    "abi"               -0.09
    Positive Logits
    Token               Logit
    " neutral"          0.23
    "Neutral"           0.20
    "neutral"           0.19
    " Neutral"          0.19
    " third"            0.18
    " neutr"            0.18
    "-neutral"          0.17
    " medi"             0.16
    " neutrality"       0.15
    " impartial"        0.15
    Activation Density: 0.050%

    No Known Activations