INDEX
    Explanations

    opinions or attitudes of approval or disapproval

    Negative Logits
     Expansion      -0.69
    eor             -0.67
    oulos           -0.64
    OPE             -0.64
    perty           -0.64
    ILA             -0.63
    PT              -0.63
    ropolitan       -0.62
    senal           -0.61
     Examination    -0.60
    Positive Logits
    entimes         0.84
    cast            0.83
    ling            0.78
    nered           0.78
    erd             0.76
    hearted         0.76
    lier            0.75
    glers           0.75
    entially        0.74
    ados            0.72
    Activation Density: 0.017%

    No Known Activations