    Explanations

    phrases expressing opinions or evaluations

    phrases that reference comparable situations or events

    Negative Logits
    "eteria"       -0.81
    "ourse"        -0.71
    "elin"         -0.69
    "iband"        -0.65
    "esson"        -0.65
    "utenberg"     -0.65
    "ells"         -0.63
    "iven"         -0.63
    "arate"        -0.63
    "iets"         -0.63
    Positive Logits
    "lihood"        1.32
    " ours"         1.24
    " hers"         1.05
    " yours"        0.99
    " theirs"       0.93
    "pires"         0.81
    "liest"         0.77
    " minded"       0.73
    "lier"          0.70
    " Deng"         0.68
    Activation Density: 0.105%

    No Known Activations