INDEX
    Explanations

    sentences that express opinions, reactions, or critiques

    Negative Logits
    vid             -0.17
    ire             -0.15
    вид             -0.14
     Vid            -0.14
    PTS             -0.14
     pil            -0.14
    elda            -0.14
    PT              -0.14
     others         -0.13
    osh             -0.13
    Positive Logits
    ýt              0.16
    iscrimination   0.16
    iciary          0.16
    _HERE           0.16
     PROFITS        0.15
    ,application    0.14
    ioned           0.14
    mpar            0.14
    ]={↵            0.14
    .sg             0.14
    Act Density 0.395%

    No Known Activations