INDEX
    Explanations

    expressions of high emotional intensity or strong opinions

    Negative Logits
    Token        Logit
    "ibar"       -0.17
    "vie"        -0.16
    "ovable"     -0.15
    "logen"      -0.15
    "ailer"      -0.14
    "onso"       -0.14
    "leta"       -0.14
    "reira"      -0.14
    "efe"        -0.14
    "adoo"       -0.14

    Positive Logits
    Token        Logit
    "Pros"        0.20
    " pros"       0.17
    " Pros"       0.17
    " Overall"    0.15
    "ajs"         0.15
    " overall"    0.15
    " Aws"        0.14
    "âĢı"         0.14
    "overall"     0.14
    "íĺ¼"         0.14
    Act Density 0.043%
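
    As a rough illustration of what a density figure like this means, the sketch below assumes "Act Density" is the fraction of token positions on which the feature has a nonzero activation. That definition, the function name `activation_density`, and the sample data are assumptions for illustration, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires.

    Assumption: a position counts as "active" when its activation
    is strictly greater than `threshold`.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 10,000 token positions, 4 of which activate the feature.
sample = [0.0] * 9996 + [1.2, 0.3, 2.5, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.040%
```

    Under that assumed definition, a density of 0.043% would correspond to roughly 4 active positions per 10,000 tokens.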

    No Known Activations