    Explanations

    positive adjectives and expressions of appreciation

    Negative Logits
    Token               Logit
    "emes"              -0.17
    "ofire"             -0.17
    ".opendaylight"     -0.16
    "697"               -0.15
    "oire"              -0.14
    "431"               -0.14
    "ernel"             -0.14
    "ало"               -0.14
    "anos"              -0.14
    "ierten"            -0.14
    Positive Logits
    Token               Logit
    "jí"                0.16
    "igger"             0.16
    "ande"              0.14
    " shame"            0.14
    " indeed"           0.14
    "@Web"              0.14
    " Něm"              0.14
    "ible"              0.13
    "jím"               0.13
    "rey"               0.13
    Activation Density: 0.039%

    No Known Activations