INDEX
    Explanations

    expressions of personal dislike or negative opinions

    Negative Logits
        " bì"       -0.15
        "ings"      -0.14
        "odcast"    -0.14
        "aces"      -0.14
        "ienia"     -0.13
        "leine"     -0.13
        "éĴ"        -0.13
        "insky"     -0.13
        "inking"    -0.13
        "ingham"    -0.13
    Positive Logits
        "tog"        0.16
        "toc"        0.15
        "ì¦Ŀ"        0.14
        "iaux"       0.14
        "arth"       0.13
        " Howe"      0.13
        "arta"       0.13
        "ieves"      0.13
        "edir"       0.13
        "_VEC"       0.13
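    Each logit list is a ranked top-k selection (here k = 10) from one effect value per vocabulary token. The sketch below shows only that ranking step, under stated assumptions. It assumes the per-token effects are already available as a mapping from token string to value. How the page computes those effects (for example, by projecting the feature's decoder direction onto the unembedding) is an assumption and is not shown. The function name `top_logit_effects` is hypothetical.

```python
# Hypothetical sketch: select the k most negative and k most positive
# per-token logit effects. Assumes `effects` is already computed; the
# projection that produces it is not part of this sketch.

def top_logit_effects(effects: dict[str, float], k: int = 10):
    """Return (most_negative, most_positive) lists of (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    most_negative = ranked[:k]
    # Guard k == 0, since ranked[-0:] would return the whole list.
    most_positive = list(reversed(ranked[-k:])) if k > 0 else []
    return most_negative, most_positive


# Toy usage with a few values copied from the lists above.
effects = {"tog": 0.16, "toc": 0.15, " bì": -0.15, "ings": -0.14, "arth": 0.13}
neg, pos = top_logit_effects(effects, k=2)
print(neg)  # [(' bì', -0.15), ('ings', -0.14)]
print(pos)  # [('tog', 0.16), ('toc', 0.15)]
```

    Leading spaces are kept in the token strings (" bì", " Howe"). They are part of the token and distinguish it from the same characters without a space.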
    Act Density 0.138%
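    The activation density figure is a percentage of tokens. The sketch below assumes it means the share of sampled tokens on which the feature's activation is strictly above zero. Both that definition and the zero threshold are assumptions, not stated on this page. The helper `activation_density` is hypothetical. Under this reading, 0.138% is roughly 138 firing tokens per 100,000 sampled.

```python
# Hypothetical sketch: activation density as the percentage of sampled
# tokens whose activation exceeds a threshold (assumed to be 0 here).

def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: the feature fires on 2 of 1,000 tokens.
acts = [0.0] * 998 + [1.7, 0.4]
print(f"Act Density {activation_density(acts):.3f}%")  # Act Density 0.200%
```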

    No Known Activations