INDEX
    Explanations

    adjectives and phrases describing quality and condition in reviews

    Negative Logits
    bes           -0.14
    ersistence    -0.14
    bern          -0.14
    kara          -0.13
    ire           -0.13
    vis           -0.13
    assage        -0.13
    esda          -0.13
    rellas        -0.13
    undry         -0.13
    Positive Logits
    且            0.22
     enough       0.17
    stvo          0.17
    ーニ          0.16
    utely         0.16
    зано          0.15
    ті            0.15
    alus          0.14
    lich          0.14
    rava          0.14
    Act Density 0.140%

    No Known Activations
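
    Token strings in dashboards like this one sometimes appear in the printable form used by byte-level BPE tokenizers, for example `ä¸Ķ` for `且`. The sketch below shows one way to recover the readable text. It assumes the model uses the GPT-2 style byte-to-character table, which this page does not confirm. The names `bytes_to_unicode`, `UNICODE_TO_BYTE` and `decode_token` are illustrative and not part of any dashboard API.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2 byte-level BPE (assumed)."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable text."""
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Characters outside the table (a literal space, Cyrillic that is
            # already decoded) are passed through unchanged.
            raw.extend(ch.encode("utf-8"))
    # A token may hold only part of a multi-byte character, so replace
    # undecodable bytes rather than raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ä¸Ķ", "ãĥ¼ãĥĭ", "ÑĤÑĸ", "utely"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

    Plain ASCII tokens such as `utely` pass through unchanged, so the function can be applied to every row of a logit table.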