    Explanations

    phrases and expressions about expectations and norms

    Negative Logits
    amas
    -0.17
     ju
    -0.15
    atím
    -0.15
    owski
    -0.14
    ypy
    -0.14
    än
    -0.14
     bra
    -0.13
     Bra
    -0.13
    stub
    -0.13
    etti
    -0.13
    Positive Logits
     necessarily
    0.17
     zbo
    0.16
    Ỽi
    0.15
    dre
    0.15
     burden
    0.15
    inton
    0.14
    rup
    0.14
    ç̬
    0.14
    rende
    0.13
    605
    0.13
    Activation Density: 0.054%

    No Known Activations