    Explanations

    phrases that express personal opinions or recommendations

    Negative Logits
    yne
    -0.14
    (日
    -0.14
    urance
    -0.14
    UED
    -0.14
    gett
    -0.13
    -eslint
    -0.13
    ュー
    -0.13
     Verfügung
    -0.13
    burgh
    -0.12
     compensated
    -0.12
    Positive Logits
     je
    0.29
     ça
    0.25
     tu
    0.24
     Ç
    0.23
     tes
    0.22
     moi
    0.21
     mon
    0.21
     attends
    0.21
    ça
    0.20
     pas
    0.19
    Activation Density: 0.043%

    No Known Activations