INDEX
    Explanations

    phrases indicating disbelief or denial

    Negative Logits
    "ç·Ĵ"        -0.17
    "raki"       -0.15
    " münchen"   -0.14
    "anon"       -0.14
    "utron"      -0.14
    "inem"       -0.14
    "imar"       -0.14
    "wc"         -0.14
    "Ь"          -0.14
    "avis"       -0.14

    Positive Logits
    " sorter"    0.28
    " arter"     0.21
    " jest"      0.20
    " ez"        0.19
    " pore"      0.19
    " onc"       0.19
    " git"       0.18
    "jis"        0.18
    " keer"      0.18
    " Jest"      0.18

    Activation Density: 0.052%

    No known activations.