INDEX
    Explanations

    words related to emotions or feelings

    Negative Logits
    Token    Logit
    óz       -0.17
    彦       -0.17
    очку     -0.16
    viz      -0.16
    ятся     -0.16
    ят       -0.16
    ял       -0.15
    achel    -0.15
    osed     -0.15
    ixel     -0.15
    Positive Logits
    Token    Logit
    еств     0.27
    ество    0.25
    emy      0.21
    ества    0.18
    emi      0.18
    ョ       0.17
    ero      0.16
    midt     0.16
    ews      0.16
    ем       0.15
    Activation Density: 0.040%

    No Known Activations