INDEX
    Explanations

    words and phrases that express emotional or subjective experiences

    Negative Logits
    го
    -0.15
    ivor
    -0.15
    YRO
    -0.15
     sy
    -0.15
    imos
    -0.14
    baz
    -0.14
     Sy
    -0.14
    atten
    -0.14
    ARGS
    -0.14
     halinde
    -0.14
    Positive Logits
    нут
    0.21
    nut
    0.20
    нути
    0.18
    nout
    0.18
    nutím
    0.17
    ós
    0.17
    нуть
    0.16
    atile
    0.16
    ёт
    0.15
    acket
    0.15
    Act Density 0.062%

    No Known Activations