    Explanations

    feeling, committed, fall

    Negative Logits
     truk       0.88
    стный       0.83
     związ      0.82
    вная        0.80
     sniper     0.79
     sisi       0.78
     nokta      0.78
     lick       0.72
     smirk      0.72
    Ы           0.71
    Positive Logits
    (blank)     0.82
     работе     0.77
    r           0.73
    지를        0.73
    (blank)     0.72
    ر           0.70
    iendo       0.68
    р           0.68
    abhavena    0.67
    지는        0.66
    Act Density 0.001%

    No Known Activations