    Negative Logits
    "Important"        -0.08
    "(-"               -0.07
    " dauer"           -0.07
    " (...)"           -0.07
    " долго"           -0.07
    " argu"            -0.07
    "ياتي"             -0.07
    "missing"          -0.07
    " longitudinal"    -0.07
    "-present"         -0.07
    Positive Logits
    " comedian"         0.09
    " fragrant"         0.09
    " gần"              0.08
                        0.08
    " chuyển"           0.08
    " Sai"              0.08
    " vocalist"         0.08
    " guitarist"        0.08
    " không"            0.08
    " golden"           0.08
    Activation Density: 0.003%

    No Known Activations