    Negative Logits
     ziv    -0.08
     نبات    -0.08
    WORD    -0.08
     Hauptstadt    -0.08
     քեզ    -0.08
    ിൻ    -0.08
    Vz    -0.08
    ината    -0.07
    ikið    -0.07
     kuitenkin    -0.07
    Positive Logits
     selectors    0.08
     semester    0.08
     mees    0.07
    _selector    0.07
     tertentu    0.07
     themselves    0.07
     different    0.07
     опыта    0.07
     Selector    0.07
    不同    0.07
    Activation Density: 0.004%

    No Known Activations