    Negative Logits
        "parameters"    -0.07
        " facebook"     -0.06
        "atorio"        -0.06
        " keyboards"    -0.06
        " tolerant"     -0.06
        " chin"         -0.06
        (not shown)     -0.06
        "해서"          -0.06
        " abb"          -0.06
        "wap"           -0.06
    Positive Logits
        "esh"           0.07
        " nimi"         0.06
        (not shown)     0.06
        "ΑΤ"            0.06
        " phá"          0.06
        " وصلات"        0.06
        " علاق"         0.06
        (not shown)     0.06
        " tắt"          0.06
        " θα"           0.06
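    The two lists give the vocabulary tokens whose logits this feature most decreases and most increases. A common way to produce such lists is to take the dot product of the feature's decoder direction with each token's unembedding column and then keep the k lowest and k highest scores. The page does not state its method, so treat this as an assumption. The sketch below uses hypothetical names (`logit_effects`, `top_logits`) and a toy vocabulary, not any particular library's API.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: a token's score is the dot product of the feature's decoder
# direction with that token's unembedding column (hypothetical helper).
def logit_effects(
    decoder_dir: Sequence[float],
    unembed_cols: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Score every token by projecting the decoder direction onto its column."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, col))
        for token, col in unembed_cols.items()
    }


def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]                   # lowest scores first
    positive = list(reversed(ranked[-k:]))  # highest scores first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional residual stream and a 4-token vocabulary.
    decoder_dir = [1.0, -0.5]
    unembed_cols = {
        "up": [0.5, 0.0],
        "down": [-0.4, 0.0],
        "flat": [0.0, 0.0],
        "mixed": [0.2, 0.2],
    }
    negative, positive = top_logits(logit_effects(decoder_dir, unembed_cols), k=2)
    print("Negative:", negative)
    print("Positive:", positive)
```

    Sorting the whole vocabulary once keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest` / `heapq.nlargest` would avoid the full sort.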
    Activation density: 0.015%
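    Activation density is read here as the fraction of tokens on which the feature fires, using a zero threshold. That is a common convention for sparse-autoencoder dashboards but an assumption, since the page does not define it. Under that reading, 0.015% is 0.00015 as a fraction, which is about 15 firing tokens per 100,000. A minimal sketch with hypothetical function names:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumption: "fires" means activation > threshold, with threshold 0 by default.
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


def expected_firings(density_percent: float, n_tokens: int) -> float:
    """Convert a density given in percent into an expected number of firing tokens."""
    return density_percent / 100.0 * n_tokens


if __name__ == "__main__":
    # Toy activations over 8 token positions; the feature fires on one of them.
    acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(f"density = {activation_density(acts):.3%}")
    # 0.015% of 100,000 tokens is about 15 firings.
    print(round(expected_firings(0.015, 100_000)))
```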

    No Known Activations