    Negative Logits
    " isActive"    -0.07
    "ương"         -0.06
    "008"          -0.06
    "codigo"       -0.06
    " awesome"     -0.06
    "planation"    -0.06
    "752"          -0.06
    " rat"         -0.06
    "aram"         -0.06
    "aar"          -0.06
    Positive Logits
    " هزار"        0.07
    "_modifier"    0.07
    " functor"     0.07
    [not shown]    0.07
    "σουν"         0.07
    " uměl"        0.06
    " конкур"      0.06
    " Functor"     0.06
    " 공고"         0.06
    " هند"         0.06
    Activation Density: 0.002%

    No Known Activations