INDEX
    Explanations

    Naming/referring to something

    Negative Logits
     contrace   -0.07
     تحت        -0.07
    Declare     -0.07
     nombre     -0.07
     أحد        -0.07
     NORMAL     -0.07
    _NEW        -0.06
     ضمن        -0.06
     together   -0.06
     Rum        -0.06
    Positive Logits
    mite        0.08
                0.07
     фонд       0.07
     usern      0.07
    зи          0.07
     yelled     0.07
    Gradient    0.06
    step        0.06
     bounty     0.06
    _card       0.06
    Activation Density: 0.110%

    No Known Activations