    Negative Logits
    Token          Logit
    "credito"      -0.09
    " որպես"       -0.08
    "contri"       -0.08
    " potr"        -0.08
    ".metadata"    -0.08
    " credito"     -0.08
    " որևէ"        -0.08
    " essentiel"   -0.08
    " ჩემი"        -0.08
    " ככל"         -0.08
    Positive Logits
    Token          Logit
    " Reaction"    0.07
    "strap"        0.07
    "Reaction"     0.07
    " Anpass"      0.07
    " بند"         0.07
    " substance"   0.07
    "ター"          0.07
    "ٹ"            0.07
    " reaction"    0.07
    " rebellious"  0.07
    Activation Density: 0.005%

    No Known Activations