    NEGATIVE LOGITS
    Token               Logit
    "ilo"               -0.07
    " آقای"             -0.07
    " nephew"           -0.06
    " İki"              -0.06
    "brahim"            -0.06
    "едини"             -0.06
    "_distribution"     -0.06
    "Luc"               -0.06
    "üp"                -0.06
    " artisan"          -0.06

    POSITIVE LOGITS
    Token               Logit
    ".Named"            0.08
    " explain"          0.07
    " creature"         0.07
    "ทะ"                0.06
    " explaining"       0.06
    "pq"                0.06
    " theo"             0.06
    ".."                0.06
    " strain"           0.06
    "WithDuration"      0.06

    Act Density 0.013%

    No Known Activations