INDEX
    Explanations
    No Explanations Found
    Negative Logits
     
    0.66
     a
    0.61
     the
    0.61
     (
    0.59
    -
    0.58
     an
    0.58
         
    0.57
    :
    0.57
           
    0.55
     or
    0.54
    Positive Logits
     sourire
    0.64
     председа
    0.61
     слегка
    0.61
     تلیفون
    0.59
    ပြော
    0.59
    ینګ
    0.59
     sorriso
    0.57
     говорил
    0.57
    eszcze
    0.56
     покер
    0.56
    Act Density 0.025%

    No Known Activations