    Negative Logits
    "are"        0.74
    " ("         0.65
    "۰۰"         0.64
    " At"        0.59
    " at"        0.57
    "agility"    0.54
    "AJ"         0.54
    "AH"         0.54
    "σσ"         0.50
    " a"         0.50
    Positive Logits
    "powied"     0.67
    " ਉਨ੍ਹਾਂ"      0.64
                 0.60
    " pregunt"   0.59
    " ምክንያት"     0.59
    " obowią"    0.57
    " intitul"   0.56
    " pergunt"   0.56
    "టువంటి"     0.56
    " profund"   0.55
    Activation Density 0.049%

    No Known Activations