INDEX
    NEGATIVE LOGITS
    " concess"    -0.69
    " scrut"      -0.66
    "iden"        -0.61
    "ŃĶ"          -0.61
    "ĺħ"          -0.60
    "isse"        -0.60
    "igslist"     -0.59
    " bowel"      -0.58
    "opsis"       -0.57
    "pill"        -0.57
    POSITIVE LOGITS
    "vernment"     1.08
    "iants"        0.96
    "glers"        0.96
    "roups"        0.90
    "ORGE"         0.89
    "raphic"       0.87
    "hetto"        0.83
    "rets"         0.83
    "irlfriend"    0.82
    "stones"       0.82
    Activation Density: 0.128%

    No Known Activations