INDEX
    Negative Logits
    Token           Logit
    " rendered"     -0.07
    " crises"       -0.07
    " injuries"     -0.07
    "BERT"          -0.07
    "اذا"           -0.06
    "ousing"        -0.06
    " hizmet"       -0.06
    " vous"         -0.06
    "ści"           -0.06
    "mentation"     -0.06

    Positive Logits
    Token           Logit
    " own"           0.21
    " Own"           0.18
    " OWN"           0.13
    "Own"            0.12
    "_own"           0.11
    "own"            0.09
    "OWN"            0.07
    "\tUN"           0.07
    "\textern"       0.07
    " }//"           0.07

    Act Density 0.025%

    No Known Activations