    NEGATIVE LOGITS
    Token             Logit
    " ю"              -0.06
    " evening"        -0.06
    " české"          -0.06
    ".counter"        -0.06
    "\tMap"           -0.06
                      -0.06
    "choice"          -0.06
    " Begin"          -0.06
    "BI"              -0.06
    " billionaire"    -0.06
    POSITIVE LOGITS
    Token             Logit
    "dın"             0.07
    "bart"            0.06
    " subtotal"       0.06
    " asn"            0.06
    "ostí"            0.06
    " templ"          0.06
    "щают"            0.06
    " distrust"       0.06
    " sơn"            0.06
    "gın"             0.06
    Activation Density: 0.030%

    No Known Activations